Question:

Suppose your task is to create a dataset for open-domain question-answering (ODQA). The dataset should consist of tuples of the form (q, a), where q is a question such as "Why is the sky blue?" and a is a factually correct answer. For example, an answer resting on a true presupposition would address the scattering of sunlight in the atmosphere, whereas one resting on a false presupposition would be about light reflecting off the ocean.

To avoid extensive manual data creation and to control annotation costs, you have identified Reddit as a potential source from which to create this dataset. Reddit contains forums (also called sub-reddits) about specific topics such as Science or Explain Like I am Five. Posts have a title, which can be a question ("Why is the sky blue") but also a non-question ("The color of the sky"). After the title, posts have further text by the initial author, which may elaborate on the question but may also provide the author's own attempt at an answer. Others can then reply to the original post. Original posts as well as the responses can be up-voted or down-voted.

Explain how you might address false presuppositions. "The sky is blue because of the ocean" is a false presupposition because it is NOT factually true, and it might be present as a statement in the post or among the comments. Simply filtering out questions with false presuppositions is not an option because it reduces the diversity of questions available to the model in the training data.
Hint: How might you address this when creating (q, a), or how might you convert (q, a) above into an improved (q', a') to rectify it?
Hint: Provide steps. Annotators are available to you but costly, so you want to minimize their use or use them effectively.
Note: The reliance on advanced generative models such as ChatGPT or similar LLMs for automated fact-checking is not an option in this problem.
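
To make the setup concrete, the post structure described above (title, body text, voted replies) could be modeled roughly as follows. This is only a minimal sketch, not a solution to the presupposition problem: the `Post` and `Reply` classes, the `looks_like_question` heuristic, and the `candidate_pair` helper are hypothetical names introduced here for illustration, and the choice of the highest-scoring reply as the candidate answer is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical list of words that commonly open a question title.
QUESTION_WORDS = ("why", "how", "what", "when", "where", "who", "which",
                  "is", "are", "does", "do", "can")


@dataclass
class Reply:
    text: str
    score: int  # up-votes minus down-votes


@dataclass
class Post:
    title: str
    body: str   # further text by the initial author
    score: int
    replies: List[Reply] = field(default_factory=list)


def looks_like_question(title: str) -> bool:
    """Crude check: ends with '?' or starts with a question word."""
    cleaned = title.strip().lower()
    if not cleaned:
        return False
    if cleaned.endswith("?"):
        return True
    return cleaned.split()[0] in QUESTION_WORDS


def candidate_pair(post: Post) -> Optional[Tuple[str, str]]:
    """Return a candidate (q, a), or None if the post does not qualify.

    Assumption: the highest-scoring reply with a positive score is taken
    as the candidate answer. Votes do not guarantee factual correctness.
    """
    if not looks_like_question(post.title) or not post.replies:
        return None
    best = max(post.replies, key=lambda r: r.score)
    if best.score <= 0:
        return None
    return (post.title.strip(), best.text.strip())


post = Post(
    title="Why is the sky blue",
    body="I always assumed it was because of the ocean.",
    score=12,
    replies=[
        Reply("Sunlight is scattered by the atmosphere.", 40),
        Reply("It reflects the ocean.", -3),
    ],
)
pair = candidate_pair(post)
```

Note that the post body in this example itself carries the false presupposition, and a highly voted reply could too, so pairs produced this way would still need the verification steps the question asks about.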
