Question: Suppose your task is to create a dataset for open-domain question answering (ODQA). The dataset should consist of tuples of the form (q, a), where q is a question such as "Why is the sky blue?" and a is a factually correct answer. For example, a true presupposition would address the scattering of sunlight in the atmosphere, whereas a false presupposition would be about light reflecting off the ocean.

To avoid extensive manual data creation and to control annotation costs, you have identified Reddit as a potential source from which to create this dataset. Reddit contains forums, also called subreddits, about specific topics such as "Science" or "Explain Like I am Five". Posts have a title, which can be a question ("Why is the sky blue?") but also a non-question ("The color of the sky"). After the title, posts have further text by the initial author, which may elaborate on the question but also provide their own attempt at an answer. Others can then reply to the original post. Original posts as well as the responses can be upvoted or downvoted.

Explain how you might address false presuppositions. "The sky is blue because of the ocean" is a false presupposition because it is NOT factually true, and it might be present as a statement in the post or among the comments. Simply filtering out questions with false presuppositions is not an option because it reduces the diversity of questions available to the model in the training data.
Hint: How might you address this when creating (q, a), or how might you convert (q, a) above into an improved (q, a) to rectify it?
Hint: Provide steps. Annotators are available to you, but they are costly, so you want to minimize their use or use them effectively.
Note: Relying on advanced generative models such as ChatGPT or similar LLMs for automated fact-checking is not an option in this problem.
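As a rough illustration of the setup described in the question (not a reference solution), the sketch below models a post with its title, body, and scored replies, keeps only question-like titles, takes the top-voted reply as a candidate answer, and flags pairs for annotator review when the reply contains a correction cue. All names here (Post, Comment, CORRECTION_CUES, QUESTION_WORDS, the min_score threshold) are hypothetical, and the cue list is an assumption that would need tuning on real data. No LLM is involved, in keeping with the note above.

```python
from dataclasses import dataclass, field

# Assumption: replies that correct a false premise tend to use phrases like these.
CORRECTION_CUES = ("actually", "misconception", "not true", "isn't because", "common myth")
# Assumption: titles that open with one of these words are treated as questions.
QUESTION_WORDS = ("why", "how", "what", "when", "where", "who", "which",
                  "is", "are", "does", "do", "can")


@dataclass
class Comment:
    text: str
    score: int  # upvotes minus downvotes


@dataclass
class Post:
    title: str
    body: str = ""
    comments: list[Comment] = field(default_factory=list)


def is_question(title: str) -> bool:
    """Heuristic: the title ends in '?' or opens with a question word."""
    stripped = title.strip()
    words = stripped.lower().split()
    return stripped.endswith("?") or (bool(words) and words[0] in QUESTION_WORDS)


def needs_review(answer: str) -> bool:
    """Flag answers with a correction cue, hinting at a false presupposition in q."""
    lowered = answer.lower()
    return any(cue in lowered for cue in CORRECTION_CUES)


def candidate_pairs(posts: list[Post], min_score: int = 5) -> list[dict]:
    """Build (q, a) candidates from question-titled posts and their top-voted reply."""
    pairs = []
    for post in posts:
        if not is_question(post.title) or not post.comments:
            continue
        top = max(post.comments, key=lambda c: c.score)
        if top.score < min_score:
            continue  # weakly supported answers are skipped rather than trusted
        pairs.append({"q": post.title, "a": top.text, "review": needs_review(top.text)})
    return pairs
```

Under these assumptions, only the pairs with review set to True would be sent to annotators, who could rewrite them into a corrected (q, a) instead of discarding the question, so that question diversity is preserved while annotation effort stays small.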
