Question:
Oftentimes, the documents we want to search have some amount of structure. Scholarly articles, for example, usually have a title, a set of authors, an abstract, a main body, a references section, and possibly an appendix. It turns out that weighting some parts of a document (e.g., the title) more heavily than other parts (e.g., the appendix) improves retrieval performance. The general idea is that a document with many of the query terms appearing in the title should be scored and ranked higher than a document with many of the query terms appearing in the appendix, because the title describes the main content of the document better than the appendix does.
Suppose you have a collection of documents with two non-overlapping fields: a TITLE field and a BODY field. Suppose also that you have access to an out-of-the-box search engine that performs vector space model retrieval using a binary text representation (0's and 1's) and inner-product scoring. Your goal is to design a solution that weights the TITLE field more than the BODY field. In other words, if you have a query with a single query term (e.g., jack), you want a document that has jack in the title and nowhere else to be scored and ranked higher than a document that has jack in the body and nowhere else.
How would you do this? HINT: there are many right answers. Be creative and have fun!
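Since the hint allows many right answers, here is a minimal sketch of one possible approach, not the page's own solution. The engine only sees binary term vectors, so each title term is indexed as several distinct field-tagged pseudo-terms, each body term as one, and the query is expanded the same way. The names index_terms, expand_query, score, and TITLE_WEIGHT are hypothetical, and score only simulates the engine's binary inner product.

```python
# Hypothetical sketch: field weighting by rewriting terms before indexing.
# TITLE_WEIGHT is an assumed tuning knob, not something the question specifies.
TITLE_WEIGHT = 2  # number of distinct pseudo-terms emitted per title term


def index_terms(title, body, title_weight=TITLE_WEIGHT):
    """Return the set of pseudo-terms that represents one document."""
    # Each body term becomes one pseudo-term.
    terms = {f"body:{t}" for t in body.lower().split()}
    # Each title term becomes `title_weight` distinct pseudo-terms, so it
    # fills several dimensions of the binary vector instead of one.
    for t in title.lower().split():
        for k in range(title_weight):
            terms.add(f"title{k}:{t}")
    return terms


def expand_query(query, title_weight=TITLE_WEIGHT):
    """Expand each query term into every pseudo-term it could match."""
    terms = set()
    for t in query.lower().split():
        terms.add(f"body:{t}")
        for k in range(title_weight):
            terms.add(f"title{k}:{t}")
    return terms


def score(query_terms, doc_terms):
    """Inner product of two binary vectors = number of shared terms."""
    return len(query_terms & doc_terms)


doc_a = index_terms(title="Jack", body="a story about a beanstalk")
doc_b = index_terms(title="A story", body="jack climbed a beanstalk")
q = expand_query("jack")

score_a = score(q, doc_a)  # jack in the title only
score_b = score(q, doc_b)  # jack in the body only
print(score_a, score_b)    # prints "2 1"
```

With this rewriting, a title match contributes TITLE_WEIGHT to the inner product and a body match contributes 1, so the unmodified engine ranks the title-only document first. Raising TITLE_WEIGHT increases the gap at the cost of a larger vocabulary.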
