Question:
Information Retrieval
Due Date: Mon, Oct. 9, 2023
Turn-in: Write your answers in a separate document file (e.g., docx, pdf, xlsx, md, etc.) and turn in via the D2L Assignments. Write your name and course number at the beginning of your document.

Problem Set IV

1 Comparing Retrieval Models

Problem 1.1. Suppose we have the query "oil producing nations", and the three query terms have inverted lists as follows:

I_k         (df_k, ctf_k, (doc, tf_ik), ...)
oil         (5, 18, (1, 4), (4, 3), (6, 1), (7, 2), (8, 8))
producing   (4, 20, (1, 6), (2, 2), (5, 4), (8, 8))
nations     (3, 11, (1, 1), (3, 2), (8, 8))

Further suppose we have a collection of documents with lengths as follows:

d1   498      d6    639
d2   627      d7    566
d3   571 or 423      d8    423 or 571
d4   648      d9    589
d5   621      d10   525

(The values 571 and 423 belong to d3 and d8; the source does not preserve which value goes with which document.)

The total number of terms in the corpus is 5687. What are the scores of the 10 documents using each of the following retrieval models, and what are their different ranks?

(a) Vector space model with term weighting: w_{ik} = \frac{tf_{ik}}{len_i} \cdot \log\frac{N+1}{0.5+df_k}
(b) Binary independence model: if tf_{ik} > 0 then BIM term = \log\frac{N}{df_k}, else term = 0
(c) BM25 model with parameters k = 1.2, b = 0.75
(d) Language model with Jelinek-Mercer smoothing parameter \lambda = 0.2
(e) Language model with Dirichlet smoothing parameter \mu = 2000

Explain what makes these models different from each other, focusing especially on how they use term frequency, document frequency, and document length to calculate document scores and ranks.
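Before any scoring, it can help to put the inverted lists into a data structure and confirm that each list is internally consistent, i.e. that df_k equals the number of postings and ctf_k equals the sum of the term frequencies. The following is a minimal Python sketch; the names `inverted_lists`, `check_postings`, and `term_frequencies` are illustrative choices and do not come from the assignment.

```python
# Inverted lists from the problem statement:
# term -> (df, ctf, [(doc_id, tf), ...])
inverted_lists = {
    "oil": (5, 18, [(1, 4), (4, 3), (6, 1), (7, 2), (8, 8)]),
    "producing": (4, 20, [(1, 6), (2, 2), (5, 4), (8, 8)]),
    "nations": (3, 11, [(1, 1), (3, 2), (8, 8)]),
}


def check_postings(lists):
    """Map each term to True if df and ctf agree with its postings."""
    return {
        term: df == len(postings) and ctf == sum(tf for _, tf in postings)
        for term, (df, ctf, postings) in lists.items()
    }


def term_frequencies(lists, doc_id):
    """Return {term: tf} for one document; absent terms get tf = 0."""
    return {
        term: dict(postings).get(doc_id, 0)
        for term, (_, _, postings) in lists.items()
    }


if __name__ == "__main__":
    print(check_postings(inverted_lists))
    for doc_id in range(1, 11):
        print(doc_id, term_frequencies(inverted_lists, doc_id))
```

Documents 9 and 10 appear in none of the three lists, so every query term has tf = 0 for them; only models that smooth with collection statistics can tell them apart, and then only through document length.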
Step by Step Solution
(a) Vector Space Model (VSM): The VSM calculates the score for a document-query pair using the cosine similarity between the document vector and the query vector. The formula for the score is Score(d, q) = w_ik tf_i...
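To make the comparison between the five models concrete, the sketch below computes per-term contributions for one document and sums them over the query terms. Several things here are assumptions rather than part of the source: the VSM weight and BIM term follow one reading of the formulas in the question; the VSM score is a plain sum of query-term weights, because full document vectors (needed for a true cosine) are not given; the BM25 form with idf log((N - df + 0.5)/(df + 0.5)) and no query-term-frequency factor is one common variant; and in Jelinek-Mercer smoothing the parameter 0.2 is taken as the weight on the collection model. All function names are illustrative.

```python
import math

N = 10                    # number of documents in the collection
COLLECTION_LEN = 5687     # total number of terms, as stated in the question
AVG_LEN = COLLECTION_LEN / N

# (df, ctf) per query term, taken from the inverted lists
STATS = {"oil": (5, 18), "producing": (4, 20), "nations": (3, 11)}


def vsm_weight(tf, df, doc_len):
    # Assumed weighting: (tf / len) * log((N + 1) / (0.5 + df))
    return (tf / doc_len) * math.log((N + 1) / (0.5 + df))


def bim_term(tf, df):
    # Assumed BIM term: log(N / df) if the term occurs, else 0
    return math.log(N / df) if tf > 0 else 0.0


def bm25_term(tf, df, doc_len, k1=1.2, b=0.75):
    # Common BM25 variant (an assumption, not confirmed by the source)
    idf = math.log((N - df + 0.5) / (df + 0.5))
    norm = k1 * ((1 - b) + b * doc_len / AVG_LEN)
    return idf * tf * (k1 + 1) / (tf + norm)


def jm_log_prob(tf, ctf, doc_len, lam=0.2):
    # Jelinek-Mercer: lam is assumed to weight the collection model
    return math.log((1 - lam) * tf / doc_len + lam * ctf / COLLECTION_LEN)


def dirichlet_log_prob(tf, ctf, doc_len, mu=2000):
    # Dirichlet smoothing with pseudo-count mu
    return math.log((tf + mu * ctf / COLLECTION_LEN) / (doc_len + mu))


def score_document(doc_tfs, doc_len):
    """Sum per-term contributions for each model over the query terms."""
    scores = {"vsm": 0.0, "bim": 0.0, "bm25": 0.0, "jm": 0.0, "dirichlet": 0.0}
    for term, (df, ctf) in STATS.items():
        tf = doc_tfs.get(term, 0)
        scores["vsm"] += vsm_weight(tf, df, doc_len)
        scores["bim"] += bim_term(tf, df)
        scores["bm25"] += bm25_term(tf, df, doc_len)
        scores["jm"] += jm_log_prob(tf, ctf, doc_len)
        scores["dirichlet"] += dirichlet_log_prob(tf, ctf, doc_len)
    return scores


if __name__ == "__main__":
    # Document 1: tf values from the inverted lists, length 498
    print(score_document({"oil": 4, "producing": 6, "nations": 1}, 498))
```

Under these assumptions the contrast the question asks about shows up directly in the function signatures: the BIM term ignores both the tf value (beyond zero versus nonzero) and document length, the VSM weight and BM25 use tf, df, and length, and the two language models replace df with the collection frequency ctf. With this BM25 variant, "oil" contributes nothing because df = 5 is exactly half of N = 10 and the idf is log(1) = 0.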
