Information Retrieval Due Date: Mon, Oct. 9, 2023 Turn-in: Write your answers in a separate document...
Question:
Transcribed Image Text:
Information Retrieval. Due Date: Mon, Oct. 9, 2023. Turn-in: Write your answers in a separate document file (e.g., docx, pdf, xlsx, md, etc.) and turn it in via the D2L Assignments. Write your name and course number at the beginning of your document.

Problem Set IV

1 Comparing Retrieval Models

Problem 1.1. Suppose we have the query "oil producing nations", and the three query terms have inverted lists of the form $I_k \to (df_k, ctf_k, (doc_i, tf_{ik}), \ldots)$:

oil → (5, 18, (1, 4), (4, 3), (6, 1), (7, 2), (8, 8))
producing → (4, 20, (1, 6), (2, 2), (5, 4), (8, 8))
nations → (3, 11, (1, 1), (3, 2), (8, 8))

Further suppose we have a collection of documents with lengths as follows:

d1 → 498    d6 → 639
d2 → 627    d7 → 566
d3, d8 → 571 and 423 (the transcription does not make clear which document has which length)
d4 → 648    d9 → 589
d5 → 621    d10 → 525

and the total number of terms in the corpus is 5687.

What are the scores of the 10 documents using each of the following retrieval models, and what are their different ranks?

(a) Vector space model with term weighting: $w_{ik} = \frac{tf_{ik}}{len_i} \log\frac{N+1}{0.5 + df_k}$
(b) Binary independence model: if $tf_{ik} > 0$ then BIM term $= \log\frac{N}{df_k}$, else term $= 0$
(c) BM25 model with parameters $k = 1.2$, $b = 0.75$
(d) Language model with Jelinek-Mercer smoothing parameter $\lambda = 0.2$
(e) Language model with Dirichlet smoothing parameter $\mu = 2000$

Explain what makes these models different from each other, focusing especially on how they use term frequency, document frequency, and document length to calculate document scores and ranks.
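A minimal Python sketch of parts (c) to (e), computing BM25 and both smoothed query-likelihood scores from the inverted lists above and ranking all ten documents. Several choices here are assumptions, not part of the problem: BM25 uses the Robertson-Sparck Jones idf with no relevance information, $\log\frac{N - df_k + 0.5}{df_k + 0.5}$, natural logarithms, and a query-term frequency of 1; the average document length is the stated corpus total (5687) divided by $N = 10$; in Jelinek-Mercer smoothing $\lambda$ weights the collection model; and d3 and d8 are given lengths 423 and 571 as a placeholder. Parts (a) and (b) are not covered.

```python
import math

# Inverted lists from Problem 1.1: term -> (df, ctf, {doc_id: tf})
INDEX = {
    "oil": (5, 18, {1: 4, 4: 3, 6: 1, 7: 2, 8: 8}),
    "producing": (4, 20, {1: 6, 2: 2, 5: 4, 8: 8}),
    "nations": (3, 11, {1: 1, 3: 2, 8: 8}),
}

# Document lengths. ASSUMPTION: which of d3/d8 has length 423 (the other
# has 571) is unclear in the transcription; d3=423, d8=571 is a placeholder.
DOC_LEN = {1: 498, 2: 627, 3: 423, 4: 648, 5: 621,
           6: 639, 7: 566, 8: 571, 9: 589, 10: 525}

N = len(DOC_LEN)            # 10 documents
CORPUS_TERMS = 5687         # total terms, as stated in the problem
AVG_LEN = CORPUS_TERMS / N  # ASSUMPTION: average length from the stated total
QUERY = ["oil", "producing", "nations"]


def bm25(doc, k1=1.2, b=0.75):
    """BM25 without relevance info (RSJ idf, natural log, query tf = 1)."""
    K = k1 * ((1 - b) + b * DOC_LEN[doc] / AVG_LEN)  # length normalization
    score = 0.0
    for term in QUERY:
        df, _, postings = INDEX[term]
        tf = postings.get(doc, 0)
        idf = math.log((N - df + 0.5) / (df + 0.5))
        score += idf * (k1 + 1) * tf / (K + tf)
    return score


def lm_jelinek_mercer(doc, lam=0.2):
    """Query log-likelihood; lam is the weight on the collection model."""
    score = 0.0
    for term in QUERY:
        _, ctf, postings = INDEX[term]
        p_doc = postings.get(doc, 0) / DOC_LEN[doc]
        p_coll = ctf / CORPUS_TERMS
        score += math.log((1 - lam) * p_doc + lam * p_coll)
    return score


def lm_dirichlet(doc, mu=2000):
    """Query log-likelihood with Dirichlet-prior smoothing."""
    score = 0.0
    for term in QUERY:
        _, ctf, postings = INDEX[term]
        tf = postings.get(doc, 0)
        p_coll = ctf / CORPUS_TERMS
        score += math.log((tf + mu * p_coll) / (DOC_LEN[doc] + mu))
    return score


def rank(score_fn):
    """Return (doc_id, score) pairs sorted by descending score."""
    scores = [(d, score_fn(d)) for d in DOC_LEN]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, fn in [("BM25", bm25), ("LM-JM", lm_jelinek_mercer),
                     ("LM-Dirichlet", lm_dirichlet)]:
        print(name)
        for doc, s in rank(fn):
            print(f"  d{doc}: {s:.4f}")
```

Under these assumptions, "oil" (df = 5 of N = 10) gets an idf of exactly 0, so d4, d6, and d7 (which contain only "oil") tie with d9 and d10 at 0 under BM25, whereas both language models still give every document a finite, smoothed score. Swap the d3/d8 lengths if the original problem pairs them the other way.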
Expert Answer:
(a) Vector Space Model (VSM): The VSM calculates the score for a document-query pair using the cosine similarity between the document vector and the query vector. The formula for the score is Score(d, q) = w_ik tf_i...
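The cosine-similarity scoring described in this answer can be sketched as follows. This is a hedged illustration, not the answer's own (truncated) formula: `cosine_similarity` is a hypothetical helper that takes two sparse term-weight vectors as dicts, and the weighting scheme used to fill them is left to the caller. If full cosine normalization is intended, the document norm should be computed over all of a document's terms, which the three query-term inverted lists alone do not provide.

```python
import math


def cosine_similarity(doc_vec, query_vec):
    """Cosine of the angle between two sparse term-weight vectors.

    Both arguments map term -> weight; missing terms have weight 0.
    """
    dot = sum(w * query_vec.get(term, 0.0) for term, w in doc_vec.items())
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector has no direction, so score it 0
    return dot / (norm_d * norm_q)


# Hypothetical usage: binary query weights and made-up document weights
# ("barrel" is an invented non-query term that only affects the doc norm).
query = {"oil": 1.0, "producing": 1.0, "nations": 1.0}
doc = {"oil": 0.4, "producing": 0.6, "nations": 0.1, "barrel": 0.3}
print(round(cosine_similarity(doc, query), 4))
```

Because both vectors are divided by their lengths, cosine similarity rewards a document whose weight distribution points in the same direction as the query, regardless of how long the document is.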