Information Retrieval
Due Date: Mon, Oct. 9, 2023
Turn-in: Write your answers in a separate document file (e.g., docx, pdf, xlsx, md, etc.) and turn in via the D2L Assignments. Write your name and course number at the beginning of your document.

Problem Set IV

1 Comparing Retrieval Models

Problem 1.1. Suppose we have the query "oil producing nations", and the three query terms have inverted lists as follows:

Term         $I_k$ = ($df_k$, $ctf_k$, (doc, $tf_{ik}$), ...)
oil          (5, 18, (1, 4), (4, 3), (6, 1), (7, 2), (8, 8))
producing    (4, 20, (1, 6), (2, 2), (5, 4), (8, 8))
nations      (3, 11, (1, 1), (3, 2), (8, 8))

Further suppose we have a collection of documents with lengths as follows:

d1   498      d6   639
d2   627      d7   566
d4   648      d9   589
d5   621      d10  525
d3 and d8: 571 and 423 (which value belongs to which document is unclear in the source)

and the total number of terms in the corpus is 5687. What are the scores of the 10 documents using each of the following retrieval models, and what are their different ranks?

(a) Vector space model with term weighting: $w_{ik} = \frac{tf_{ik}}{len_i} \cdot \log\frac{N+1}{0.5 + df_k}$
(b) Binary independence model: if $tf_{ik} > 0$ then BIM term $= \log\frac{N}{df_k}$, else term $= 0$
(c) BM25 model with parameters $k_1 = 1.2$, $b = 0.75$
(d) Language model with Jelinek-Mercer smoothing parameter $\lambda = 0.2$
(e) Language model with Dirichlet smoothing parameter $\mu = 2000$

Explain what makes these models different from each other, focusing especially on how they use term frequency, document frequency, and document length to calculate document scores and ranks.
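To make parts (b) through (e) concrete, the following is a minimal Python sketch of how the four scores could be computed from the inverted lists. It is not the assignment's reference solution and rests on several assumptions: d3 = 571 and d8 = 423 for the two ambiguous lengths, natural logarithms throughout, the common BM25 idf form log((N - df + 0.5) / (df + 0.5)) with a query term frequency of 1, and the convention that λ weights the collection model in Jelinek-Mercer smoothing. The function names are illustrative only.

```python
import math

# Inverted lists from the problem: term -> (df, ctf, {doc: tf}).
INDEX = {
    "oil": (5, 18, {1: 4, 4: 3, 6: 1, 7: 2, 8: 8}),
    "producing": (4, 20, {1: 6, 2: 2, 5: 4, 8: 8}),
    "nations": (3, 11, {1: 1, 3: 2, 8: 8}),
}
# Document lengths. ASSUMPTION: d3 = 571 and d8 = 423 (ambiguous in the source).
DOC_LEN = {1: 498, 2: 627, 3: 571, 4: 648, 5: 621,
           6: 639, 7: 566, 8: 423, 9: 589, 10: 525}
N = len(DOC_LEN)           # number of documents
CORPUS_LEN = 5687          # total terms, taken as stated in the problem
AVG_LEN = CORPUS_LEN / N   # average document length, used by BM25


def bim(doc):
    """Binary independence: add log(N / df) for each query term present."""
    return sum(math.log(N / df)
               for df, _, post in INDEX.values() if post.get(doc, 0) > 0)


def bm25(doc, k1=1.2, b=0.75):
    """BM25 with an ASSUMED idf of log((N - df + 0.5) / (df + 0.5))."""
    norm = k1 * ((1 - b) + b * DOC_LEN[doc] / AVG_LEN)
    score = 0.0
    for df, _, post in INDEX.values():
        tf = post.get(doc, 0)
        idf = math.log((N - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score


def jelinek_mercer(doc, lam=0.2):
    """Query log-likelihood; lam is ASSUMED to weight the collection model."""
    score = 0.0
    for _, ctf, post in INDEX.values():
        p = (1 - lam) * post.get(doc, 0) / DOC_LEN[doc] + lam * ctf / CORPUS_LEN
        score += math.log(p)
    return score


def dirichlet(doc, mu=2000):
    """Query log-likelihood with Dirichlet-smoothed term probabilities."""
    score = 0.0
    for _, ctf, post in INDEX.values():
        p = (post.get(doc, 0) + mu * ctf / CORPUS_LEN) / (DOC_LEN[doc] + mu)
        score += math.log(p)
    return score


def rank(model):
    """Document ids sorted from highest to lowest score under `model`."""
    return sorted(DOC_LEN, key=model, reverse=True)


for model in (bim, bm25, jelinek_mercer, dirichlet):
    print(model.__name__, rank(model))
```

The sketch also makes the contrast in the question visible: the BIM score ignores term frequency and document length entirely, BM25 saturates term frequency and normalizes by length relative to the average, and the two language models fold document length into the smoothed probability estimate instead of using document frequency.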

Step by Step Solution

(a) Vector Space Model (VSM). The VSM calculates the score for a document-query pair using the cosine similarity between the document vector and the query vector. The formula for the score is Score(d, q) = w_ik tf_i...
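The cosine scoring described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the expert's full answer, which is truncated: the document weights are taken to be (tf / len) · log((N + 1) / (0.5 + df)), the query vector is assumed to carry the log factor alone for each query term, vectors are restricted to the three query terms because no other term statistics are given, and d3 = 571 and d8 = 423 are assumed for the two ambiguous lengths.

```python
import math

# term -> (df, {doc: tf}), copied from the problem statement.
INDEX = {
    "oil": (5, {1: 4, 4: 3, 6: 1, 7: 2, 8: 8}),
    "producing": (4, {1: 6, 2: 2, 5: 4, 8: 8}),
    "nations": (3, {1: 1, 3: 2, 8: 8}),
}
# ASSUMPTION: d3 = 571 and d8 = 423 (ambiguous in the source).
DOC_LEN = {1: 498, 2: 627, 3: 571, 4: 648, 5: 621,
           6: 639, 7: 566, 8: 423, 9: 589, 10: 525}
N = len(DOC_LEN)


def idf(term):
    """Document-frequency factor of the ASSUMED weighting scheme."""
    df, _ = INDEX[term]
    return math.log((N + 1) / (0.5 + df))


def doc_vector(doc):
    """Weights (tf / len) * idf over the query terms only."""
    return {t: post.get(doc, 0) / DOC_LEN[doc] * idf(t)
            for t, (_, post) in INDEX.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a document with no query terms scores zero
    return dot / (norm_u * norm_v)


# ASSUMPTION: each query term occurs once and is weighted by idf alone.
QUERY = {t: idf(t) for t in INDEX}
scores = {d: cosine(doc_vector(d), QUERY) for d in DOC_LEN}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the cosine divides by the document vector's norm, the 1 / len factor cancels in this restricted setting, so the ranking here depends only on the mix of term frequencies and on document frequency. A document whose query-term frequencies are all equal, such as d8, points in the same direction as the query vector.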
