Question: Information Retrieval Systems Problem

1) Given the following retrieval example, repeat the calculation for the following 2 documents (leave the query info the same):

d1 (already done): car insurance auto insurance
d2 (new doc): car auto insurance auto
d3 (new doc): car car auto insurance car

Compare the scores between the three documents, then normalize the results
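The calculation for the three documents can be sketched in Python as below. This is a minimal sketch, not a definitive solution: it assumes base-10 logarithms, takes the df values from the tf-idf example in this question, assumes N = 1,000,000 (a value consistent with the idf column of that example, but not stated in the question), and assumes "normalize" means dividing each score by the sum of the scores. The names `log_tf`, `unit`, `lnc` and `ltc` are illustrative.

```python
import math
from collections import Counter

# Assumption: N is not given in the question; 1,000,000 is consistent with the
# idf column of the example (e.g. log10(1_000_000 / 1000) = 3.0).
N = 1_000_000
DF = {"auto": 5000, "best": 50000, "car": 10000, "insurance": 1000}
VOCAB = ["auto", "best", "car", "insurance"]


def log_tf(tf):
    """Log-scaled term frequency: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def unit(vec):
    """Scale a vector to unit length (cosine normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def lnc(text):
    """Document weighting: log tf, no idf, cosine normalization."""
    counts = Counter(text.split())
    return unit([log_tf(counts[t]) for t in VOCAB])


def ltc(text):
    """Query weighting: log tf, idf, cosine normalization."""
    counts = Counter(text.split())
    return unit([log_tf(counts[t]) * math.log10(N / DF[t]) for t in VOCAB])


QUERY = "best car insurance"
DOCS = {
    "d1": "car insurance auto insurance",
    "d2": "car auto insurance auto",
    "d3": "car car auto insurance car",
}

query_vec = ltc(QUERY)
scores = {
    name: sum(q * d for q, d in zip(query_vec, lnc(text)))
    for name, text in DOCS.items()
}

# Assumption: "normalize the results" means dividing by the sum of all scores.
total = sum(scores.values())
normalized = {name: s / total for name, s in scores.items()}

for name in DOCS:
    print(f"{name}: score={scores[name]:.3f} normalized={normalized[name]:.3f}")
```

Under these assumptions d1 reproduces the score of about 0.8 from the worked example, and the ranking is d1, then d3, then d2.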

2) Repeat for all three docs using the Jaccard coefficient, then normalize the results
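A minimal sketch of the Jaccard calculation follows. It assumes the coefficient is computed on sets of distinct terms (term counts are ignored), that the query is "best car insurance" from the tf-idf example, and that "normalize" means dividing by the sum of the scores. The function name `jaccard` is illustrative.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| on sets of distinct terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


QUERY = "best car insurance"
DOCS = {
    "d1": "car insurance auto insurance",
    "d2": "car auto insurance auto",
    "d3": "car car auto insurance car",
}

query_terms = QUERY.split()
jaccard_scores = {
    name: jaccard(query_terms, text.split()) for name, text in DOCS.items()
}

# Assumption: "normalize the results" means dividing by the sum of all scores.
total = sum(jaccard_scores.values())
jaccard_normalized = {name: s / total for name, s in jaccard_scores.items()}

for name in DOCS:
    print(f"{name}: jaccard={jaccard_scores[name]:.3f} "
          f"normalized={jaccard_normalized[name]:.3f}")
```

Because the Jaccard coefficient ignores term frequency, all three documents reduce to the same term set {car, auto, insurance}, so each scores 2/4 = 0.5 against the query.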

3) This question illustrates the difference between using Euclidean distances between the query and the documents versus using vector dot products between the query and the documents

Suppose we have the following documents and query

query  vocabulary  d1   d2    d3
3      car         1    200   3000
1      auto        0    0     0
2      insurance   1    200   3000

a) Find the similarity scores using only 1 + log(tf) weights, with unit-vector dot products between the documents and the query
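Part a) can be sketched as follows. The sketch assumes base-10 logarithms and a weight of 0 for a term with tf = 0. The helper names `log_tf` and `unit` are illustrative.

```python
import math


def log_tf(tf):
    """Log-scaled term frequency: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def unit(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


# Term order: car, auto, insurance (raw term frequencies from the table).
QUERY_TF = [3, 1, 2]
DOC_TF = {"d1": [1, 0, 1], "d2": [200, 0, 200], "d3": [3000, 0, 3000]}

q_unit = unit([log_tf(tf) for tf in QUERY_TF])
cosine_scores = {}
for name, tfs in DOC_TF.items():
    d_unit = unit([log_tf(tf) for tf in tfs])
    cosine_scores[name] = sum(q * d for q, d in zip(q_unit, d_unit))

for name, s in cosine_scores.items():
    print(f"{name}: {s:.4f}")
```

Each document puts equal weight on car and insurance and none on auto, so all three unit vectors are identical and the three dot-product scores tie, whatever the log base.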

b) Find a score for all three documents based on the distance between the query and each of the documents (again using 1 + log(tf))

Use the equation distance = sqrt((qt1 - dt1)**2 + (qt2 - dt2)**2 + (qt3 - dt3)**2)

and use the distances as the scores for all three documents (** means squared; t is the term entry, so qt1 and dt1 are the query and document entries for term 1).
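Part b) can be sketched in the same way. The sketch assumes base-10 logarithms, a weight of 0 for tf = 0, and that the 1 + log(tf) vectors are not length-normalized before the distance is taken (the question does not say either way). The names `log_tf` and `euclidean` are illustrative.

```python
import math


def log_tf(tf):
    """Log-scaled term frequency: 1 + log10(tf), or 0 when the term is absent."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def euclidean(u, v):
    """distance = sqrt(sum over terms of (q_t - d_t)**2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Term order: car, auto, insurance (raw term frequencies from the table).
QUERY_TF = [3, 1, 2]
DOC_TF = {"d1": [1, 0, 1], "d2": [200, 0, 200], "d3": [3000, 0, 3000]}

# Assumption: the weighted vectors are used as-is, without unit normalization.
q_weights = [log_tf(tf) for tf in QUERY_TF]
distances = {
    name: euclidean(q_weights, [log_tf(tf) for tf in tfs])
    for name, tfs in DOC_TF.items()
}

for name, dist in distances.items():
    print(f"{name}: {dist:.4f}")
```

Under these assumptions the distance grows from d1 to d2 to d3, even though the three documents point in the same direction in term space.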

tf-idf example: lnc.ltc

Document: car insurance auto insurance
Query: best car insurance

            Query                                     Document                         Prod
Term        tf-raw  tf-wt  df      idf  wt   n'lize   tf-raw  tf-wt  wt   n'lize
auto        0       0      5000    2.3  0    0        1       1      1    0.52    0
best        1       1      50000   1.3  1.3  0.34     0       0      0    0       0
car         1       1      10000   2.0  2.0  0.52     1       1      1    0.52    0.27
insurance   1       1      1000    3.0  3.0  0.78     2       1.3    1.3  0.68    0.53

Exercise: what is N, the number of docs?
Doc length = sqrt(1**2 + 0**2 + 1**2 + 1.3**2) ≈ 1.92
Score = 0 + 0 + 0.27 + 0.53 = 0.8
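One way to approach the exercise about N is to invert the idf formula for each row of the table. The sketch below assumes idf = log10(N / df), so that N = df * 10**idf. The table does not state the log base, and its idf values are rounded to one decimal place, so the estimates are only approximate.

```python
# df and idf values as printed in the example table.
DF = {"auto": 5000, "best": 50000, "car": 10000, "insurance": 1000}
IDF = {"auto": 2.3, "best": 1.3, "car": 2.0, "insurance": 3.0}

# Assumption: idf = log10(N / df), hence N = df * 10**idf.
n_estimates = {term: DF[term] * 10 ** IDF[term] for term in DF}

for term, n in n_estimates.items():
    print(f"{term}: N ≈ {n:,.0f}")
```

All four rows give an estimate close to 1,000,000, which suggests that value for N under the stated assumption.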
