Question: 1) Given the retrieval example below, repeat the calculation for the following 2 documents (leave the query info the same).

d1 (already done): car insurance auto insurance

d2 (new doc): car auto insurance auto

d3 (new doc): car car auto insurance car

Compare the scores of the three documents, then normalize the results.

2) Repeat for all three docs using the Jaccard coefficient, then normalize the results.
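A minimal sketch of the Jaccard part, assuming the coefficient is taken over sets of whitespace-separated terms (so repeated terms count once) and that the query is the one from the retrieval example, "best car insurance". The names `jaccard` and `jaccard_scores` are illustrative only.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the term sets of two strings."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    if not union:
        return 0.0  # two empty strings: define the score as 0
    return len(set_a & set_b) / len(union)

query = "best car insurance"  # query from the retrieval example
docs = {
    "d1": "car insurance auto insurance",
    "d2": "car auto insurance auto",
    "d3": "car car auto insurance car",
}

jaccard_scores = {name: jaccard(query, text) for name, text in docs.items()}
for name, score in jaccard_scores.items():
    print(name, score)
```

Under the set-based assumption all three documents contain the same distinct terms (car, auto, insurance), so they tie: the Jaccard coefficient ignores term frequency.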

3) This question illustrates the difference between using Euclidean distances between the query and the documents versus using vector dot products between the query and the documents.

Suppose we have the following documents and query (entries are term frequencies):

query   vocabulary   d1    d2     d3
3       car          1     200    3000
1       auto         0     0      0
2       insurance    1     200    3000

a) Find the similarity scores using 1 + log(tf) weighting only, taking unit-vector dot products between the documents and the query.
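Part a) can be sketched as follows, assuming a base-10 logarithm (the question does not state the base) and a weight of 0 when tf is 0. The helper names `log_tf` and `unit` are illustrative only.

```python
import math

def log_tf(tf: int) -> float:
    """1 + log10(tf) for tf > 0, otherwise 0 (base 10 is an assumption)."""
    return 1 + math.log10(tf) if tf > 0 else 0.0

def unit(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

# Term order: car, auto, insurance
query_tf = [3, 1, 2]
doc_tf = {"d1": [1, 0, 1], "d2": [200, 0, 200], "d3": [3000, 0, 3000]}

q_unit = unit([log_tf(t) for t in query_tf])
cosine_scores = {}
for name, tfs in doc_tf.items():
    d_unit = unit([log_tf(t) for t in tfs])
    cosine_scores[name] = sum(q * d for q, d in zip(q_unit, d_unit))

for name, score in cosine_scores.items():
    print(name, round(score, 4))
```

Because each document has equal weights on car and insurance and none on auto, all three unit vectors point in the same direction, so the three dot-product scores coincide regardless of document length.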

b) Find a score for all three documents based on the distance between the query and each of the documents (again using 1 + log(tf)).

Use the equation distance = sqrt((qt1 - dt1)**2 + (qt2 - dt2)**2 + (qt3 - dt3)**2)

and use the distances as the scores for all three documents (**2 means squared; t is the term entry).
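A sketch of the distance calculation, again assuming base-10 logarithms and a weight of 0 for tf = 0, and assuming the distance is taken between the unnormalized 1 + log(tf) vectors. The names `log_tf` and `euclidean` are illustrative only.

```python
import math

def log_tf(tf: int) -> float:
    """1 + log10(tf) for tf > 0, otherwise 0 (base 10 is an assumption)."""
    return 1 + math.log10(tf) if tf > 0 else 0.0

def euclidean(q, d):
    """sqrt of the sum of squared per-term differences between two weight vectors."""
    return math.sqrt(sum((qt - dt) ** 2 for qt, dt in zip(q, d)))

# Term order: car, auto, insurance
query_tf = [3, 1, 2]
doc_tf = {"d1": [1, 0, 1], "d2": [200, 0, 200], "d3": [3000, 0, 3000]}

q_wt = [log_tf(t) for t in query_tf]
distances = {
    name: euclidean(q_wt, [log_tf(t) for t in tfs])
    for name, tfs in doc_tf.items()
}

for name, dist in distances.items():
    print(name, round(dist, 3))
```

Unlike the unit-vector dot product, the distance grows with the document's term frequencies, so documents with the same term proportions but different lengths receive different scores (a smaller distance means a closer match).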

Introduction to Information Retrieval, Sec. 6.4

tf-idf example: lnc.ltc

Document: car insurance auto insurance
Query: best car insurance

            Query                                     Document
Term        tf-raw  tf-wt  df      idf  wt   n'liz    tf-raw  tf-wt  wt   n'liz    Prod
auto        0       0      5000    2.3  0    0        1       1      1    0.52     0
best        1       1      50000   1.3  1.3  0.34     0       0      0    0        0
car         1       1      10000   2.0  2.0  0.52     1       1      1    0.52     0.27
insurance   1       1      1000    3.0  3.0  0.78     2       1.3    1.3  0.68     0.53

Exercise: what is N, the number of docs?

Doc length = sqrt(1**2 + 0**2 + 1**2 + 1.3**2) ≈ 1.92

Score = 0 + 0 + 0.27 + 0.53 = 0.8
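The lnc.ltc calculation can be sketched for all three documents as below. This is a sketch under stated assumptions, not a definitive solution: N is not given in the example, so N = 1,000,000 is inferred from the idf column (for instance log10(N / 1000) = 3.0), logarithms are base 10, and the function names `lnc` and `ltc` are illustrative only.

```python
import math
from collections import Counter

# Assumption: N is inferred from the example's idf column, e.g.
# log10(N / 1000) = 3.0 for "insurance" gives N = 1,000,000.
N = 1_000_000
DF = {"auto": 5000, "best": 50000, "car": 10000, "insurance": 1000}
VOCAB = sorted(DF)  # auto, best, car, insurance

def log_tf(tf: int) -> float:
    """Logarithmic term frequency: 1 + log10(tf) for tf > 0, otherwise 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0

def cosine_normalize(vec):
    """Divide each weight by the vector's Euclidean length."""
    length = math.sqrt(sum(w * w for w in vec))
    return [w / length for w in vec] if length else list(vec)

def lnc(text: str):
    """Document weights: log tf, no idf, cosine normalization."""
    counts = Counter(text.split())
    return cosine_normalize([log_tf(counts[t]) for t in VOCAB])

def ltc(text: str):
    """Query weights: log tf times idf, cosine normalization."""
    counts = Counter(text.split())
    return cosine_normalize(
        [log_tf(counts[t]) * math.log10(N / DF[t]) for t in VOCAB]
    )

query = "best car insurance"
docs = {
    "d1": "car insurance auto insurance",
    "d2": "car auto insurance auto",
    "d3": "car car auto insurance car",
}

q_vec = ltc(query)
tfidf_scores = {
    name: sum(q * d for q, d in zip(q_vec, lnc(text)))
    for name, text in docs.items()
}
for name, score in tfidf_scores.items():
    print(name, round(score, 2))
```

With unrounded intermediate weights, d1 comes out at about 0.80, matching the example, and d2 and d3 at about 0.68 and 0.76. Small differences from hand calculation are expected because the example rounds each weight to two decimals.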
