1) Consider the following documents D and query Q:
D1: you say goodbye
D2: hello goodbye hello goodbye hello
D3: I say hello
Q1: I hello
a. Construct the vector space term-document matrix for the above documents using tf.idf term weighting.
b. Compute the similarity between Q and each of the above documents using tf.idf weights and the following three (3) similarity measures:
- Inner product
- Cosine
- Jaccard
Determine the relative ranking of the documents under each measure.

APPENDIX A

tf.idf

$$w_{ij} = tf_{ij} \cdot \log\left(\frac{D}{df_i}\right)$$

where:
$tf_{ij}$ = number of occurrences of term i in document j
$D$ = number of documents in the database
$df_i$ = number of documents in the database containing term i

Inner (dot) product similarity measure

$$sim(D_i, Q) = \sum_{k=1}^{t} (d_{ik} \cdot q_k)$$

where:
$D_i$ = document i
$Q$ = query
$d_{ik}$ = the weight of term k in document i
$q_k$ = the weight of term k in the query

Cosine similarity measure (same notation as above)

$$Cos(D_i, Q) = \frac{\sum_{k=1}^{t} (d_{ik} \cdot q_k)}{\sqrt{\sum_{k=1}^{t} d_{ik}^2} \cdot \sqrt{\sum_{k=1}^{t} q_k^2}}$$

Jaccard similarity measure (same notation as above)

$$Jaccard(D_i, Q) = \frac{\sum_{k=1}^{t} (d_{ik} \cdot q_k)}{\sum_{k=1}^{t} d_{ik}^2 + \sum_{k=1}^{t} q_k^2 - \sum_{k=1}^{t} (d_{ik} \cdot q_k)}$$
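The Appendix A formulas can be checked with a short Python sketch. This is one possible way to work parts (a) and (b), not an official solution. It rests on several assumptions the question does not state: a base-10 logarithm, lowercased whitespace tokenization (so "I" becomes "i"), and a query weighted with the same tf.idf formula using the document-collection df values. The helper names (`tokenize`, `tfidf_vector`, `inner`, `cosine`, `jaccard`, `rank`) are hypothetical.

```python
import math

# Documents and query from the question.
docs = {
    "D1": "you say goodbye",
    "D2": "hello goodbye hello goodbye hello",
    "D3": "I say hello",
}
query = "I hello"


def tokenize(text):
    # Assumption: lowercase and split on whitespace; no stop-word removal.
    return text.lower().split()


doc_tokens = {name: tokenize(text) for name, text in docs.items()}
vocab = sorted({t for toks in doc_tokens.values() for t in toks})
n_docs = len(docs)  # D in Appendix A

# df_i: number of documents containing term i.
df = {t: sum(1 for toks in doc_tokens.values() if t in toks) for t in vocab}


def tfidf_vector(tokens):
    # w_ij = tf_ij * log(D / df_i); base 10 is an assumption.
    return [tokens.count(t) * math.log10(n_docs / df[t]) for t in vocab]


# Part (a): one tf.idf column per document, rows ordered by vocab.
doc_vecs = {name: tfidf_vector(toks) for name, toks in doc_tokens.items()}
# Assumption: the query is weighted with the same formula and df values.
query_vec = tfidf_vector(tokenize(query))


def inner(d, q):
    return sum(a * b for a, b in zip(d, q))


def cosine(d, q):
    denom = math.sqrt(inner(d, d)) * math.sqrt(inner(q, q))
    return inner(d, q) / denom if denom else 0.0


def jaccard(d, q):
    dot = inner(d, q)
    denom = inner(d, d) + inner(q, q) - dot
    return dot / denom if denom else 0.0


def rank(measure):
    # Part (b): documents sorted from most to least similar to the query.
    return sorted(doc_vecs, key=lambda n: measure(doc_vecs[n], query_vec), reverse=True)


print("terms:", vocab)
for name, vec in doc_vecs.items():
    print(name, [round(w, 4) for w in vec])
for measure in (inner, cosine, jaccard):
    scores = {n: round(measure(v, query_vec), 4) for n, v in doc_vecs.items()}
    print(measure.__name__, scores, "ranking:", rank(measure))
```

Under these assumptions all three measures give the ranking D3, D2, D1. D1 shares no term with the query, so its score is zero for every measure. Changing the logarithm base rescales every weight by the same constant, which changes the inner product values but not the cosine or Jaccard values, and not the ranking.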
