Question 2: Implement Vector Space Ranking (7 + 3 marks)
Dataset: Find a suitable open-source dataset (a minimum of documents of the same category) to be used for computing similarity among documents. Preprocess the dataset. Built-in libraries can be used for preprocessing only, not for implementing a and b.
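The preprocessing step could look roughly like the sketch below. It uses only the standard library; the `preprocess` name and the tiny `STOPWORDS` set are assumptions made for illustration, and a real run would likely use a fuller stopword list and possibly stemming.

```python
import re

# Assumption: a very small illustrative stopword list, not a complete one.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for", "on"}


def preprocess(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]
```

Each document and each query would be passed through the same function so that their tokens are comparable.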
Implement a Python function that takes a query as input. For both the query and each document in the dataset, compute the weighted tf-idf vector. Find the cosine similarity score between the query vector and each document vector, using logarithmic term-frequency weighting, idf weighting, and cosine normalization for both the query and the documents. Rank the documents with respect to the query by score and display the top most similar documents with their similarity scores.
In a similar way, compute the query-document match score using the Jaccard coefficient for each document in the dataset. Display the top most similar documents with their similarity scores.
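One way the two rankings might be sketched without library help is shown below. It assumes the documents and the query are already tokenized lists, uses base-10 logarithms, and takes idf as log10(N / df); the function names and the cutoff `k` are hypothetical, since the question's own cutoff is not given here.

```python
import math
from collections import Counter


def build_idf(docs_tokens):
    """idf(t) = log10(N / df(t)), computed over the document collection."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log10(n / d) for t, d in df.items()}


def tfidf_vector(tokens, idf):
    """Weight = (1 + log10 tf) * idf, followed by cosine (L2) normalization."""
    tf = Counter(tokens)
    # Terms unseen in the collection have no idf and are skipped.
    weights = {t: (1 + math.log10(c)) * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return {}
    return {t: w / norm for t, w in weights.items()}


def cosine(u, v):
    """Dot product of two vectors that are already unit length."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def jaccard(a_tokens, b_tokens):
    """|A intersect B| / |A union B| over the sets of distinct terms."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(query_tokens, docs_tokens, k=10):
    """Return (cosine_top_k, jaccard_top_k) as lists of (doc_index, score)."""
    idf = build_idf(docs_tokens)
    q_vec = tfidf_vector(query_tokens, idf)
    cos_scores = [(i, cosine(q_vec, tfidf_vector(d, idf)))
                  for i, d in enumerate(docs_tokens)]
    jac_scores = [(i, jaccard(query_tokens, d))
                  for i, d in enumerate(docs_tokens)]
    by_score = lambda pair: pair[1]
    return (sorted(cos_scores, key=by_score, reverse=True)[:k],
            sorted(jac_scores, key=by_score, reverse=True)[:k])
```

Calling `rank(query_tokens, docs_tokens, k=5)` would return the five best documents under each measure. Note that with this idf definition a term occurring in every document gets weight zero.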
Discuss the pros and cons of the Jaccard and cosine similarity measures.
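One difference that such a discussion might touch on is that the Jaccard coefficient only looks at which terms are present, while cosine similarity over weighted vectors also reflects how often they occur. The toy comparison below is a sketch under simplifying assumptions: idf is left out, the helper names are hypothetical, and the three token lists are made up for illustration.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Set overlap of distinct terms; term counts are ignored."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def log_tf_cosine(a, b):
    """Cosine over log-tf weights only (idf omitted to keep the sketch short)."""
    wa = {t: 1 + math.log10(c) for t, c in Counter(a).items()}
    wb = {t: 1 + math.log10(c) for t, c in Counter(b).items()}
    dot = sum(w * wb.get(t, 0.0) for t, w in wa.items())
    na = math.sqrt(sum(w * w for w in wa.values()))
    nb = math.sqrt(sum(w * w for w in wb.values()))
    return dot / (na * nb) if na and nb else 0.0


# Made-up example data: doc_b repeats "vector" but has the same vocabulary as doc_a.
query = ["vector", "ranking"]
doc_a = ["vector", "ranking", "model"]
doc_b = ["vector", "vector", "vector", "ranking", "model"]

print(jaccard(query, doc_a), jaccard(query, doc_b))
print(log_tf_cosine(query, doc_a), log_tf_cosine(query, doc_b))
```

Here both documents receive the same Jaccard score because they share the same set of terms with the query, whereas the cosine scores differ because the repeated term changes the weight vector of `doc_b`.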
