
Question 2: Implement Vector Space Ranking (7 + 3 marks)
Dataset: Find a suitable open-source dataset (a minimum of 25 documents of the same category) for computing similarity among documents, and preprocess it. (Built-in libraries may be used for preprocessing only, not for implementing a and b.)
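A minimal preprocessing sketch using only Python's standard library, assuming plain-text documents. The short `STOPWORDS` set is a hypothetical placeholder; a real solution might substitute a fuller list (for example NLTK's) and add stemming, which the assignment permits at this stage.

```python
import re

# Hypothetical stopword list for illustration only; swap in a fuller list
# (e.g. from NLTK) for a real dataset.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "to", "with",
}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, keep alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("The striker scored a late goal in the final.")` returns `['striker', 'scored', 'late', 'goal', 'final']`.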
Implement a Python function that takes a query as input. For the query and for each document in the dataset, compute a weighted tf-idf vector. Compute the cosine similarity between the query vector and each document vector, using logarithmic term-frequency weighting, idf weighting, and cosine normalization for both the query and the documents. Rank the documents by score with respect to the query and display the top 5 most similar documents with their similarity scores. In the same way, compute a query-document match score using the Jaccard coefficient for each document in the dataset, and display the top 5 most similar documents with their similarity scores. (7)
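The weighting described above corresponds to the ltc.ltc scheme in SMART notation: each term gets w = (1 + log10 tf) × log10(N / df), and the vector is then divided by its Euclidean length, so the cosine reduces to a dot product. The sketch below implements that scheme and the Jaccard coefficient from scratch with only `math`, `re`, and `collections`, in line with the rule against library implementations. Several choices are assumptions rather than part of the question: base-10 logarithms, the minimal `tokenize` stand-in for the preprocessing step, ignoring query terms that never occur in the collection (their idf is undefined), and the function names `ltc_vector`, `cosine_rank`, and `jaccard_rank`.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Minimal stand-in for the preprocessing step: lowercase alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def ltc_vector(tokens: list[str], df: Counter, n_docs: int) -> dict[str, float]:
    """Log-tf x idf weights followed by cosine (L2) normalisation."""
    weights = {}
    for term, tf in Counter(tokens).items():
        if df[term] == 0:
            continue  # assumption: terms absent from the collection are ignored
        weights[term] = (1 + math.log10(tf)) * math.log10(n_docs / df[term])
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def cosine_rank(query: str, docs: list[str], k: int = 5) -> list[tuple[int, float]]:
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for toks in doc_tokens for term in set(toks))
    doc_vecs = [ltc_vector(toks, df, n_docs) for toks in doc_tokens]
    q_vec = ltc_vector(tokenize(query), df, n_docs)
    # Both vectors have unit length, so their dot product is the cosine.
    scores = [
        (i, sum(w * dv.get(t, 0.0) for t, w in q_vec.items()))
        for i, dv in enumerate(doc_vecs)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep doc order
    return scores[:k]


def jaccard_rank(query: str, docs: list[str], k: int = 5) -> list[tuple[int, float]]:
    q_set = set(tokenize(query))
    scores = []
    for i, doc in enumerate(docs):
        d_set = set(tokenize(doc))
        union = q_set | d_set
        # Size of intersection over size of union; 0 when both sets are empty.
        scores.append((i, len(q_set & d_set) / len(union) if union else 0.0))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus; the assignment requires 25+ documents of one category.
    corpus = [
        "cricket match won by india",
        "india wins cricket series",
        "stock market falls sharply",
    ]
    for label, ranker in (("Cosine (ltc.ltc)", cosine_rank), ("Jaccard", jaccard_rank)):
        print(label)
        for doc_id, score in ranker("india cricket", corpus):
            print(f"  doc {doc_id}: {score:.4f}")
```

On the toy corpus, both rankers place "india wins cricket series" first: it shares the same two query terms as "cricket match won by india" but has fewer other terms, so it scores higher under both cosine normalization and the Jaccard union.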
Discuss the pros and cons of the Jaccard coefficient and cosine similarity as similarity measures. (3)
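One concrete difference worth discussing is that Jaccard sees only which terms are shared, so a document that repeats a query term scores the same as one that mentions it once, whereas log-tf cosine rewards the repetition, though only sub-linearly. The sketch below shows this on hypothetical token lists; idf is deliberately omitted so that only the term-frequency effect is visible, and `log_tf_cosine` is an illustrative helper, not a library function.

```python
import math
from collections import Counter


def jaccard(a: list[str], b: list[str]) -> float:
    """Set overlap only: term frequencies are discarded."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def log_tf_cosine(a: list[str], b: list[str]) -> float:
    """Cosine of log-tf weighted vectors (idf omitted to isolate the tf effect)."""
    wa = {t: 1 + math.log10(c) for t, c in Counter(a).items()}
    wb = {t: 1 + math.log10(c) for t, c in Counter(b).items()}
    dot = sum(w * wb.get(t, 0.0) for t, w in wa.items())
    na = math.sqrt(sum(w * w for w in wa.values()))
    nb = math.sqrt(sum(w * w for w in wb.values()))
    return dot / (na * nb) if na and nb else 0.0


query = ["cricket"]
repeated = ["cricket", "cricket", "cricket", "score"]
single = ["cricket", "score"]

print(jaccard(query, repeated), jaccard(query, single))  # 0.5 0.5
print(round(log_tf_cosine(query, repeated), 3), round(log_tf_cosine(query, single), 3))  # 0.828 0.707
```

The flip side is cost: Jaccard needs only two sets per comparison, while tf-idf cosine needs term counts and collection-wide document frequencies.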
