Question 2: Implement Vector Space Ranking (7 + 3 marks)

Dataset: Find a suitable open-source dataset (a minimum of 25 documents of the same category to be used) for computing similarity among documents. Preprocess the dataset. (Built-in libraries can be used for preprocessing only and not for implementing a and b.)
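The preprocessing step could be sketched as below. This is a minimal illustration rather than a prescribed pipeline: the function name `preprocess` and the small `STOPWORDS` set are assumptions made for the example, and since the question allows built-in libraries for this step, a fuller stopword list or a stemmer could be substituted.

```python
import re

# Illustrative stopword list (an assumption; a real run would likely use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}

def preprocess(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]
```

Each document and the query would go through the same function, so that both are tokenized identically before any weighting is applied.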

Implement a Python function that takes a query as input. For both the query and each document in the dataset, compute the weighted tf-idf vector. Find the cosine similarity score between the query vector and each document vector, using logarithmic term-frequency weighting, idf weighting, and cosine normalization for both the query and the documents. Rank the documents with respect to the query by score and display the top 5 most similar documents with their similarity scores. In the same way, compute the query-document match score using the Jaccard coefficient for each document in the dataset, and display the top 5 most similar documents with their similarity scores. (7)
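One way the cosine ranking might be approached, using only the standard library, is sketched below. Several choices here are assumptions not fixed by the question: base-10 logarithms, idf computed as log10(N / df) over the document collection, query terms that never occur in the collection being ignored, and whitespace tokenization standing in for the real preprocessing. The names `tokenize`, `idf_table`, `ltc_vector`, and `rank_cosine` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Stand-in for the real preprocessing step (assumption).
    return text.lower().split()

def idf_table(docs_tokens):
    """idf_t = log10(N / df_t), computed over the document collection."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log10(n / count) for term, count in df.items()}

def ltc_vector(tokens, idf):
    """Log tf times idf, then cosine (unit-length) normalization."""
    tf = Counter(tokens)
    # Terms missing from the idf table (unseen in the collection) are skipped.
    weights = {t: (1 + math.log10(c)) * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else {}

def rank_cosine(query, docs, k=5):
    """Return the top-k (document index, cosine score) pairs for the query."""
    docs_tokens = [tokenize(d) for d in docs]
    idf = idf_table(docs_tokens)
    q_vec = ltc_vector(tokenize(query), idf)
    scores = []
    for i, tokens in enumerate(docs_tokens):
        d_vec = ltc_vector(tokens, idf)
        # Both vectors are unit length, so the dot product is the cosine.
        score = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
```

Calling `rank_cosine(query, docs)` with a list of 25 or more document strings would return up to five `(index, score)` pairs, highest score first. A term that appears in every document gets idf 0 under this formula and therefore contributes nothing to the score.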

Discuss the pros and cons of the Jaccard and cosine similarity measures. (3)
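The Jaccard match score requested in the question can be sketched on token sets, which also makes one commonly cited difference concrete: Jaccard works on sets, so it does not take term frequency or term rarity into account, whereas tf-idf cosine scoring weights both. The names `jaccard` and `rank_jaccard` are hypothetical, and whitespace tokenization is again an assumption standing in for the real preprocessing.

```python
def jaccard(query_tokens, doc_tokens):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| on the sets of tokens."""
    q, d = set(query_tokens), set(doc_tokens)
    union = q | d
    return len(q & d) / len(union) if union else 0.0

def rank_jaccard(query, docs, k=5):
    """Return the top-k (document index, Jaccard score) pairs for the query."""
    q = query.lower().split()
    scores = [(i, jaccard(q, d.lower().split())) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Repeating a term does not change the Jaccard score, because sets drop duplicates.
once = jaccard(["apple"], ["apple", "banana"])
repeated = jaccard(["apple"], ["apple", "apple", "apple", "banana"])
print(once == repeated)
```

Under this sketch, a document that mentions a query term many times scores the same as one that mentions it once, which is a point the discussion of pros and cons could draw on. Jaccard is simple to compute and needs no collection statistics, while the cosine measure needs document frequencies but can separate such documents.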

