1. Consider the following two simple documents. [50 points]

(A) precision is very very high
(B) high precision is very very very important

Assume the only stopwords in our system are "is", "am" and "are".

1) For each document, write down the normalized vector of term frequency of each term using the format term: value, term: value, term: value. Compute the cosine similarity of the two documents.

2) Consider the tf-idf weighting. For each document, write down the normalized vector of tf-idf weights. Compute the cosine similarity of the two documents.

3) Consider the query "precision is high". Transfer the query into vector space by using TF.IDF, then rank documents A and B for the given query using cosine similarity.

4) Now consider a vector space where each dimension is a word bigram instead of a single term. For each document, write down the normalized vector of bigram frequency {bigram: value, bigram: value, ...}. Compute the cosine similarity between documents A and B.

5) For each document, write down the normalized vector of tf-idf weights where each dimension is a word bigram. Compute the cosine similarity between documents A and B.
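Parts 1) and 4) can be checked with a short script. The following is only a minimal sketch, not an official solution. It assumes that "normalized" means L2 (unit-length) normalization, that tokens are split on whitespace, and that stopwords are removed before bigrams are formed; the question fixes none of these. The helper names (tokenize, ngram_counts, l2_normalize, cosine) are hypothetical. The tf-idf parts are left out because the question does not specify an idf formula.

```python
import math
from collections import Counter

# Stopword list given in the question.
STOPWORDS = {"is", "am", "are"}

def tokenize(text):
    # Assumption: lowercase, split on whitespace, drop stopwords.
    return [w for w in text.lower().split() if w not in STOPWORDS]

def ngram_counts(tokens, n=1):
    # Raw frequency of each word n-gram (n=1: terms, n=2: bigrams).
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def l2_normalize(counts):
    # Assumption: "normalized" means scaling the vector to unit length.
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}

def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

doc_a = tokenize("precision is very very high")
doc_b = tokenize("high precision is very very very important")

# Part 1: term-frequency vectors.
tf_a = l2_normalize(ngram_counts(doc_a, 1))
tf_b = l2_normalize(ngram_counts(doc_b, 1))
tf_cos = cosine(tf_a, tf_b)

# Part 4: bigram-frequency vectors (stopwords removed first, by assumption).
bi_a = l2_normalize(ngram_counts(doc_a, 2))
bi_b = l2_normalize(ngram_counts(doc_b, 2))
bi_cos = cosine(bi_a, bi_b)

print(tf_a, tf_b, round(tf_cos, 4))
print(bi_a, bi_b, round(bi_cos, 4))
```

Under these assumptions the term-frequency cosine is 8/sqrt(72), about 0.9428, and the bigram cosine is 3/sqrt(21), about 0.6547. If bigrams are formed before stopword removal, the bigram vectors and their similarity change.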
