1 Text Representation [20 pts]

The term-document matrix for four words in three documents is shown in Table 1. The whole document set has N = 20 documents, and for each of the four words, the document frequency (the number of documents in the set that contain the word) is shown in Table 2.

| Term     | Doc 1 | Doc 2 | Doc 3 |
|----------|-------|-------|-------|
| I        | 28    | 5     | 25    |
| Like     | 4     | 19    | 0     |
| Machine  | 0     | 34    | 30    |
| Learning | 15    | 0     | 18    |

Table 1: Term-document matrix

| Term     | Document Frequency |
|----------|--------------------|
| I        | 13                 |
| Like     | 7                  |
| Machine  | 11                 |
| Learning | 17                 |

Table 2: Document frequency of terms

(a) [10 pts] Compute the tf-idf values for each of the four words I, Like, Machine, Learning in the three documents.

(b) [10 pts] Cosine similarity measures the similarity between two vectors by measuring the cosine of the angle between them. The closer the cosine value is to 1, the better the match between the vectors. Cosine similarity has the following formula:

$$\cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}$$
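The computation asked for in parts (a) and (b) can be sketched in a few lines of Python. This is a minimal sketch, not the expected solution: the question does not say which tf-idf variant to use, so the code assumes the raw count as tf and idf = log10(N / df). If the course uses a log-scaled tf or a natural logarithm, change only `idf` and `tf_idf`. Because the text of part (b) is cut off, comparing the three documents pairwise is also an assumption. The names `counts`, `df`, `idf`, `tf_idf`, `doc_vector`, and `cosine` are illustrative.

```python
import math

N = 20  # total number of documents in the collection (given in the question)

# Raw term counts from Table 1: term -> counts in [Doc 1, Doc 2, Doc 3]
counts = {
    "I":        [28, 5, 25],
    "Like":     [4, 19, 0],
    "Machine":  [0, 34, 30],
    "Learning": [15, 0, 18],
}

# Document frequencies from Table 2
df = {"I": 13, "Like": 7, "Machine": 11, "Learning": 17}


def idf(term):
    # ASSUMPTION: idf = log10(N / df). The question does not fix the log base.
    return math.log10(N / df[term])


def tf_idf(term, doc_index):
    # ASSUMPTION: tf is the raw count, with no log scaling or normalization.
    return counts[term][doc_index] * idf(term)


def doc_vector(doc_index):
    # tf-idf vector of one document, in the term order I, Like, Machine, Learning
    return [tf_idf(term, doc_index) for term in counts]


def cosine(u, v):
    # Dot product divided by the product of the Euclidean norms
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Part (a): tf-idf value of every term in every document
for term in counts:
    print(term, [round(tf_idf(term, d), 3) for d in range(3)])

# Part (b), assumed task: cosine similarity for each pair of documents
for i, j in [(0, 1), (0, 2), (1, 2)]:
    sim = cosine(doc_vector(i), doc_vector(j))
    print(f"Doc {i + 1} vs Doc {j + 1}: {sim:.3f}")
```

Under these assumptions, a term that is absent from a document gets a tf-idf of 0, and every cosine value falls between 0 and 1 because all vector components are non-negative.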
