Question: Document Categorization Consider the following document-term matrix: Assume that documents have been manually assigned to two pre-specified categories as follows: Cat1 = {Doc1, Doc2, Doc5}

Document Categorization Consider the following document-term matrix:

Assume that documents have been manually assigned to two pre-specified categories as follows:

Cat1 = {Doc1, Doc2, Doc5} Cat2 = {Doc3, Doc4, Doc6, Doc7}

Using the K-Nearest-Neighbor approach for document categorization with K = 3 (see notes onDocument Categorization), determine how each of the new documents given below will be classified. Show your steps. Note: Use non-normalized tf*idf weights and cosine similarity when computing similarities.

T1 T2 T3 T4 T5 T6 T7 T8
Doc8 3 1 0 4 1 0 2 1
Doc9 0 0 3 0 1 5 0 1

Repeat the classification problem above, but this time use the Rocchio-Based vector space model to determine how Doc 8 and Doc 9 given above will be classified. As before, use non-normalized tf*idf weights and cosine similarity when computing similarities.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!