Question: Document Categorization Consider the following document-term matrix: Assume that documents have been manually assigned to two pre-specified categories as follows: Cat1 = {Doc1, Doc2, Doc5}
Document Categorization Consider the following document-term matrix:
Assume that documents have been manually assigned to two pre-specified categories as follows:
Cat1 = {Doc1, Doc2, Doc5} Cat2 = {Doc3, Doc4, Doc6, Doc7}
Using the K-Nearest-Neighbor approach for document categorization with K = 3 (see notes onDocument Categorization), determine how each of the new documents given below will be classified. Show your steps. Note: Use non-normalized tf*idf weights and cosine similarity when computing similarities.
| T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | |
| Doc8 | 3 | 1 | 0 | 4 | 1 | 0 | 2 | 1 |
| Doc9 | 0 | 0 | 3 | 0 | 1 | 5 | 0 | 1 |
Repeat the classification problem above, but this time use the Rocchio-Based vector space model to determine how Doc 8 and Doc 9 given above will be classified. As before, use non-normalized tf*idf weights and cosine similarity when computing similarities.
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
