Question: There are two classes, C1 and C2. The training set contains 50 documents, 30 of which belong to C1. The table below gives the probabilities P(Xi | Cj) for each term, together with the frequency of each term in a document D1.
- (6%) Use the Naive Bayes document classification method to determine which class D1 belongs to.
- (3%) Briefly explain the main idea of the probabilistic model (multinomial model) for document classification.
- (4%) Briefly explain the Vector Space Model (VSM). Explain $tf_{ij}$, $\log(N/df_j)$, and the formula $w_{ij} = tf_{ij} \times \log(N/df_j)$ for term $t_j$ of document $D_i$.
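The weighting $w_{ij} = tf_{ij} \times \log(N/df_j)$ from the last item can be illustrated on a small corpus. This is a minimal sketch, assuming raw counts for $tf_{ij}$, the natural logarithm (the base only rescales every weight by the same constant), and cosine similarity as the VSM comparison measure. The three toy documents and the function names `tfidf_weights` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: w_ij} dict per document, with w_ij = tf_ij * log(N / df_j)."""
    n_docs = len(docs)  # N: number of documents in the collection
    # df_j: number of documents containing term j at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf_ij: raw count of term j in document i
        weights.append({t: n * math.log(n_docs / df[t]) for t, n in tf.items()})
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus, already tokenised.
corpus = [
    ["bayes", "classifier", "bayes"],
    ["vector", "space", "model"],
    ["bayes", "vector", "model"],
]
w = tfidf_weights(corpus)
for i, doc_w in enumerate(w, start=1):
    print(f"D{i}", {t: round(x, 3) for t, x in doc_w.items()})
print("cos(D2, D3) =", round(cosine(w[1], w[2]), 3))
```

Here "bayes" occurs in two of the three documents, so its weight in D1 is $2\log(3/2) \approx 0.81$, while "classifier" occurs in only one document and gets $\log 3 \approx 1.10$ from a single occurrence. A term present in every document would get $\log(N/N) = 0$: the idf factor discounts terms that cannot distinguish documents.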
Table: P(Xi | Cj) for each term, and the frequency of each term in document D1

| Term (Xi) | P(Xi \| C1) | P(Xi \| C2) | Frequency in D1 |
|---|---|---|---|
| X1 | 3/16 | 1/16 | 0 |
| X2 | 1/16 | 1/8 | 2 |
| X3 | 1/4 | 1/4 | 2 |
| X4 | 5/32 | 3/16 | 0 |
| X5 | 1/32 | 1/8 | 1 |
| X6 | 1/16 | 1/16 | 2 |
| X7 | 1/8 | 1/16 | 1 |
| X8 | 1/8 | 1/8 | 0 |
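The Naive Bayes calculation for the first part can be checked in code. This sketch assumes the standard multinomial Naive Bayes rule (score each class by its prior times each P(Xi | Cj) raised to that term's frequency in D1) and uses exact `fractions.Fraction` arithmetic so rounding cannot affect the comparison. The names `p_term`, `prior`, `d1`, `nb_scores`, and `nb_log_scores` are illustrative, not taken from the source.

```python
import math
from fractions import Fraction as F

# P(Xi | Cj) copied from the table above.
p_term = {
    "C1": {"X1": F(3, 16), "X2": F(1, 16), "X3": F(1, 4), "X4": F(5, 32),
           "X5": F(1, 32), "X6": F(1, 16), "X7": F(1, 8), "X8": F(1, 8)},
    "C2": {"X1": F(1, 16), "X2": F(1, 8), "X3": F(1, 4), "X4": F(3, 16),
           "X5": F(1, 8), "X6": F(1, 16), "X7": F(1, 16), "X8": F(1, 8)},
}
# Class priors: 30 of the 50 training documents are in C1, the other 20 in C2.
prior = {"C1": F(30, 50), "C2": F(20, 50)}
# Term frequencies n_i of document D1.
d1 = {"X1": 0, "X2": 2, "X3": 2, "X4": 0, "X5": 1, "X6": 2, "X7": 1, "X8": 0}


def nb_scores(doc, prior, p_term):
    """Unnormalised multinomial NB score: P(Cj) * prod_i P(Xi|Cj) ** n_i.

    The multinomial coefficient is identical for every class, so it is omitted.
    """
    scores = {}
    for c in prior:
        score = prior[c]
        for term, n in doc.items():
            score *= p_term[c][term] ** n  # n == 0 multiplies by 1
        scores[c] = score
    return scores


def nb_log_scores(doc, prior, p_term):
    """Same ranking in log space, which avoids floating-point underflow."""
    return {
        c: math.log(prior[c]) + sum(n * math.log(p_term[c][t]) for t, n in doc.items() if n)
        for c in prior
    }


scores = nb_scores(d1, prior, p_term)
total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
best = max(scores, key=scores.get)
print(best, {c: str(p) for c, p in posterior.items()})  # prints: C2 {'C1': '3/19', 'C2': '16/19'}
```

With exact fractions the comparison is unambiguous, but with floats the product of many small probabilities underflows quickly on real documents. Summing logarithms, as `nb_log_scores` does, keeps the same winner because the logarithm is monotonic.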
Step by Step Solution
(6%) Naive Bayes classification of D1: P(C1 | D1): 3/16, 1/16, 1/4, 5/32, 1/32, 1/16, 1/8, 1/8, 3/16, 1/16, 1…
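One way to work the Naive Bayes part by hand, sketched under the usual multinomial Naive Bayes assumptions: terms occur independently given the class, the table's P(Xi | Cj) values are used as given, and the frequencies are raw counts $n_i$. Each class is scored by its prior times $P(X_i \mid C_j)^{n_i}$ over the terms of D1. Terms with $n_i = 0$ contribute a factor of 1, and the multinomial coefficient is the same for both classes, so both drop out. The factors below follow the order X2, X3, X5, X6, X7, the terms that actually occur in D1.

$$
\begin{aligned}
P(C_j \mid D_1) &\propto P(C_j)\prod_{i=1}^{8} P(X_i \mid C_j)^{n_i},
\qquad P(C_1) = \tfrac{30}{50} = \tfrac{3}{5},\quad P(C_2) = \tfrac{20}{50} = \tfrac{2}{5} \\[4pt]
C_1:&\quad \tfrac{3}{5}\left(\tfrac{1}{16}\right)^{2}\left(\tfrac{1}{4}\right)^{2}\left(\tfrac{1}{32}\right)\left(\tfrac{1}{16}\right)^{2}\left(\tfrac{1}{8}\right) = \tfrac{3}{5}\cdot 2^{-28} \approx 2.24\times 10^{-9} \\[4pt]
C_2:&\quad \tfrac{2}{5}\left(\tfrac{1}{8}\right)^{2}\left(\tfrac{1}{4}\right)^{2}\left(\tfrac{1}{8}\right)\left(\tfrac{1}{16}\right)^{2}\left(\tfrac{1}{16}\right) = \tfrac{2}{5}\cdot 2^{-25} \approx 1.19\times 10^{-8}
\end{aligned}
$$

Normalising the two scores gives $P(C_1 \mid D_1) = 3/19 \approx 0.16$ and $P(C_2 \mid D_1) = 16/19 \approx 0.84$, so under these assumptions D1 is assigned to C2.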
