Question:
There are two classes, C1 and C2. The training set contains 50 documents, 30 of which belong to C1. The following table gives the conditional probabilities P(Xi | Cj) for terms X1 to X8, together with the frequency of each term in a document D1.
- (6%) Use the Naive Bayesian document classification method to determine which class D1 belongs to.
- (3%) Briefly explain the main idea of the probabilistic model (multinomial model) for document classification.
- (4%) Briefly explain the Vector Space Model (VSM). Explain $tf_{ij}$, $\log(N/df_j)$, and the formula $w_{ij} = tf_{ij} \times \log(N/df_j)$ for term $t_j$ of document $D_i$.
| Term (Xi) | P(Xi \| C1) | P(Xi \| C2) | Frequency in D1 |
|---|---|---|---|
| X1 | 3/16 | 1/16 | 0 |
| X2 | 1/16 | 1/8 | 2 |
| X3 | 1/4 | 1/4 | 2 |
| X4 | 5/32 | 3/16 | 0 |
| X5 | 1/32 | 1/8 | 1 |
| X6 | 1/16 | 1/16 | 2 |
| X7 | 1/8 | 1/16 | 1 |
| X8 | 1/8 | 1/8 | 0 |
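A worked computation for the first part, offered as a sketch under two assumptions: the multinomial model is used (each occurrence of a term contributes one factor $P(X_i \mid C_j)$, so terms with frequency 0 drop out), and the priors are estimated from the document counts. The multinomial coefficient is identical for both classes, so it is omitted.

$$
P(C_1) = \frac{30}{50} = 0.6, \qquad P(C_2) = \frac{20}{50} = 0.4
$$

$$
P(C_1)\prod_i P(X_i \mid C_1)^{tf_i} = 0.6 \times \left(\tfrac{1}{16}\right)^2 \left(\tfrac{1}{4}\right)^2 \left(\tfrac{1}{32}\right) \left(\tfrac{1}{16}\right)^2 \left(\tfrac{1}{8}\right) = 0.6 \times 2^{-28}
$$

$$
P(C_2)\prod_i P(X_i \mid C_2)^{tf_i} = 0.4 \times \left(\tfrac{1}{8}\right)^2 \left(\tfrac{1}{4}\right)^2 \left(\tfrac{1}{8}\right) \left(\tfrac{1}{16}\right)^2 \left(\tfrac{1}{16}\right) = 0.4 \times 2^{-25} = 3.2 \times 2^{-28}
$$

Since $3.2 \times 2^{-28} > 0.6 \times 2^{-28}$, D1 would be assigned to C2. Normalizing the two scores gives $P(C_2 \mid D_1) = 3.2 / 3.8 = 16/19 \approx 0.84$.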
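For the second part, the multinomial model treats a document as a sequence of word draws from a class-specific distribution over the vocabulary. $P(C_j)$ is estimated from the fraction of training documents in each class, $P(X_i \mid C_j)$ from how often each term occurs in that class's documents, and a new document goes to the class maximizing $\log P(C_j) + \sum_i tf_i \log P(X_i \mid C_j)$. The sketch below is a minimal illustration, not a reference implementation: the function names `train_multinomial_nb` and `classify`, the toy training corpus, and the use of Laplace (add-one) smoothing are all assumptions made for the example. It also reapplies the decision rule to D1 with the table's probabilities.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, vocab):
    """Hypothetical helper: estimate log P(C) from document counts and
    log P(w|C) from term counts, with Laplace (add-one) smoothing so that
    unseen terms never get probability 0."""
    doc_count = Counter(labels)
    term_count = defaultdict(Counter)  # class -> term -> occurrences in that class
    for tokens, c in zip(docs, labels):
        term_count[c].update(tokens)

    log_prior = {c: math.log(k / len(docs)) for c, k in doc_count.items()}
    log_cond = {}
    for c in doc_count:
        denom = sum(term_count[c][w] for w in vocab) + len(vocab)
        log_cond[c] = {w: math.log((term_count[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_cond


def classify(tf, log_prior, log_cond):
    """Pick argmax_C of log P(C) + sum_i tf_i * log P(X_i|C).
    tf maps term -> count; terms outside the vocabulary are ignored."""
    scores = {
        c: log_prior[c]
        + sum(n * log_cond[c][w] for w, n in tf.items() if n > 0 and w in log_cond[c])
        for c in log_prior
    }
    return max(scores, key=scores.get), scores


# Apply the decision rule to D1 using the probabilities given in the table.
table = {
    "C1": [3/16, 1/16, 1/4, 5/32, 1/32, 1/16, 1/8, 1/8],
    "C2": [1/16, 1/8, 1/4, 3/16, 1/8, 1/16, 1/16, 1/8],
}
terms = [f"X{i}" for i in range(1, 9)]
d1_log_cond = {c: {t: math.log(p) for t, p in zip(terms, ps)} for c, ps in table.items()}
d1_log_prior = {"C1": math.log(30 / 50), "C2": math.log(20 / 50)}
d1_tf = dict(zip(terms, [0, 2, 2, 0, 1, 2, 1, 0]))
label, d1_scores = classify(d1_tf, d1_log_prior, d1_log_cond)
print("D1 ->", label)  # D1 -> C2

# Training from raw counts on a hypothetical toy corpus.
docs = [["ball", "goal", "goal"], ["vote", "party"], ["goal", "team"], ["party", "vote", "vote"]]
labels = ["sport", "politics", "sport", "politics"]
vocab = {"ball", "goal", "team", "vote", "party"}
log_prior, log_cond = train_multinomial_nb(docs, labels, vocab)
toy_label, _ = classify(Counter(["goal", "ball"]), log_prior, log_cond)
print("toy ->", toy_label)  # toy -> sport
```

Summing logarithms instead of multiplying raw probabilities gives the same argmax but avoids floating-point underflow on long documents, where a product of many small factors would round to zero.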
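For the third part, the VSM represents each document $D_i$ as a vector of term weights $w_{ij}$. Here $tf_{ij}$ is the number of times term $t_j$ occurs in $D_i$, $N$ is the number of documents in the collection, and $df_j$ is the number of documents that contain $t_j$, so $\log(N/df_j)$ (the inverse document frequency) down-weights terms that appear in many documents. Documents and queries are then compared by the cosine of the angle between their vectors. The sketch below assumes a hypothetical three-document corpus and raw counts for $tf_{ij}$ (some texts normalize tf, for example by the largest term frequency in the document); the log base is a free choice (base 10 here) and only rescales every weight by the same constant.

```python
import math
from collections import Counter


def tfidf_vectors(docs, base=10):
    """Return one {term: w_ij} dict per document, where w_ij = tf_ij * log(N / df_j).

    tf_ij is the raw count of term t_j in document D_i, N is the number of
    documents, and df_j is the number of documents that contain t_j.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts at most once per term
    return [
        {t: f * math.log(n / df[t], base) for t, f in Counter(tokens).items()}
        for tokens in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical three-document corpus
docs = [
    ["data", "mining", "data"],
    ["web", "mining"],
    ["web", "search", "engine"],
]
vecs = tfidf_vectors(docs)
print(vecs[0])                   # "data" (df = 1) outweighs "mining" (df = 2)
print(cosine(vecs[0], vecs[1]))  # greater than 0: both documents contain "mining"
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared terms
```

A term that occurs in every document gets weight $tf_{ij} \times \log(N/N) = 0$, so it contributes nothing to any vector or similarity score.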