Question:

There are two classes, C1 and C2. The training set contains 50 documents in total, 30 of which belong to C1. The following table gives the conditional probabilities P(Xi | Cj). Document D1 contains some of the terms shown in the table, with the frequencies listed.

  1. (6%) Use the Naive Bayesian document classification method to determine which class D1 belongs to.
  2. (3%) Briefly explain the main idea of the probabilistic model (multinomial model) for document classification.
  3. (4%) Briefly explain the Vector Space Model (VSM). Explain tf_ij, log(N/df_j), and the formula w_ij = tf_ij × log(N/df_j) (for term t_j of document D_i).
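
The weighting formula in item 3 can be illustrated with a minimal sketch. The function name `tf_idf_weight`, the choice of natural logarithm, and the example value df_j = 10 are assumptions made for illustration; the question does not specify a log base or any document frequencies. Only N = 50 comes from the question.

```python
import math


def tf_idf_weight(tf_ij: int, n_docs: int, df_j: int) -> float:
    """Weight of term t_j in document D_i: w_ij = tf_ij * log(N / df_j).

    tf_ij  : number of times term t_j occurs in document D_i
    n_docs : N, the total number of documents in the collection
    df_j   : number of documents that contain term t_j
    The natural log is an assumption; any base only rescales the weights.
    """
    return tf_ij * math.log(n_docs / df_j)


# Hypothetical example: a term occurring twice in D_i and appearing in
# 10 of the N = 50 training documents (df_j = 10 is made up).
w = tf_idf_weight(tf_ij=2, n_docs=50, df_j=10)
print(w)

# A term that occurs in every document carries no discriminating power:
print(tf_idf_weight(tf_ij=3, n_docs=50, df_j=50))
```

The second call shows why the log(N/df_j) factor is included: a term present in all N documents gets weight zero no matter how often it occurs in D_i.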

Table: P(Xi|Cj) for each term, and term frequencies in document D1

Term (Xi)   P(Xi|C1)   P(Xi|C2)   Frequency in D1
X1          3/16       1/16       0
X2          1/16       1/8        2
X3          1/4        1/4        2
X4          5/32       3/16       0
X5          1/32       1/8        1
X6          1/16       1/16       2
X7          1/8        1/16       1
X8          1/8        1/8        0
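
One way to work through item 1 is sketched below. It assumes the multinomial Naive Bayes model, in which each class is scored as P(Cj) multiplied by the product of P(Xi|Cj) raised to the frequency of Xi in D1. The priors 30/50 and 20/50 follow from the document counts in the question, and the probabilities are used as given, with no smoothing. The variable names are illustrative only, and this is not the paywalled expert answer.

```python
from fractions import Fraction as F

# Class priors from the training set: 30 of 50 documents are in C1.
priors = {"C1": F(30, 50), "C2": F(20, 50)}

# Conditional probabilities P(Xi|Cj), copied from the table.
cond = {
    "C1": {"X1": F(3, 16), "X2": F(1, 16), "X3": F(1, 4), "X4": F(5, 32),
           "X5": F(1, 32), "X6": F(1, 16), "X7": F(1, 8), "X8": F(1, 8)},
    "C2": {"X1": F(1, 16), "X2": F(1, 8), "X3": F(1, 4), "X4": F(3, 16),
           "X5": F(1, 8), "X6": F(1, 16), "X7": F(1, 16), "X8": F(1, 8)},
}

# Term frequencies in document D1, copied from the table.
d1 = {"X1": 0, "X2": 2, "X3": 2, "X4": 0, "X5": 1, "X6": 2, "X7": 1, "X8": 0}


def score(cls: str) -> F:
    """Unnormalised posterior: P(cls) * prod_i P(Xi|cls) ** freq_i.

    Assumes the multinomial model. The multinomial coefficient is the
    same for both classes, so it is omitted from the comparison.
    """
    s = priors[cls]
    for term, freq in d1.items():
        s *= cond[cls][term] ** freq  # terms with freq 0 contribute a factor of 1
    return s


scores = {c: score(c) for c in priors}
predicted = max(scores, key=scores.get)
print(scores)
print(predicted)                      # C2
print(scores["C2"] / scores["C1"])    # 16/3
```

By hand, the C1 score is 0.6 × 2^-28 and the C2 score is 0.4 × 2^-25, so under these assumptions C2 scores 16/3 times higher and D1 would be assigned to C2. Exact fractions are used instead of floats so the comparison is not affected by rounding.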

