Text categorization is the task of assigning a given document to one of a fixed set of categories, on the basis of the text it contains. Naive Bayes models are often used for this task. In these models, the query variable is the document category, and the “effect” variables are the presence or absence of each word in the language; the assumption is that words occur independently in documents, with frequencies determined by the document category.

a. Explain precisely how such a model can be constructed, given as “training data” a set of documents that have been assigned to categories.

b. Explain precisely how to categorize a new document.

c. Is the independence assumption reasonable? Discuss.

Step by Step Solution

This question is essentially previewing material in Chapter 23 (page 842), but students sho...
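
As a rough illustration of the model described in the question, here is a minimal Python sketch. It assumes a Bernoulli-style naive Bayes model in which each word's presence or absence is a Boolean effect variable, whitespace tokenization, and add-one (Laplace) smoothing so that unseen words do not produce zero probabilities. The function names `train` and `classify`, the smoothing choice, and the toy documents are all illustrative assumptions, not part of the original question or its solution.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Estimate P(Category) and P(word present | Category) from labeled docs.

    docs: list of (category, text) pairs (hypothetical input format).
    """
    cat_counts = Counter(cat for cat, _ in docs)
    vocab = {w for _, text in docs for w in text.lower().split()}

    # doc_freq[cat][w] = number of training documents in cat that contain w
    doc_freq = defaultdict(Counter)
    for cat, text in docs:
        doc_freq[cat].update(set(text.lower().split()))

    # Prior: fraction of training documents in each category
    priors = {c: n / len(docs) for c, n in cat_counts.items()}

    # Conditional: fraction of documents in the category containing the word,
    # with add-one smoothing (an assumption, not stated in the question)
    word_probs = {
        c: {w: (doc_freq[c][w] + 1) / (cat_counts[c] + 2) for w in vocab}
        for c in cat_counts
    }
    return priors, word_probs, vocab


def classify(text, priors, word_probs, vocab):
    """Return the category with the highest posterior score for the text."""
    present = set(text.lower().split()) & vocab
    scores = {}
    for c, prior in priors.items():
        # Sum logs instead of multiplying probabilities to avoid underflow
        score = math.log(prior)
        for w in vocab:
            p = word_probs[c][w]
            score += math.log(p if w in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy training set, invented purely for illustration
training_docs = [
    ("sports", "the team won the match"),
    ("sports", "a great goal in the final match"),
    ("politics", "the senate passed the bill"),
    ("politics", "voters elect a new senate"),
]
priors, word_probs, vocab = train(training_docs)
label = classify("the match ended with a late goal", priors, word_probs, vocab)
print(label)
```

Words in the new document that never appeared in training are simply ignored here; that is one possible design choice among several. The independence assumption shows up in the inner loop, where each word contributes its own factor to the score regardless of which other words are present.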
