# Question: Text categorization is the task of assigning a given document

Text categorization is the task of assigning a given document to one of a fixed set of categories, on the basis of the text it contains. Naive Bayes models are often used for this task. In these models, the query variable is the document category, and the “effect” variables are the presence or absence of each word in the language; the assumption is that words occur independently in documents, with frequencies determined by the document category.

a. Explain precisely how such a model can be constructed, given as “training data” a set of documents that have been assigned to categories.

b. Explain precisely how to categorize a new document.

c. Is the independence assumption reasonable? Discuss.
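As a starting point for parts (a) and (b), the model described in the question can be sketched in a few lines of Python. This is a minimal illustration, not a definitive solution. It assumes whitespace tokenization, a vocabulary limited to words seen in training (rather than every word in the language), and add-one (Laplace) smoothing, none of which the question specifies. The function names `train` and `classify` and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs, alpha=1.0):
    """Estimate a naive Bayes model from (text, category) pairs.

    Assumptions (not stated in the question): whitespace tokenization,
    vocabulary = words seen in training, smoothing strength alpha > 0.
    """
    vocab = set()
    cat_counts = Counter()           # number of documents per category
    doc_freq = defaultdict(Counter)  # doc_freq[c][w]: docs of category c containing w
    for text, cat in docs:
        words = set(text.lower().split())  # presence/absence only, so use a set
        vocab |= words
        cat_counts[cat] += 1
        doc_freq[cat].update(words)

    n_docs = len(docs)
    # Prior P(Category = c): fraction of training documents in category c.
    priors = {c: cat_counts[c] / n_docs for c in cat_counts}
    # Conditional P(word present | c), smoothed so it is never exactly 0 or 1.
    word_probs = {
        c: {w: (doc_freq[c][w] + alpha) / (cat_counts[c] + 2 * alpha) for w in vocab}
        for c in cat_counts
    }
    return priors, word_probs, vocab


def classify(text, priors, word_probs, vocab):
    """Return the category with the highest posterior for a new document."""
    present = set(text.lower().split())
    best_cat, best_score = None, -math.inf
    for c, prior in priors.items():
        # Sum log-probabilities to avoid underflow from multiplying many small terms.
        score = math.log(prior)
        for w in vocab:
            p = word_probs[c][w]
            score += math.log(p if w in present else 1.0 - p)
        if score > best_score:
            best_cat, best_score = c, score
    return best_cat
```

Calling `train` on a list of labeled documents and then passing its three return values to `classify` along with a new text yields the most probable category. Words in the new document that never appeared in training are ignored in this sketch, and `alpha` must be positive to keep the logarithms finite.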
