Question:

In this question you will implement a Naive Bayes classifier for a text classification problem. You will be given a collection of text articles, each coming either from the serious European magazine The Economist or from the not-so-serious American magazine The Onion. The goal is to learn a classifier that can distinguish between articles from the two magazines.

We have pre-processed the articles so that they are easier to use in your experiments. We extracted the set of all words that occur in any of the articles. This set is called the vocabulary, and we let $V$ be the number of words in the vocabulary. For each article, we produced a feature vector $X = (X_1, \ldots, X_V)$, where

$$X_w = \begin{cases} 1 & \text{if the } w\text{-th word appears in the article} \\ 0 & \text{otherwise,} \end{cases}$$

and each article is also accompanied by a class label

$$y_i = \begin{cases} 1 & \text{for The Economist} \\ 2 & \text{for The Onion.} \end{cases}$$

When we apply the Naive Bayes classification algorithm, we make two assumptions about the data: first, we assume that our data is drawn i.i.d. from a joint probability distribution over the possible feature vectors $X$ and the corresponding class labels $Y$; second, we assume for each pair of features $X_i$ and $X_j$ with $i \neq j$ that $X_i$ is conditionally independent of $X_j$ given the class label $Y$ (this is the Naive Bayes assumption). Under these assumptions, a natural classification rule is as follows: given a new input $X$, predict the most probable class label given $X$. Formally,

$$\hat{Y} = \operatorname*{argmax}_y P(Y = y \mid X).$$

Using Bayes' rule and the Naive Bayes assumption, we can rewrite this classification rule as follows:

$$\begin{aligned}
\hat{Y} &= \operatorname*{argmax}_y P(Y = y \mid X) \\
&= \operatorname*{argmax}_y \frac{P(X \mid Y = y)\, P(Y = y)}{P(X)} && \text{(Bayes' rule)} \\
&= \operatorname*{argmax}_y P(X \mid Y = y)\, P(Y = y) && \text{(denominator does not depend on } y\text{)} \\
&= \operatorname*{argmax}_y P(X_1, \ldots, X_V \mid Y = y)\, P(Y = y) \\
&= \operatorname*{argmax}_y \left( \prod_{w=1}^{V} P(X_w \mid Y = y) \right) P(Y = y) && \text{(conditional independence).}
\end{aligned}$$

Of course, since we don't know the true joint distribution over feature vectors $X$ and class labels $Y$, we need to estimate the probabilities $P(X_w \mid Y = y)$ and $P(Y = y)$ from the training data. For each word index $w \in \{1, \ldots, V\}$ and class label $y \in \{1, 2\}$, the distribution of $X_w$ given $Y = y$ is a Bernoulli distribution with parameter $\theta_{yw}$. In other words, there is some unknown number $\theta_{yw}$ such that

$$P(X_w = 1 \mid Y = y) = \theta_{yw} \qquad P(X_w = 0 \mid Y = y) = 1 - \theta_{yw}.$$

We believe that there is a non-zero (but maybe very small) probability that any word in the vocabulary can appear in an article from either The Onion or The Economist. To make sure that our estimated probabilities are always non-zero, we will impose a Beta(2, 1) prior on $\theta_{yw}$ and compute the MAP estimate from the training data.

Similarly, the distribution of $Y$ (when we consider it alone) is a Bernoulli distribution with parameter $p$. In other words, there is some unknown number $p$ such that

$$P(Y = 1) = p \qquad P(Y = 2) = 1 - p.$$

In this case, since we have many examples of articles from both The Economist and The Onion, there is no risk of having zero-probability estimates, so we will instead use the MLE for the distribution of $Y$.
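For concreteness, here is a minimal sketch of how the estimation step described above could be coded, assuming a binary document-term matrix `X` of shape $n \times V$ (as produced by the pre-processing) and a label vector `y` with entries in {1, 2}. The function name and array shapes are illustrative assumptions; the closed forms it uses are the standard Beta-Bernoulli MAP and Bernoulli MLE results, which Questions 1 and 2 below ask you to derive, so treat this as a cross-check rather than the derivation.

```python
import numpy as np

def estimate_parameters(X, y):
    """Estimate the Naive Bayes parameters from binary features X (n x V)
    and labels y in {1, 2}.  theta[c][w] is the Beta(2, 1) MAP estimate
    of P(X_w = 1 | Y = c); p is the MLE of P(Y = 1).  A sketch under the
    assumed data layout, not a reference implementation."""
    theta = {}
    for c in (1, 2):
        X_c = X[y == c]                    # articles with label c
        n_c = X_c.shape[0]                 # number of such articles
        # Mode of the Beta(2 + k, 1 + n - k) posterior: (k + 1) / (n + 1),
        # strictly positive even for a word that never appears (k = 0).
        theta[c] = (X_c.sum(axis=0) + 1.0) / (n_c + 1.0)
    p = np.mean(y == 1)                    # MLE: fraction of Economist articles
    return theta, p
```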
Questions

1. What would be the MAP estimate of $\theta_{yw} = P(X_w = 1 \mid Y = y)$ with a Beta(2, 1) prior distribution? [14 points]

2. What would be the MLE estimate for the prior, $p = P(Y = 1)$? [8 points]

3. Given $\theta_{yw}$, what would be the value of $P(X_w \mid Y = y)$? [4 points]

4. How would you classify a test example? Write the classification rule equation for a new test sample $X^{\text{test}}$ using $\theta_{yw}$ and $p$. [8 points]

5. How do you think the train and test error would compare? Explain any significant differences. Hint: You can try to implement Naive Bayes for this question using Python libraries and by finding a similar dataset; see the sketch after this list. [4 points]

6. If we have less training data for the same problem, does the prior have more or less impact on our classifier? Explain any possible difference between the train and test error in this question and when we used the whole data. [6 points]
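To make the hint in Question 5 concrete, the sketch below applies the classification rule derived above, reusing the MAP/MLE closed forms from the previous sketch. It works in log space because multiplying $V$ per-word probabilities underflows for realistic vocabulary sizes. The toy arrays at the bottom are assumed data for illustration only, not the actual dataset.

```python
import numpy as np

def classify(X_test, theta, p):
    """Predict labels in {1, 2} for binary feature vectors X_test (m x V)
    with the Naive Bayes rule, computed in log space.  A sketch; `theta`
    and `p` are assumed to come from estimate_parameters above."""
    eps = 1e-12
    priors = {1: p, 2: 1.0 - p}
    predictions = []
    for x in X_test:
        scores = {}
        for c in (1, 2):
            # Clip before log(1 - theta): Beta(2, 1) keeps theta > 0 but
            # still allows theta = 1 when a word appears in every
            # training article of a class.
            t = np.clip(theta[c], eps, 1.0 - eps)
            # log P(Y = c) + sum_w log P(X_w = x_w | Y = c)
            scores[c] = (np.log(priors[c])
                         + np.sum(x * np.log(t) + (1 - x) * np.log(1 - t)))
        predictions.append(max(scores, key=scores.get))
    return np.array(predictions)

# Illustrative usage on a toy corpus (assumed data):
X_train = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
y_train = np.array([1, 1, 2, 2])
# MAP / MLE estimates, as in the previous sketch:
theta = {c: (X_train[y_train == c].sum(axis=0) + 1.0)
            / ((y_train == c).sum() + 1.0) for c in (1, 2)}
p = np.mean(y_train == 1)
print(classify(np.array([[1, 0, 0]], dtype=float), theta, p))  # -> [1]
```

Comparing this classifier's error on the training articles against its error on held-out articles, for both the full training set and small subsamples of it, is one way to ground your answers to Questions 5 and 6.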
