Question: This section of the homework will walk you through coding a Naive Bayes classifier that can distinguish between positive and negative reviews (at some level of accuracy).

Question 2.1 (5 pts) To start, implement the update_model function in hw_1.py. Make sure to read the function comments so you know what to update. Also review the NaiveBayes class variables in the def __init__ method of the NaiveBayes class to get a sense of which statistics are important to keep track of. Once you have implemented update_model, run the train_model function using the code below. You'll need to provide the path to the dataset you downloaded to run the code.

In [ ]:

nb = NaiveBayes(PATH_TO_DATA, tokenizer=tokenize_doc)
nb.train_model()
if len(nb.vocab) == 252165:
    print "Great! The vocabulary size is {}".format(252165)
else:
    print "Oh no! Something seems off. Double check your code before continuing. Maybe a mistake in update_model?"
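The statistics update_model maintains can be sketched as follows. This is a minimal illustration with assumed attribute names (class_total_doc_counts, class_total_word_counts, class_word_counts); use the names actually defined in hw_1.py's __init__, which may differ:

```python
from collections import defaultdict

POS_LABEL = 'pos'
NEG_LABEL = 'neg'

class NaiveBayesSketch:
    """Sketch of the count statistics update_model must maintain.
    Attribute names here are assumptions, not the hw_1.py API."""
    def __init__(self):
        # number of training documents seen per class
        self.class_total_doc_counts = {POS_LABEL: 0.0, NEG_LABEL: 0.0}
        # total number of tokens seen per class
        self.class_total_word_counts = {POS_LABEL: 0.0, NEG_LABEL: 0.0}
        # per-class mapping word -> count
        self.class_word_counts = {POS_LABEL: defaultdict(float),
                                  NEG_LABEL: defaultdict(float)}
        # every distinct word seen in any class
        self.vocab = set()

    def update_model(self, bow, label):
        """Update counts from one document's bag of words.
        bow: dict mapping word -> count for that document."""
        self.class_total_doc_counts[label] += 1
        for word, count in bow.items():
            self.class_word_counts[label][word] += count
            self.class_total_word_counts[label] += count
            self.vocab.add(word)

nb = NaiveBayesSketch()
nb.update_model({'great': 2, 'film': 1}, POS_LABEL)
nb.update_model({'boring': 1, 'film': 1}, NEG_LABEL)
print(len(nb.vocab))  # 3 distinct words seen so far
```

If these counters are updated for every document during training, the vocabulary-size check above will pass on the full dataset.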

Exploratory analysis

Let's begin to explore the count statistics stored by the update_model function. Use the provided top_n function to find the top 10 most common words in the positive class and the top 10 most common words in the negative class. You don't have to code anything to do this.

In [ ]:

print "TOP 10 WORDS FOR CLASS " + POS_LABEL + ":"
for tok, count in nb.top_n(POS_LABEL, 10):
    print '', tok, count
print ''
print "TOP 10 WORDS FOR CLASS " + NEG_LABEL + ":"
for tok, count in nb.top_n(NEG_LABEL, 10):
    print '', tok, count
print ''
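For intuition, a function like top_n can be implemented by sorting a class's word counts in descending order. This is a hedged sketch with made-up toy counts, not the provided implementation:

```python
# Toy stand-in for the per-class word counts the model accumulates
class_word_counts = {
    'pos': {'the': 50, 'great': 12, 'film': 30},
    'neg': {'the': 48, 'boring': 9, 'film': 28},
}

def top_n(label, n):
    """Return the n most frequent (word, count) pairs for a class."""
    counts = class_word_counts[label]
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

print(top_n('pos', 2))  # [('the', 50), ('film', 30)]
```

Note that even in this toy example the most frequent words ('the', 'film') appear near the top of both classes, which foreshadows the next question.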

Question 2.2 (5 points)

Will the top 10 words of the positive/negative classes help discriminate between the two classes? Do you imagine that processing other English text will result in a similar phenomenon?

Answer in one or two sentences here.

Question 2.3 (5 pts)

The Naive Bayes model assumes that all features are conditionally independent given the class label. For our purposes, this means that the probability of seeing a particular word in a document with class label y is independent of the rest of the words in that document. Implement the p_word_given_label function. This function calculates P (w|y) (i.e., the probability of seeing word w in a document given the label of that document is y).
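The quantity to compute is the maximum-likelihood estimate P(w|y) = count(w, y) / (total tokens in class y). A minimal sketch with toy counts (the dictionary names are assumptions, standing in for the statistics your class tracks):

```python
# Toy per-class counts standing in for the model's accumulated statistics
class_word_counts = {'pos': {'fantastic': 4, 'film': 10},
                     'neg': {'fantastic': 1, 'film': 9, 'boring': 6}}
class_total_word_counts = {'pos': 14.0, 'neg': 16.0}

def p_word_given_label(word, label):
    """Maximum-likelihood estimate of P(w|y):
    count of w in class y divided by total tokens in class y.
    (No smoothing yet, so unseen words get probability 0.)"""
    return class_word_counts[label].get(word, 0.0) / class_total_word_counts[label]

print(p_word_given_label('fantastic', 'pos'))  # 4/14, about 0.2857
```

Dividing by the class's total token count (not its document count) is what makes the per-class word probabilities sum to 1.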

Use your p_word_given_label function to compute the probability of seeing the word fantastic given each sentiment label. Repeat the computation for the word boring.

In [ ]:

print "P('fantastic'|pos):", nb.p_word_given_label("fantastic", POS_LABEL)
print "P('fantastic'|neg):", nb.p_word_given_label("fantastic", NEG_LABEL)
print "P('boring'|pos):", nb.p_word_given_label("boring", POS_LABEL)
print "P('boring'|neg):", nb.p_word_given_label("boring", NEG_LABEL)

Which word has a higher probability given the positive class, fantastic or boring? Which word has a higher probability given the negative class? Is this what you would expect?

Answer in one or two sentences here.
