Question 2. (20 Points)
Bob would like to classify emails as spam or non-spam. He would like to estimate the probability that a new email $e$ containing the keywords $(w_1, w_2, \dots, w_n)$ is spam by taking all the emails in the training set with those keywords, and then computing the proportion of those emails that are spam. Specifically, he estimates the probability using
$$P(\text{spam} \mid \text{new email } e) = \frac{\text{num. of spam emails with keywords } w_1, w_2, \dots, w_n \text{ in the training set}}{\text{num. of total emails with keywords } w_1, w_2, \dots, w_n \text{ in the training set}}.$$
Part (a) Explain why Bob's plan will generally not work. (5 Points)
Part (b) Describe the datasets for which Bob's plan might work. Be specific: state which properties are required of the datasets. (5 Points)
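
To make the estimator in the question concrete, here is a minimal Python sketch of Bob's counting rule. It rests on assumptions that are not in the question: each training email is represented as a set of keywords with a spam label, "with keywords" is read as "contains all of the query keywords", and the function name `bob_estimate` and the four toy emails are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Assumed representation: (keywords found in the email, is_spam label).
Email = Tuple[Set[str], bool]


def bob_estimate(training_set: List[Email], keywords: Iterable[str]) -> Optional[float]:
    """Proportion of spam among training emails that contain every query keyword.

    Returns None when no training email contains all the keywords,
    because the ratio is then 0/0 and the estimate is undefined.
    """
    query = set(keywords)
    # Spam labels of the training emails whose keyword set covers the query.
    matching = [is_spam for words, is_spam in training_set if query <= words]
    if not matching:
        return None
    return sum(matching) / len(matching)


# Hypothetical toy training set, for illustration only.
training_set: List[Email] = [
    ({"free", "prize", "click"}, True),
    ({"free", "prize"}, True),
    ({"free", "meeting"}, False),
    ({"meeting", "agenda"}, False),
]

print(bob_estimate(training_set, ["free"]))            # 2 of the 3 matching emails are spam
print(bob_estimate(training_set, ["free", "prize"]))   # both matching emails are spam
print(bob_estimate(training_set, ["free", "agenda"]))  # no matching email, so None
```

Under this reading, the estimate depends entirely on how many training emails contain the full keyword combination, which is the quantity to keep in mind when working through Parts (a) and (b).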