Q. (Naive Bayes) From Project Gutenberg, we downloaded two files: The Adventures of Sherlock Holmes by Arthur Conan Doyle (pg1661.txt) and The Complete Works of Jane Austen (pg31100.txt). Please develop a multinomial Naive Bayes classifier that learns to classify a snippet of text by author: Conan Doyle or Jane Austen. A multinomial Naive Bayes classifier treats the feature vector x = {x_1, ..., x_D} as a histogram of word counts and models the posterior probability as:

$$p(C_k \mid x) \propto p(C_k) \prod_{i=1}^{D} p(x_i \mid C_k)$$

where p(x_i | C_k) can be estimated as the number of times word i was observed in class C_k plus a smoothing factor, divided by the total number of words in C_k. In the testing phase, given a new example x_t, you can output the class assignment for this example by comparing log p(C_1 | x_t) and log p(C_2 | x_t). If log p(C_2 | x_t) > log p(C_1 | x_t), assign C_2 to this example; otherwise, assign C_1. You need to divide the data into training and testing sets. Make sure the testing data has an equal number of samples from Conan Doyle and Jane Austen. Report the accuracy on the test data using your Naive Bayes classifier. Please do not use any package/tool.
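One possible way to approach this, using only the Python standard library, is sketched below. It is a minimal illustration, not a reference solution, and several choices are assumptions that the question does not specify: a "sample" is taken to be a fixed-size snippet of 50 consecutive words, the smoothing factor is alpha = 1, the denominator also adds alpha times the vocabulary size so that the word probabilities of each class sum to one, words never seen in training are skipped at test time, and the prior p(C_k) is the fraction of training snippets in class C_k. All function names (`tokenize`, `make_snippets`, `split_balanced`, `train`, `predict`, `accuracy`, `run_experiment`) are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (so "don't" becomes "don", "t")."""
    return re.findall(r"[a-z]+", text.lower())


def make_snippets(tokens, size=50):
    """Cut a token list into non-overlapping snippets of `size` words (remainder dropped)."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def split_balanced(snippets_by_class, n_test_per_class, seed=0):
    """Shuffle each class and hold out the same number of test snippets from each."""
    rng = random.Random(seed)
    train_set, test_set = {}, {}
    for label, snippets in snippets_by_class.items():
        shuffled = snippets[:]
        rng.shuffle(shuffled)
        test_set[label] = shuffled[:n_test_per_class]
        train_set[label] = shuffled[n_test_per_class:]
    return train_set, test_set


def train(snippets_by_class, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods for each class."""
    vocab = {w for group in snippets_by_class.values() for s in group for w in s}
    n_snippets = sum(len(group) for group in snippets_by_class.values())
    model = {"vocab": vocab, "classes": {}}
    for label, group in snippets_by_class.items():
        counts = Counter(w for s in group for w in s)
        # Assumption: alpha * |vocab| is added to the denominator (Laplace smoothing).
        denom = sum(counts.values()) + alpha * len(vocab)
        model["classes"][label] = {
            "log_prior": math.log(len(group) / n_snippets),
            "log_lik": {w: math.log((counts[w] + alpha) / denom) for w in vocab},
        }
    return model


def predict(model, tokens):
    """Return the class with the largest unnormalized log posterior."""
    best_label, best_score = None, -math.inf
    for label, params in model["classes"].items():
        score = params["log_prior"]
        for w in tokens:
            if w in model["vocab"]:  # assumption: unseen words are ignored
                score += params["log_lik"][w]
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_by_class):
    """Fraction of test snippets whose predicted class matches the true class."""
    correct = total = 0
    for label, snippets in test_by_class.items():
        for s in snippets:
            correct += predict(model, s) == label
            total += 1
    return correct / total


def run_experiment(paths_by_class, snippet_size=50, n_test_per_class=200):
    """Read one text file per class, split, train, and return test accuracy."""
    data = {}
    for label, path in paths_by_class.items():
        with open(path, encoding="utf-8", errors="ignore") as f:
            data[label] = make_snippets(tokenize(f.read()), snippet_size)
    train_set, test_set = split_balanced(data, n_test_per_class)
    return accuracy(train(train_set), test_set)
```

With the two files in the working directory, `run_experiment({"Conan Doyle": "pg1661.txt", "Jane Austen": "pg31100.txt"})` would return the test accuracy as a fraction between 0 and 1. Summing log probabilities instead of multiplying probabilities avoids numerical underflow on longer snippets. Note that this sketch leaves the Project Gutenberg header and license text in each file; stripping it before tokenizing would probably give a cleaner comparison.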
