Q. (Naive Bayes) From Project Gutenberg, we downloaded two files: The Adventures of Sherlock Holmes by Arthur Conan Doyle (pg1661.txt) and The Complete Works of Jane Austen (pg31100.txt). Please develop a multinomial Naive Bayes classifier that learns to classify the author of a snippet of text as either Conan Doyle or Jane Austen. A multinomial Naive Bayes classifier uses a feature vector $\mathbf{x} = \{x_1, \ldots, x_D\}$ as a histogram and models the posterior probability as:
$$p(C_k \mid \mathbf{x}) \propto p(C_k) \prod_{i=1}^{D} p(x_i \mid C_k)$$
where $p(x_i \mid C_k)$ can be estimated by the number of times word $i$ was observed in class $C_k$ ($N_{ki}$) plus a smoothing factor $\alpha$, divided by the total number of words in $C_k$ ($N_k$) plus $\alpha D$, so that the smoothed estimates over all $D$ words sum to one:

$$p(x_i \mid C_k) = \frac{N_{ki} + \alpha}{N_k + \alpha D}$$

In the testing phase, given a new example $\mathbf{x}_t$, you can output its class assignment by comparing $\log p(C_1 \mid \mathbf{x}_t)$ and $\log p(C_2 \mid \mathbf{x}_t)$: if $\log p(C_2 \mid \mathbf{x}_t) > \log p(C_1 \mid \mathbf{x}_t)$, assign $C_2$ to this example. You need to divide the data into training and testing sets. Make sure the test set has an equal number of samples from Conan Doyle and Jane Austen. Report the accuracy on the test data using your Naive Bayes classifier. Please do not use any package/tool.
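A minimal sketch of the training and testing phases described above, assuming the Python standard library (`math`, `collections`) is acceptable under the "no package/tool" rule and that each sample has already been turned into a list of lower-cased word tokens. The function names `train_multinomial_nb` and `predict` and the default `alpha=1.0` are hypothetical choices. The sketch estimates $p(x_i \mid C_k)$ with the smoothed formula above and, following the usual multinomial model, weights each word's log-probability by its count $x_i$ in the snippet's histogram; words never seen in training are skipped.

```python
import math
from collections import Counter


def train_multinomial_nb(documents, labels, alpha=1.0):
    """Estimate log p(C_k) and smoothed log p(word | C_k) from token lists.

    documents: list of token lists (one per training snippet)
    labels:    class label for each snippet, e.g. "Doyle" or "Austen"
    alpha:     additive smoothing factor (alpha = 1.0 is Laplace smoothing)
    """
    classes = sorted(set(labels))
    vocabulary = {token for doc in documents for token in doc}
    vocab_size = len(vocabulary)  # D
    log_prior, log_likelihood = {}, {}
    for c in classes:
        class_docs = [doc for doc, y in zip(documents, labels) if y == c]
        # p(C_k): fraction of training snippets labelled c
        log_prior[c] = math.log(len(class_docs) / len(documents))
        # N_ki: count of each word in class c; N_k: total words in class c
        counts = Counter(token for doc in class_docs for token in doc)
        denominator = sum(counts.values()) + alpha * vocab_size
        log_likelihood[c] = {
            word: math.log((counts[word] + alpha) / denominator)
            for word in vocabulary
        }
    return classes, vocabulary, log_prior, log_likelihood


def predict(model, tokens):
    """Return the class with the largest log-posterior (up to a constant)."""
    classes, vocabulary, log_prior, log_likelihood = model
    # Histogram x of the snippet; words never seen in training are skipped
    histogram = Counter(t for t in tokens if t in vocabulary)
    scores = {
        c: log_prior[c] + sum(n * log_likelihood[c][w] for w, n in histogram.items())
        for c in classes
    }
    return max(scores, key=scores.get)
```

With two classes, taking the maximum of the two scores is the same as the comparison of $\log p(C_1 \mid \mathbf{x}_t)$ and $\log p(C_2 \mid \mathbf{x}_t)$ in the question. Working in log space matters here: a product of hundreds of probabilities below one would underflow to zero in floating point, while the sum of their logarithms stays well within range.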

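The data-preparation side of the task (cutting each book into snippets and holding out an equal number per author for testing) can be sketched as follows, again assuming only the Python standard library. The helper names, the 100-word snippet length, the fixed random seed, and the reliance on Project Gutenberg's `*** START OF` / `*** END OF` marker lines to drop the license header and footer are illustrative assumptions, not part of the assignment.

```python
import random
import re


def strip_gutenberg_boilerplate(text):
    """Keep only the text between the '*** START OF' and '*** END OF' marker
    lines (assumed Project Gutenberg layout); return the text unchanged if
    the markers are not found."""
    start = text.find("*** START OF")
    end = text.find("*** END OF")
    if start == -1 or end == -1 or end < start:
        return text
    start = text.find("\n", start) + 1  # skip the start-marker line itself
    return text[start:end]


def tokenize(text):
    """Lower-case the text and return its words (letters, with optional
    internal apostrophes such as "it's")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def make_snippets(tokens, snippet_len=100):
    """Cut a token list into non-overlapping snippets of snippet_len words,
    dropping the incomplete remainder at the end."""
    return [tokens[i:i + snippet_len]
            for i in range(0, len(tokens) - snippet_len + 1, snippet_len)]


def load_snippets(path, snippet_len=100):
    """Read one Gutenberg file and return its snippets (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read()
    return make_snippets(tokenize(strip_gutenberg_boilerplate(text)), snippet_len)


def balanced_split(snippets_by_class, n_test_per_class, seed=0):
    """Shuffle each class's snippets and hold out exactly n_test_per_class of
    them for testing, so the test set is balanced; the rest is training data."""
    rng = random.Random(seed)
    train, test = [], []
    for label, snippets in snippets_by_class.items():
        if len(snippets) <= n_test_per_class:
            raise ValueError(f"not enough snippets for class {label!r}")
        shuffled = list(snippets)
        rng.shuffle(shuffled)
        test += [(s, label) for s in shuffled[:n_test_per_class]]
        train += [(s, label) for s in shuffled[n_test_per_class:]]
    return train, test


def accuracy(predicted, gold):
    """Fraction of test examples whose predicted label matches the true one."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

For the assignment, one might call `load_snippets("pg1661.txt")` and `load_snippets("pg31100.txt")`, pass the results to `balanced_split` under the labels "Doyle" and "Austen" with `n_test_per_class` set to, say, a fifth of the smaller author's snippet count, train any token-list classifier on the first list, and score its predictions on the second with `accuracy`. Because the Austen collection is much longer than the Sherlock Holmes stories, the training set is unbalanced and the prior $p(C_k)$ leans toward Austen; on the balanced test set, a classifier that always answered "Austen" would score only 50%.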