Question:
Classify SMS messages as spam or not spam.
The training examples are in a NumPy array called X_train and their corresponding labels are in a NumPy array called y_train.
The test examples are in a NumPy array called X_test and their corresponding labels are in a NumPy array called y_test.
To do: Implement Bernoulli Naive Bayes in Python.
For each test example (x_1, ..., x_d), you will need to find the class maximizing p(x_1|y) p(x_2|y) ... p(x_d|y) P(y), using the estimates of p(x_j|y) and P(y).
Training details: During the training phase, you will use the examples in the training file to estimate the value of P(y) for each class (y ∈ {0, 1}).
For each pair (x_j, y) of feature and class, you will need to estimate the probability p(x_j|y).
If there is a tie, give the example the label 1.
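Under these instructions, an implementation might look roughly like the sketch below. It is only an illustration: the function names (fit_bernoulli_nb, predict_bernoulli_nb) are not part of the assignment, and it assumes X_train and X_test are 0/1 NumPy arrays of shape (n_examples, n_features) with labels in {0, 1}.

```python
import numpy as np

def fit_bernoulli_nb(X_train, y_train):
    """Estimate P(y) and p(x_j = 1 | y) from binary training data.

    Assumes X_train is a (n_examples, n_features) array of 0/1 values
    and y_train is a (n_examples,) array of 0/1 labels.
    """
    priors = np.zeros(2)
    cond = np.zeros((2, X_train.shape[1]))            # cond[y, j] = p(x_j = 1 | y)
    for y in (0, 1):
        rows = X_train[y_train == y]
        priors[y] = rows.shape[0] / X_train.shape[0]  # fraction of examples in class y
        cond[y] = rows.mean(axis=0)                   # fraction of class-y examples with x_j = 1
    return priors, cond

def predict_bernoulli_nb(X_test, priors, cond):
    """Return the class maximizing p(x_1|y) ... p(x_d|y) P(y); ties go to label 1."""
    preds = np.empty(X_test.shape[0], dtype=int)
    for i, x in enumerate(X_test):
        scores = np.empty(2)
        for y in (0, 1):
            per_feature = np.where(x == 1, cond[y], 1.0 - cond[y])  # p(x_j | y) for each feature j
            scores[y] = priors[y] * per_feature.prod()
        preds[i] = 1 if scores[1] >= scores[0] else 0               # tie -> label 1
    return preds
```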
Questions
(a) What was the estimated value of P(y) for y = 1?
(b) What was the estimated value of P(y) for y = 0?
(c) What were the estimated values of p(x_j|y) for the feature x_j corresponding to the word "admirer", when y = 1 and when y = 0?
(d) What were the estimated values of p(x_j|y) for the feature x_j corresponding to the word "secret", when y = 1 and when y = 0?
(e) Which classes were predicted for the first 5 examples in the test set?
(f) Which classes were predicted for the last 5 examples in the test set?
(g) What was the percentage error on the examples in the test file?
(h) Repeat the above step (question (g)) by adding m-smoothing, trying different values of m where 0 < m ≤ 1. Did the smoothing help? If so, for what value of m? (Do not add smoothing to the prior.) One possible form of the smoothed estimate is sketched after this list.
(i) Sometimes a not-very-intelligent learning algorithm can achieve high accuracy on a particular learning task simply because the task is easy. To check for this, you can compare the performance of your algorithm to the performance of some very simple algorithms. One such algorithm just predicts the majority class (the class that is most frequent in the training set). This algorithm is sometimes called Zero-R. It can achieve high accuracy in a 2-class problem if the dataset is very imbalanced (i.e., if the fraction of examples in one class is much larger than the fraction of examples in the other). What accuracy is attained if you use Zero-R instead of Bernoulli Naive Bayes? (A minimal Zero-R sketch also appears after this list.) (Note that Bernoulli Naive Bayes can sometimes be effective even if the assumptions are not very reasonable. In order to do correct classification, it is enough to determine the correct MAP class. It is not necessary to actually compute the correct posterior probability P(C|x) for each class.)
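For question (h), one common form of m-smoothing for Bernoulli features replaces the raw class-conditional frequency with (count + m) / (n_y + 2m); the exact form expected by the assignment may differ, so treat this as an assumption. Per the instructions, the prior P(y) is left unsmoothed.

```python
import numpy as np

def fit_bernoulli_nb_smoothed(X_train, y_train, m):
    """Like fit_bernoulli_nb, but with m-smoothing applied to p(x_j = 1 | y) only.

    Uses the assumed form (count + m) / (n_y + 2*m); the prior P(y) is not smoothed.
    """
    priors = np.zeros(2)
    cond = np.zeros((2, X_train.shape[1]))
    for y in (0, 1):
        rows = X_train[y_train == y]
        priors[y] = rows.shape[0] / X_train.shape[0]                 # unsmoothed prior
        cond[y] = (rows.sum(axis=0) + m) / (rows.shape[0] + 2 * m)   # smoothed p(x_j = 1 | y)
    return priors, cond
```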
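For question (i), Zero-R simply predicts the training set's majority class for every test example. A minimal sketch (breaking a tie in favor of label 1, for consistency with the tie rule above):

```python
import numpy as np

def zero_r_predict(X_test, y_train):
    """Predict the majority class of the training labels for every test example."""
    majority = 1 if (y_train == 1).sum() >= (y_train == 0).sum() else 0
    return np.full(X_test.shape[0], majority, dtype=int)
```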
Estimation of P(y): To estimate the P(y) values, just calculate the fraction of the training examples that are in class y.
Testing details: For each example (x_1, ..., x_d) in X_test, you want to determine which class maximizes
p(x_1|y) p(x_2|y) ... p(x_d|y) P(y),
and then compare your predictions against the labels in y_test.
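Because a product of many small probabilities can underflow in floating point, an equivalent way to pick the maximizing class is to compare sums of log-probabilities. This is not required by the assignment; it is just a numerical-stability sketch that reuses the probability tables estimated above and assumes no estimated probability is exactly 0 (e.g., after smoothing).

```python
import numpy as np

def predict_log_space(X_test, priors, cond):
    """Pick argmax_y of log P(y) + sum_j log p(x_j | y); ties go to label 1."""
    preds = np.empty(X_test.shape[0], dtype=int)
    for i, x in enumerate(X_test):
        scores = np.empty(2)
        for y in (0, 1):
            per_feature = np.where(x == 1, cond[y], 1.0 - cond[y])  # p(x_j | y) for each feature j
            scores[y] = np.log(priors[y]) + np.log(per_feature).sum()
        preds[i] = 1 if scores[1] >= scores[0] else 0               # tie -> label 1
    return preds
```

The percentage error asked for in question (g) can then be computed from the predictions as 100 * np.mean(preds != y_test).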