1. We would like to use machine learning to identify email spam so we can send spam directly to the spam folder rather than the user's inbox.

Machine Learning Model 1 (MLM1) was built to classify emails as either spam or useful ("normal") email. The model used a dataset of 4,601 emails described by 57 features, such as text length and the presence of specific words like "buy", "subscribe", and "win". The "Spam" column is the output and takes one of two labels for each email: "spam" or "normal". Of the emails, 80% (3,680 emails) were randomly selected to train the model and 20% (921 emails) were withheld for evaluating the model.

To evaluate the classification model, we compare the actual and predicted target column values in the test set. Scoring a model amounts to a match count: how many data rows the model classified correctly and how many it classified incorrectly. These counts are summarized in the confusion matrix as follows:

MLM1 Results

                  Spam (Predicted)   Useful (Predicted)   Total
Spam (Actual)     320                43                   363
Useful (Actual)   20                 538                  558
Total             340                581                  921

There are a number of measures that can be calculated to determine the value of our model. Note that since we are trying to predict spam, the "positive class" is the spam class, while the "negative class" is the useful class.
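With spam as the positive class, each cell of the MLM1 confusion matrix corresponds to one of TP, FN, FP, or TN. The following is a minimal sketch of that mapping; the variable names are illustrative only and do not come from the original exercise.

```python
# Cell counts read from the MLM1 results (rows = actual class, columns = predicted class).
actual_spam = {"pred_spam": 320, "pred_useful": 43}
actual_useful = {"pred_spam": 20, "pred_useful": 538}

# Spam is the positive class, useful email is the negative class.
tp = actual_spam["pred_spam"]      # spam correctly sent to the spam folder
fn = actual_spam["pred_useful"]    # spam that reached the inbox
fp = actual_useful["pred_spam"]    # useful email wrongly sent to the spam folder
tn = actual_useful["pred_useful"]  # useful email correctly left in the inbox

# Sanity checks against the row, column, and grand totals of the matrix.
assert tp + fn == 363   # actual spam
assert fp + tn == 558   # actual useful
assert tp + fp == 340   # predicted spam
assert fn + tn == 581   # predicted useful
assert tp + fn + fp + tn == 921

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```

Checking the counts against the stated totals is a cheap way to catch a misread cell before computing any measure.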

Error Rate is the number of incorrect predictions divided by the total number of emails in the test set.

Error Rate = (FP+FN)/(FP+FN+TP+TN).

Accuracy is the number of correct predictions divided by the total number of emails in the test set.

Accuracy = (TP+TN)/(FP+FN+TP+TN)

Sensitivity (also called Recall) measures how good the model is at detecting events in the positive class, that is, how many of the actual spam emails are correctly predicted as spam.

Sensitivity = TP/(TP+FN)

Specificity measures how good the model is at detecting events in the negative class, that is, how many of the actual useful emails are correctly predicted as useful.

Specificity = TN/(TN+FP).

Precision (also called the Positive Predictive Value) is the number of correct positive predictions divided by the total number of positive predictions.

Precision = TP/(TP+FP).

Calculate these five measures for MLM1.
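The five formulas above can be applied to the MLM1 counts directly. The sketch below assumes the cell mapping TP = 320, FN = 43, FP = 20, TN = 538 (spam as the positive class); the function name `five_measures` is hypothetical and chosen only for illustration.

```python
def five_measures(tp, fn, fp, tn):
    """Return the five measures defined above for one confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "error_rate": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# MLM1 counts, with spam as the positive class.
mlm1 = five_measures(tp=320, fn=43, fp=20, tn=538)

for name, value in mlm1.items():
    print(f"{name:12s} {value:.4f}")
```

For MLM1 this gives an error rate of about 0.0684, accuracy of about 0.9316, sensitivity of about 0.8815, specificity of about 0.9642, and precision of about 0.9412.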

Machine Learning Model 2 (MLM2) was also evaluated, using a different randomly chosen 80% of the emails to train the model and 20% to evaluate it. MLM2 produced the following confusion matrix:

MLM2 Results

                  Spam (Predicted)   Useful (Predicted)   Total
Spam (Actual)     334                23                   357
Useful (Actual)   40                 524                  564
Total             374                547                  921

Calculate the same five measures for MLM2. Discuss which model you believe is better and explain why.
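One way to set up the comparison is to compute the same five measures for both models side by side. This is a sketch only: the counts are read from the two confusion matrices with spam as the positive class, and the names `models` and `summarize` are hypothetical.

```python
# Counts read from the two confusion matrices; spam is the positive class.
models = {
    "MLM1": {"tp": 320, "fn": 43, "fp": 20, "tn": 538},
    "MLM2": {"tp": 334, "fn": 23, "fp": 40, "tn": 524},
}

def summarize(tp, fn, fp, tn):
    """Apply the five formulas from the exercise to one set of counts."""
    total = tp + fn + fp + tn
    return {
        "error_rate": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

results = {name: summarize(**counts) for name, counts in models.items()}

# Print one row per measure, one column per model.
print(f"{'measure':12s} {'MLM1':>8s} {'MLM2':>8s}")
for measure in results["MLM1"]:
    m1 = results["MLM1"][measure]
    m2 = results["MLM2"][measure]
    print(f"{measure:12s} {m1:8.4f} {m2:8.4f}")
```

Both models make 63 errors out of 921 emails, so error rate and accuracy cannot separate them. They differ in the type of error: MLM1 has 20 false positives and 43 false negatives, while MLM2 has 40 false positives and 23 false negatives. Which model is better therefore depends on whether losing a useful email to the spam folder or letting spam reach the inbox is considered the more costly mistake.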
