1. We would like to use machine learning to identify email spam so we can send spam directly to the spam folder rather than the user's inbox.

Machine Learning Model 1 (MLM1) was built to classify emails as either spam or useful ("normal") email. The model used a dataset of 4,601 emails described through 57 features, such as text length and the presence of specific words like "buy", "subscribe", and "win". The "Spam" output column provides two possible labels for the emails: "spam" and "normal".

80% of the emails (3,680 emails) were randomly selected to train the model and 20% (921 emails) were withheld for evaluating the model.

To evaluate the classification model we compare the actual and predicted target column values in the test set. Scoring a model comes down to a match count: how many data rows the model classified correctly and how many it classified incorrectly. These counts are summarized in the confusion matrix as follows:

MLM1 Results

| | Spam (Predicted) | Useful (Predicted) | Total |
|---|---|---|---|
| Spam (Actual) | 320 | 43 | 363 |
| Useful (Actual) | 20 | 538 | 558 |
| Total | 340 | 581 | 921 |

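There are a number of measures that can be calculated to determine the value of our model. Note that since we are trying to predict spam, the "positive class" is the spam class, while the "negative class" is the useful class. A true positive (TP) is therefore a spam email predicted as spam, a false negative (FN) is a spam email predicted as useful, a false positive (FP) is a useful email predicted as spam, and a true negative (TN) is a useful email predicted as useful.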
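As a minimal sketch of how the four cells of a confusion matrix are tallied when spam is the positive class, the Python snippet below counts TP, FN, FP, and TN from paired actual and predicted labels. The function name `tally_confusion` and the six toy labels are made up for illustration; they are not taken from the email dataset.

```python
from collections import Counter

POSITIVE = "spam"  # the positive class, as stated in the question


def tally_confusion(actual, predicted, positive=POSITIVE):
    """Count TP, FN, FP, TN from paired actual/predicted labels."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            # Actual spam: caught (TP) or missed (FN).
            counts["TP" if p == positive else "FN"] += 1
        else:
            # Actual useful email: wrongly flagged (FP) or correctly kept (TN).
            counts["FP" if p == positive else "TN"] += 1
    return counts


# Made-up toy labels for illustration only.
actual = ["spam", "spam", "normal", "normal", "spam", "normal"]
predicted = ["spam", "normal", "normal", "spam", "spam", "normal"]

counts = tally_confusion(actual, predicted)
print(dict(counts))
```

Read the same way, the MLM1 table gives TP = 320, FN = 43, FP = 20, and TN = 538.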

Error Rate measures the number of all incorrect predictions divided by the total number of emails in the dataset.

Error Rate = (FP+FN)/(FP+FN+TP+TN).

Accuracy is calculated as the number of all correct predictions divided by the total number of emails in the dataset.

Accuracy = (TP+TN)/(FP+FN+TP+TN)

Sensitivity (also called Recall) measures how good the model is at detecting events in the positive class, that is, how many of the actual spam emails are correctly predicted as spam.

Sensitivity = TP/(TP+FN)

Specificity measures how good the model is at detecting events in the negative class, that is, how many of the actual useful emails are correctly predicted as useful.

Specificity = TN/(TN+FP).

Precision (also called the Positive Predictive Value) measures the number of correct positive predictions divided by the total number of positive predictions.

Precision = TP/(TP+FP).

Calculate these five measures for MLM1.
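The five formulas above can be applied to the MLM1 confusion matrix directly. The following is a minimal Python sketch; the function name `confusion_metrics` and the dictionary keys are illustrative choices, not part of the original question.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return the five measures defined above for one confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "error_rate": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# MLM1 counts read from the confusion matrix (spam is the positive class).
mlm1 = confusion_metrics(tp=320, fn=43, fp=20, tn=538)

for name, value in mlm1.items():
    print(f"{name}: {value:.4f}")
# error_rate: 0.0684
# accuracy: 0.9316
# sensitivity: 0.8815
# specificity: 0.9642
# precision: 0.9412
```

Error rate and accuracy sum to 1, which is a quick check that the four counts were read from the table correctly.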

Machine Learning Model 2 (MLM2) was also evaluated with a different randomly chosen 80% of the emails to train the model and 20% to evaluate the model. MLM2 produced the following confusion matrix:

MLM2 Results

| | Spam (Predicted) | Useful (Predicted) | Total |
|---|---|---|---|
| Spam (Actual) | 334 | 23 | 357 |
| Useful (Actual) | 40 | 524 | 564 |
| Total | 374 | 547 | 921 |

Calculate the same five measures for MLM2. Discuss which model you believe is better and explain why.
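One way to approach the comparison is to compute the same five measures for both confusion matrices side by side. The sketch below does this in Python; the function name `confusion_metrics` and the dictionary layout are illustrative assumptions, and the counts are read from the two tables in the question.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return the five measures for one confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "error_rate": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Counts read from the two confusion matrices (spam is the positive class).
models = {
    "MLM1": confusion_metrics(tp=320, fn=43, fp=20, tn=538),
    "MLM2": confusion_metrics(tp=334, fn=23, fp=40, tn=524),
}

print(f"{'measure':<12} {'MLM1':>8} {'MLM2':>8}")
for measure in models["MLM1"]:
    m1 = models["MLM1"][measure]
    m2 = models["MLM2"][measure]
    print(f"{measure:<12} {m1:>8.4f} {m2:>8.4f}")
```

Both models misclassify 63 of the 921 test emails, so their error rate (about 0.0684) and accuracy (about 0.9316) are identical. They differ in the kind of mistake they make: MLM1 has higher specificity (about 0.9642 vs. 0.9291) and precision (about 0.9412 vs. 0.8930), while MLM2 has higher sensitivity (about 0.9356 vs. 0.8815). Which model is better therefore depends on which error is considered more costly. Under the assumption, not stated in the question, that sending a useful email to the spam folder is worse than letting a spam email reach the inbox, MLM1 would be the preferable model because it produces 20 false positives against 40 for MLM2.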
