In this assignment, we will build and evaluate a spam filter using a dataset whose feature columns give the frequencies of common words and characters in each email, plus a label column indicating whether the email is spam. Please answer the following questions based on your own code, implemented in MATLAB:

a) Draw a bar chart of the distribution of spam and non-spam emails in the dataset. How many emails are in the dataset? How many of them are spam?

b) Divide the dataset into training and test sets. Since this is a binary classification problem, use logistic regression or a random forest to build a model that predicts whether an email is spam.

c) Build the confusion matrix and calculate precision and recall metrics to evaluate the performance of your model.

d) Take another look at the distribution of emails from part a). Is the distribution imbalanced? If so, oversample the minority class using the SMOTE algorithm and retrain your model.

e) Rebuild the confusion matrix and compare it with your initial matrix. What are the differences between the two models? Does SMOTE work well? Explain your answer in detail.
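For parts a) and b), one possible starting point is sketched below. It is not the assignment's reference solution. Because the actual dataset file is not given here, the sketch uses made-up synthetic features (`X`) and labels (`y`, with 1 = spam) as stand-ins, and the 70/30 split ratio and 0.5 decision threshold are arbitrary choices. It assumes the Statistics and Machine Learning Toolbox is installed, which provides `cvpartition` and `fitglm`.

```matlab
% Synthetic stand-in for the real dataset (assumption: 2 features, 100 emails).
rng(1);                                   % fixed seed for reproducibility
n0 = 60; n1 = 40;                         % made-up class sizes
X = [randn(n0, 2); randn(n1, 2) + 2];     % feature matrix
y = [zeros(n0, 1); ones(n1, 1)];          % labels: 0 = non-spam, 1 = spam

% Part a): count each class and draw the bar chart.
counts = [sum(y == 0), sum(y == 1)];
bar(counts);
set(gca, 'XTickLabel', {'non-spam', 'spam'});
ylabel('Number of emails');
fprintf('Total emails: %d, spam: %d\n', numel(y), counts(2));

% Part b): hold out 30% of the rows for testing. Passing the label vector
% to cvpartition keeps the class proportions similar in both sets.
cv = cvpartition(y, 'HoldOut', 0.3);
Xtrain = X(training(cv), :);  ytrain = y(training(cv));
Xtest  = X(test(cv), :);      ytest  = y(test(cv));

% Logistic regression: a GLM with a binomial response (logit link by default).
mdl = fitglm(Xtrain, ytrain, 'Distribution', 'binomial');

% predict returns the estimated spam probability; threshold it at 0.5.
pSpam = predict(mdl, Xtest);
yPred = double(pSpam >= 0.5);
```

With the real data, `X` and `y` would instead come from the dataset's feature and label columns, for example after loading the file with `readtable`. A random forest could be swapped in through `TreeBagger` from the same toolbox.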
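For part c), the confusion matrix, precision, and recall can be computed from the true and predicted labels with plain MATLAB, as in the sketch below. The two label vectors are toy values invented for illustration, and spam (1) is treated as the positive class.

```matlab
% Toy labels for illustration only (1 = spam, 0 = non-spam).
yTrue = [1 1 1 0 0 0 0 0 0 0]';
yPred = [1 1 0 1 0 0 0 0 0 0]';

% Count the four outcomes, with spam as the positive class.
TP = sum(yTrue == 1 & yPred == 1);   % spam correctly flagged
FP = sum(yTrue == 0 & yPred == 1);   % non-spam wrongly flagged
FN = sum(yTrue == 1 & yPred == 0);   % spam that was missed
TN = sum(yTrue == 0 & yPred == 0);   % non-spam correctly passed

% Confusion matrix: rows are actual classes (0, 1),
% columns are predicted classes (0, 1).
C = [TN FP; FN TP];

% Precision: fraction of emails flagged as spam that really are spam.
precision = TP / (TP + FP);
% Recall: fraction of actual spam emails that were flagged.
recall = TP / (TP + FN);

disp(C);
fprintf('precision = %.3f, recall = %.3f\n', precision, recall);
```

If the Statistics and Machine Learning Toolbox is available, `confusionmat(yTrue, yPred)` returns a matrix with the same row and column layout.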
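For part d), MATLAB has no built-in SMOTE function that this sketch relies on, so the block below hand-rolls a simplified variant. It repeatedly picks a random minority sample, chooses one of its `k` nearest minority neighbours, and creates a synthetic point somewhere on the line segment between them. The data, the value `k = 3`, and the goal of fully balancing the classes are all assumptions made for illustration.

```matlab
% Synthetic imbalanced data (assumption): 20 non-spam rows, 5 spam rows.
rng(0);                                   % fixed seed for reproducibility
X = [randn(20, 2); randn(5, 2) + 3];
y = [zeros(20, 1); ones(5, 1)];

k = 3;                                    % neighbours to consider (must be < minority count)
Xmin = X(y == 1, :);                      % minority-class rows
nMin = size(Xmin, 1);
nNew = sum(y == 0) - nMin;                % synthetic rows needed to balance the classes

Xsyn = zeros(nNew, size(X, 2));
for i = 1:nNew
    a = randi(nMin);                      % random minority sample
    % Squared Euclidean distance from sample a to every minority sample
    % (implicit expansion, MATLAB R2016b or later).
    d = sum((Xmin - Xmin(a, :)).^2, 2);
    [~, order] = sort(d);
    nbrs = order(2:k+1);                  % skip the first entry, which is sample a itself
    b = nbrs(randi(k));                   % one of the k nearest neighbours
    gap = rand;                           % random position in [0, 1] along the segment
    Xsyn(i, :) = Xmin(a, :) + gap * (Xmin(b, :) - Xmin(a, :));
end

% Append the synthetic minority rows to the original data.
Xbal = [X; Xsyn];
ybal = [y; ones(nNew, 1)];
```

For part e), the oversampling should be applied to the training set only. The retrained model is then evaluated on the original, untouched test set, so that both confusion matrices are computed on the same real emails and can be compared directly.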

