Question: Naive Bayes Classifier Overview

Naive Bayes Classifier Overview:

In this assignment we will classify an email as spam or legit. Specifically, emails from Enron that have been made publicly available have been recoded into a term-document matrix that shows each email and the words that appear in it (something we'll learn more about in text mining). I have provided you with code to get started. Please follow the directions below (and the code) to create a naive Bayes classifier model that predicts whether an email is legit or spam.

1. Download email.csv from Canvas and import it into a dataset named email.

2. Review the term-document matrix. How many columns does it have?

3. Run the following code to see the top words used in the spam emails:

    spam_df = email_df.loc[email_df['message_label'] == 'spam']
    spam_totals = spam_df.groupby('message_label').sum()
    spam_totals = spam_totals.drop('message_index', axis=1)
    spam_totals.T.sort_values(by='spam', ascending=False).head(10)

4. Modify the code from step 3 to identify the top words in legit emails. What is the top appearing word in a spam email? What is the top appearing word in a legit email?

5. Convert the dependent variable message_label to a 1/0 categorical outcome.

6. Run the following code to transform the binary word indicators into categorical variables:

    word_list = email_df.columns
    for col in [word_list]:
        email_df[col] = email_df[col].astype('category')

7. Split the data into 75% training and 25% validation using random_state = 2 and stratify = y.

8. Is there a proportion imbalance?

9. Create a naive Bayes classifier to predict message_label on the training set.

10. Using the predict function and the confusion matrix/classification report, what is the overall accuracy of the model on the training and validation sets?
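As a rough illustration of how steps 3 through 10 fit together, here is a minimal sketch using pandas and scikit-learn. Because email.csv is not available here, it builds a small synthetic term-document matrix; the word columns (free, offer, meeting, report), their probabilities, and the helper top_words are hypothetical and exist only for this example. The assignment does not say which naive Bayes variant to use, so BernoulliNB is an assumption that fits 0/1 word indicators. Real results on the Enron data will differ.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in for email.csv (hypothetical words and probabilities).
rng = np.random.default_rng(0)
n_rows = 400
labels = rng.choice(["spam", "legit"], size=n_rows, p=[0.3, 0.7])
is_spam = labels == "spam"
email_df = pd.DataFrame({
    "message_index": np.arange(n_rows),
    "message_label": labels,
    "free": rng.binomial(1, np.where(is_spam, 0.8, 0.1)),
    "offer": rng.binomial(1, np.where(is_spam, 0.7, 0.2)),
    "meeting": rng.binomial(1, np.where(is_spam, 0.1, 0.7)),
    "report": rng.binomial(1, np.where(is_spam, 0.2, 0.6)),
})

def top_words(df, label, n=10):
    """Steps 3-4: total word occurrences for one class, most frequent first."""
    subset = df.loc[df["message_label"] == label]
    totals = subset.drop(columns=["message_index", "message_label"]).sum()
    return totals.sort_values(ascending=False).head(n)

print(top_words(email_df, "spam"))
print(top_words(email_df, "legit"))

# Step 5: encode the label as 1 (spam) / 0 (legit).
y = (email_df["message_label"] == "spam").astype(int)
X = email_df.drop(columns=["message_index", "message_label"])

# Step 7: 75/25 split, stratified so both parts keep the same spam share.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=2, stratify=y
)

# Step 8: class proportions reveal any imbalance.
print(y.value_counts(normalize=True))

# Step 9: fit the classifier on the training set only.
model = BernoulliNB()
model.fit(X_train, y_train)

# Step 10: accuracy, confusion matrix and report on both sets.
train_pred = model.predict(X_train)
valid_pred = model.predict(X_valid)
train_acc = accuracy_score(y_train, train_pred)
valid_acc = accuracy_score(y_valid, valid_pred)
print("training accuracy:", train_acc)
print("validation accuracy:", valid_acc)
print(confusion_matrix(y_valid, valid_pred))
print(classification_report(y_valid, valid_pred, target_names=["legit", "spam"]))
```

To use this on the real data, replace the synthetic frame with pd.read_csv("email.csv") and keep the rest unchanged, assuming the file has the message_index and message_label columns that the provided code refers to.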
