Question: Problem 2: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22

We will use a pre-processed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not spam. Each row contains the word frequency for 54 words plus statistics on the longest "run" of capital letters.

Word frequency is given by:

$f_i = m_i / N$

where $f_i$ is the frequency for word $i$, $m_i$ is the number of times word $i$ appears in the email, and $N$ is the total number of words in the email. We will use decision trees to classify the emails.
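To make the formula concrete, here is a small Python sketch that computes the frequency of a few vocabulary words in one email as (times the word appears) / (total words in the email). The whitespace tokenizer, the function name word_frequencies, and the example vocabulary are assumptions for illustration only; spamdata.csv is already pre-processed, so this just shows how such columns are defined.

```python
from collections import Counter


def word_frequencies(email_text, vocabulary):
    """Frequency of each vocabulary word: (times it appears) / (total words)."""
    words = email_text.lower().split()  # naive whitespace tokenizer (assumption)
    total = len(words)
    counts = Counter(words)
    # Guard against an empty email so we never divide by zero.
    return {w: (counts[w] / total if total else 0.0) for w in vocabulary}


freqs = word_frequencies("Free money free offer now", ["free", "money", "meeting"])
print(freqs)  # {'free': 0.4, 'money': 0.2, 'meeting': 0.0}
```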

TO DO 1: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.

def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    '''
    get_spam_dataset

    Loads the csv file located at "filepath". Shuffles the data and splits it so that
    you have (1-test_split)*100% training examples and (test_split)*100% testing examples.

    Args:
        filepath: location of the csv file
        test_split: fraction (percentage/100) of the data to use as the testing split
    Returns:
        X_train, X_test, y_train, y_test, feature_names (in that order);
        the first four are np.ndarray
    '''
    # complete your code here
    return 0
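One possible way to fill in get_spam_dataset is sketched below. It assumes that the CSV has a header row and that the spam/not-spam label is the last column; check both against the actual file. It relies on scikit-learn's train_test_split, which shuffles rows by default, and fixes random_state=0 (an arbitrary choice) so the split is reproducible. Because the real data is not included here, the usage part writes a tiny synthetic CSV with hypothetical column names.

```python
import csv
import os
import tempfile

import numpy as np
from sklearn.model_selection import train_test_split


def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    """Load the CSV, shuffle, and split it into train/test arrays.

    Assumes (not stated in the assignment) that the first row is a header
    and that the last column holds the 0/1 spam label.
    """
    # Read only the header row to recover the feature names.
    with open(filepath, newline="") as f:
        header = next(csv.reader(f))
    feature_names = header[:-1]

    # Load the numeric rows, skipping the header.
    data = np.loadtxt(filepath, delimiter=",", skiprows=1)
    X = data[:, :-1]
    y = data[:, -1].astype(int)

    # train_test_split shuffles before splitting; random_state=0 is an arbitrary seed.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_split, shuffle=True, random_state=0
    )
    return X_train, X_test, y_train, y_test, feature_names


# Usage on a tiny synthetic CSV (hypothetical column names, random values).
rng = np.random.default_rng(0)
toy_path = os.path.join(tempfile.mkdtemp(), "toy_spam.csv")
toy = np.column_stack([rng.random((50, 3)), rng.integers(0, 2, size=50)])
np.savetxt(toy_path, toy, delimiter=",",
           header="word_a,word_b,longest_caps_run,label", comments="")

X_train, X_test, y_train, y_test, feature_names = get_spam_dataset(toy_path, test_split=0.1)
print(X_train.shape, X_test.shape)  # (45, 3) (5, 3)
print(feature_names)                # ['word_a', 'word_b', 'longest_caps_run']
```

Returning feature_names as a plain list keeps the header text available later, for example when inspecting which words a fitted tree splits on.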

TO DO 2: Import the dataset into five variables: X_train, X_test, y_train, y_test, label_names.

# Uncomment and edit the line below to complete this task.
test_split = 0.1  # default test_split; change it if you'd like; ensure that this variable is used as an argument to your function
# your code here

# X_train, X_test, y_train, y_test, label_names = np.arange(5)
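Once the five variables are assigned, a quick sanity check is to compare the array shapes with the expected split sizes. The sketch below uses random stand-in data (hypothetical: 200 rows, 10 columns) instead of spamdata.csv. With a float test_size, scikit-learn puts ceil(test_split * n_samples) rows in the test set and the remaining rows in the training set.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in for the spam data (hypothetical: 200 rows, 10 columns).
rng = np.random.default_rng(1)
X = rng.random((200, 10))
y = rng.integers(0, 2, size=200)

test_split = 0.1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_split, random_state=0
)

# For a float test_size, the test set gets ceil(test_split * n_samples) rows.
n_test = math.ceil(test_split * X.shape[0])
print(X_train.shape, X_test.shape)  # (180, 10) (20, 10)
```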

TO DO 3: Build a decision tree classifier using the sklearn toolbox. Then compute performance metrics such as precision and recall. This is a binary classification problem, so every point can be labeled either positive (SPAM) or negative (NOT SPAM).
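With SPAM as the positive class, precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below computes both by hand from a confusion matrix and cross-checks them against sklearn.metrics.precision_score and recall_score. The label vectors are made-up toy values, not model output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels (made up): 1 = SPAM (positive), 0 = NOT SPAM (negative).
y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1])

# With labels [0, 1], the flattened confusion matrix is (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # share of emails flagged as SPAM that really are SPAM
recall = tp / (tp + fn)     # share of real SPAM emails that were flagged

print(tp, fp, fn, tn)  # 2 2 1 1
print(precision, recall)

# Cross-check against scikit-learn's implementations.
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
```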

def build_dt(data_X, data_y, max_depth=None, max_leaf_nodes=None):
    '''
    This function builds the decision tree classifier and fits it to the provided data.

    Arguments
        data_X - a np.ndarray
        data_y - np.ndarray
        max_depth - None if unrestricted, otherwise an integer for the maximum
                    depth the tree can reach.
        max_leaf_nodes - None if unrestricted, otherwise an integer for the maximum
                         number of leaf nodes.
    Returns:
        A trained DecisionTreeClassifier
    '''
    # complete your code here
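A minimal sketch of build_dt follows, assuming the two size limits are passed straight through to DecisionTreeClassifier; random_state=0 is an added assumption for reproducible tie-breaking. Because the problem title mentions post-pruning with the cost complexity parameter, the usage part also shows cost_complexity_pruning_path and the ccp_alpha argument, both introduced in scikit-learn 0.22. The data comes from make_classification as a stand-in for the spam features.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def build_dt(data_X, data_y, max_depth=None, max_leaf_nodes=None):
    """Build a DecisionTreeClassifier and fit it to (data_X, data_y).

    None for max_depth / max_leaf_nodes means "unrestricted", matching
    scikit-learn's own defaults. random_state=0 is an added assumption.
    """
    clf = DecisionTreeClassifier(
        max_depth=max_depth, max_leaf_nodes=max_leaf_nodes, random_state=0
    )
    return clf.fit(data_X, data_y)  # fit() returns the fitted estimator


# Synthetic binary data standing in for the spam features (hypothetical).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

full_tree = build_dt(X, y)
shallow_tree = build_dt(X, y, max_depth=3)
print("depths:", full_tree.get_depth(), shallow_tree.get_depth())

# Post-pruning (scikit-learn >= 0.22): compute the effective alphas of
# minimal cost-complexity pruning, then refit with a few of them.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
for alpha in path.ccp_alphas[::5]:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative round-off
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(f"ccp_alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}")
```

Larger ccp_alpha values prune more aggressively, so the leaf counts do not grow as alpha increases; one common way to pick alpha is to score each pruned tree on held-out data.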

