Question: Problem 2: Decision Tree, post-pruning, and the cost-complexity parameter using sklearn 0.22
We will use a preprocessed natural-language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for a set of words, plus statistics on the longest "run" of capital letters. Word frequency is given by:

    f_i = m_i / N

where f_i is the frequency of word i, m_i is the number of times word i appears in the email, and N is the total number of words in the email. We will use decision trees to classify the emails.

TO DO: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.

```python
def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    """
    get_spam_dataset

    Loads the csv file located at "filepath". Shuffles the data and splits
    it so that a test_split fraction of the examples are held out for
    testing and the rest are used for training.

    Args:
        filepath: location of the csv file
        test_split: fraction of the data that should be the testing split

    Returns:
        X_train, X_test, y_train, y_test, feature_names (in that order;
        the first four are np.ndarray)
    """
    # complete your code here
    return
```

TO DO: Import the dataset into five variables: X_train, X_test, y_train, y_test, label_names.

```python
# Uncomment and edit the lines below to complete this task.
# test_split = ...  # default test_split; change it if you'd like; ensure that
#                   # this variable is used as an argument to your function
# your code here
# X_train, X_test, y_train, y_test, label_names = ...
```

TO DO: Build a decision tree classifier using the sklearn toolbox, then compute performance metrics such as precision and recall. This is a binary classification problem, so we can label every point as either positive (SPAM) or negative (NOT SPAM).

```python
def build_dt(data_X, data_y, max_depth=None, max_leaf_nodes=None):
    """
    Builds a decision tree classifier and fits it to the provided data.

    Args:
        data_X: an np.ndarray of features
        data_y: an np.ndarray of labels
        max_depth: None if unrestricted, otherwise an integer giving the
            maximum depth the tree can reach
        max_leaf_nodes: None if unrestricted, otherwise an integer giving
            the maximum number of leaf nodes

    Returns:
        A trained DecisionTreeClassifier
    """
    # complete your code here
```
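A minimal sketch of the loading/splitting step, assuming pandas is available and that the label is the last column of the CSV (the real spamdata.csv may use a named label column instead — adjust accordingly). The synthetic CSV at the bottom is only there to make the sketch runnable end to end:

```python
import io
import pandas as pd
from sklearn.model_selection import train_test_split

def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    """Load the CSV, shuffle, and split into train/test sets.

    Assumes the last column holds the 0/1 spam label; every other
    column is treated as a feature.
    """
    df = pd.read_csv(filepath)
    feature_names = list(df.columns[:-1])
    X = df.iloc[:, :-1].to_numpy()
    y = df.iloc[:, -1].to_numpy()
    # train_test_split shuffles by default before splitting
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_split, random_state=0
    )
    return X_train, X_test, y_train, y_test, feature_names

# Tiny synthetic CSV standing in for spamdata.csv, just to exercise the code.
csv_text = "freq_free,freq_money,capital_run,label\n" + "\n".join(
    f"{i * 0.01},{i * 0.02},{i},{i % 2}" for i in range(20)
)
X_train, X_test, y_train, y_test, names = get_spam_dataset(
    io.StringIO(csv_text), test_split=0.25
)
print(X_train.shape, X_test.shape)  # (15, 3) (5, 3)
```

`pd.read_csv` accepts either a path or a file-like object, so the same function works for the real file and the in-memory test data.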
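A sketch of the classifier-building step plus the requested precision/recall metrics. The toy data here (label determined by a single feature threshold) is an assumption purely for demonstration; on the real dataset you would pass in X_train and y_train from the previous step:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score

def build_dt(data_X, data_y, max_depth=None, max_leaf_nodes=None):
    """Fit a DecisionTreeClassifier; None leaves depth/leaf count unrestricted."""
    clf = DecisionTreeClassifier(
        max_depth=max_depth, max_leaf_nodes=max_leaf_nodes, random_state=0
    )
    clf.fit(data_X, data_y)
    return clf

# Toy, perfectly separable data standing in for the spam features.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = (X[:, 0] > 0.5).astype(int)  # "spam" iff first feature > 0.5

tree = build_dt(X, y, max_depth=3)
pred = tree.predict(X)
# On separable training data a depth-3 tree fits perfectly,
# so both metrics are 1.0 here.
print(precision_score(y, pred), recall_score(y, pred))
```

On real data you would compute these metrics on the held-out test split, where precision (fraction of predicted SPAM that is truly SPAM) and recall (fraction of true SPAM that is caught) will generally be below 1.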
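The title also asks about post-pruning via the cost-complexity parameter. A sketch of how that workflow typically looks with sklearn's `cost_complexity_pruning_path` and the `ccp_alpha` constructor argument (available since sklearn 0.22); the noisy toy data is an assumption for demonstration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((200, 4))
# Noisy labels so the unpruned tree overfits and grows many leaves.
y = (X[:, 0] + 0.2 * rng.standard_normal(200) > 0.5).astype(int)

# cost_complexity_pruning_path returns the effective alphas at which
# subtrees get pruned away, from 0 (full tree) upward.
base = DecisionTreeClassifier(random_state=0)
path = base.cost_complexity_pruning_path(X, y)

# Refit once per alpha; a larger ccp_alpha yields a smaller (more pruned) tree.
leaf_counts = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X, y).get_n_leaves()
    for a in path.ccp_alphas
]
print(leaf_counts[0], leaf_counts[-1])  # full tree has many leaves; largest alpha prunes to a single leaf
```

In practice you would choose `ccp_alpha` by evaluating each pruned tree on validation data (or with cross-validation) and keeping the alpha with the best held-out score, rather than fitting on the training set alone as this sketch does.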
