Question: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22 [10 points, Peer Review]
We will use a preprocessed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for words plus statistics on the longest "run" of capital letters.
Word frequency is given by:
$$f_i = \frac{m_i}{N}$$
where $f_i$ is the frequency for word $i$, $m_i$ is the number of times word $i$ appears in the email, and $N$ is the total number of words in the email.
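As a rough illustration of that definition (count of a word divided by the total number of words in the email), here is a minimal sketch. The function name `word_frequencies`, the whitespace tokenization, and the sample email are assumptions made for this example; the dataset's actual preprocessing is not shown in the post.

```python
from collections import Counter


def word_frequencies(email_text, vocabulary):
    """Return {word: count / total_words} for each word in `vocabulary`.

    Tokenization here is a plain lowercase whitespace split, which is an
    assumption of this sketch, not the dataset's documented preprocessing.
    """
    words = email_text.lower().split()
    total = len(words)
    counts = Counter(words)
    if total == 0:
        # An empty email has no words, so every frequency is zero.
        return {word: 0.0 for word in vocabulary}
    return {word: counts[word] / total for word in vocabulary}


# Hypothetical email: 7 words in total, "free" appears twice.
freqs = word_frequencies(
    "Free money now claim your free prize",
    ["free", "money", "meeting"],
)
print(freqs)
```

Words that never appear in the email simply get a frequency of zero, which is why each row of such a dataset can have a fixed set of word-frequency columns.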
We will use decision trees to classify the emails.
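For context on the post-pruning part of the title, here is a minimal sketch of cost complexity pruning with the `ccp_alpha` parameter that scikit-learn added in version 0.22. It runs on synthetic data from `make_classification` as a stand-in, since the real spamdata.csv is not available here; the sample sizes and `random_state` values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the spam features (NOT the real spamdata.csv).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Effective alphas along the pruning path of the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
# Guard against tiny negative values caused by floating-point error.
ccp_alphas = np.clip(path.ccp_alphas, 0, None)

node_counts = []
test_scores = []
for alpha in ccp_alphas:
    # A larger ccp_alpha prunes more aggressively, giving a smaller tree.
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    node_counts.append(tree.tree_.node_count)
    test_scores.append(tree.score(X_test, y_test))

best = int(np.argmax(test_scores))
print("alpha with best test accuracy:", ccp_alphas[best])
print("nodes in that tree:", node_counts[best])
```

In practice the alpha would normally be chosen with a validation set or cross-validation rather than the test set; the test set is used here only to keep the sketch short.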
Part A: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.
My Code:
import pandas as pd
from sklearn.model_selection import train_test_split

def get_spam_dataset(filepath="data/spamdata.csv", test_split=...):
    '''
    get_spam_dataset

    Loads csv file located at "filepath". Shuffles the data and splits
    it so that you have (1 - test_split) * 100% training examples and
    test_split * 100% testing examples.

    Args:
        filepath: location of the csv file
        test_split: fraction of the data that should be the testing split

    Returns:
        X_train, X_test, y_train, y_test, feature_names
        Note: feature_names is a list of all column names including isSpam
        (in that order).
        The first four are np.ndarray.
    '''
    # your code here
    # Read CSV file
    data = pd.read_csv(filepath, header=None, delimiter=...)
    # Shuffle the data
    data = data.sample(frac=1, random_state=...).reset_index(drop=True)
    # Extract features (all columns but the last) and target variable (last column)
    X = data.iloc[:, :-1].values
    y = data.iloc[:, -1].values
    # Split the data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_split, random_state=...)
    # Get feature names
    feature_names = [f"word_freq_{i}" for i in range(X.shape[1])]
    return X_train, X_test, y_train, y_test, feature_names
# TODO: import the data set into five variables: X_train, X_test, y_train, y_test, label_names
# Uncomment and edit the line below to complete this task.
test_split = ...  # default test_split; change it if you'd like; ensure that this variable is used as an argument to your function
# your code here
X_train, X_test, y_train, y_test, label_names = get_spam_dataset(filepath="data/spamdata.csv", test_split=test_split)
# X_train, X_test, y_train, y_test, label_names = np.arange(5)
# Print the shapes of X_train and y_train
print("Shape of X_train:", X_train.shape)
print("Shape of y_train:", y_train.shape)
# Print label_names
print("Label names:", label_names)
It's returning the wrong answer. Can someone help?
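One possible cause, offered as a guess: the docstring asks for feature_names to be the list of all column names including isSpam, but the code above reads the file with header=None and then builds its own word_freq_i names for the feature columns only. If the CSV has a header row, header=None would also pull that row in as data. A minimal sketch of an alternative follows; it assumes a header row is present and isSpam is the last column, and the `sep` parameter, the default `test_split`, the `random_state` values, and the demo data are all hypothetical choices for this example.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split


def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1, sep=","):
    # ASSUMPTIONS: the file has a header row, the label (isSpam) is the last
    # column, and sep / test_split default / random_state are placeholders.
    df = pd.read_csv(filepath, sep=sep)  # header row supplies the column names
    feature_names = list(df.columns)     # all columns, including isSpam

    # Shuffle the rows, then separate features from the label column.
    df = df.sample(frac=1, random_state=0).reset_index(drop=True)
    X = df.iloc[:, :-1].values
    y = df.iloc[:, -1].values

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_split, random_state=0
    )
    return X_train, X_test, y_train, y_test, feature_names


# Tiny in-memory stand-in for the real file (made-up columns and values).
demo_csv = io.StringIO(
    "word_freq_free,word_freq_money,capital_run_length_longest,isSpam\n"
    "0.10,0.00,3,0\n"
    "0.50,0.30,40,1\n"
    "0.00,0.00,2,0\n"
    "0.70,0.20,55,1\n"
    "0.05,0.00,4,0\n"
    "0.60,0.40,61,1\n"
    "0.00,0.10,5,0\n"
    "0.80,0.50,72,1\n"
    "0.02,0.00,1,0\n"
    "0.90,0.60,80,1\n"
)
X_train, X_test, y_train, y_test, label_names = get_spam_dataset(
    demo_csv, test_split=0.2
)
print("Shape of X_train:", X_train.shape)
print("Label names:", label_names)
```

If the real file has no header row or uses a different delimiter, the `read_csv` call would need adjusting, so it is worth opening the first few lines of spamdata.csv before settling on either version.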
