Question: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22 [10 points, Peer Review]

We will use a pre-processed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for 54 words plus statistics on the longest "run" of capital letters.

Word frequency is given by:

$f_i = m_i / N$

where $f_i$ is the frequency for word $i$, $m_i$ is the number of times word $i$ appears in the email, and $N$ is the total number of words in the email.
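As a small illustration of the ratio described above, the sketch below computes a word's frequency in a toy email. The helper name `word_frequency` and the naive whitespace tokenization are assumptions made for illustration only; the preprocessing actually used to build the dataset is not described here.

```python
def word_frequency(email_text, word):
    """Return (occurrences of word) / (total words), using whitespace tokens."""
    tokens = email_text.lower().split()
    if not tokens:
        # An empty email has no words, so define the frequency as 0.
        return 0.0
    return tokens.count(word.lower()) / len(tokens)


# Toy email (hypothetical): 7 words, "free" appears twice.
email = "Free money now claim your free prize"
freq_free = word_frequency(email, "free")
print(freq_free)
```

Each of the 54 word-frequency columns in a row would hold one such ratio for a single email.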

We will use decision trees to classify the emails.

Part A [5 points]: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.

My Code:

import pandas as pd
from sklearn.model_selection import train_test_split

def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    '''
    get_spam_dataset

    Loads csv file located at "filepath". Shuffles the data and splits
    it so that you have (1 - test_split) * 100% training examples and
    (test_split) * 100% testing examples.

    Args:
        filepath: location of the csv file
        test_split: percentage / 100 of the data that should be the testing split

    Returns:
        X_train, X_test, y_train, y_test, feature_names
        Note: feature_names is a list of all column names including isSpam.
        (in that order)
        first four are np.ndarray
    '''
    # your code here
    # Read CSV file. The first row holds the column names, so it must not be
    # read as data: header=None would add a bogus row and turn every column
    # into strings. The single-space delimiter is assumed from the original call.
    data = pd.read_csv(filepath, delimiter=" ")

    # Shuffle the data
    data = data.sample(frac=1, random_state=42).reset_index(drop=True)

    # Extract features and target variable (isSpam is assumed to be the last column)
    X = data.iloc[:, :-1].values
    y = data.iloc[:, -1].values

    # Split the data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_split, random_state=42)

    # Get feature names: all column names from the file, including isSpam
    feature_names = list(data.columns)

    return X_train, X_test, y_train, y_test, feature_names

# TO-DO: import the data set into five variables: X_train, X_test, y_train, y_test, label_names
# Uncomment and edit the line below to complete this task.
test_split = 0.1  # default test_split; change it if you'd like; ensure that this variable is used as an argument to your function

# your code here
X_train, X_test, y_train, y_test, label_names = get_spam_dataset(filepath="data/spamdata.csv", test_split=test_split)
# X_train, X_test, y_train, y_test, label_names = np.arange(5)

# Print the shapes of X_train and y_train
print("Shape of X_train:", X_train.shape)
print("Shape of y_train:", y_train.shape)

# Print label_names
print("Label names:", label_names)

It's returning the wrong answer. Can someone help?
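One thing worth checking, offered as a guess since the grader's output is not shown, is whether the file's header row is being read as data. The sketch below uses a made-up three-column, space-delimited file held in memory (the column names and values are hypothetical, not taken from spamdata.csv) to show how `header=None` changes the shape and the column labels that pandas reports.

```python
import io

import pandas as pd

# Hypothetical space-delimited file with a header row and three data rows.
csv_text = "word_a word_b isSpam\n0.1 0.0 1\n0.0 0.2 0\n0.3 0.1 1\n"

# header=None treats the column names as an ordinary data row.
no_header = pd.read_csv(io.StringIO(csv_text), header=None, delimiter=" ")

# The default behaviour uses the first row as the column names.
with_header = pd.read_csv(io.StringIO(csv_text), delimiter=" ")

print(no_header.shape)            # one row more than the real data
print(with_header.shape)          # only the real data rows
print(list(with_header.columns))  # names come straight from the file
```

If the real file behaves the same way, reading it with `header=None` would leave one extra row in the data and force the feature names to be invented rather than taken from the file.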

