Question: Build a spam classifier using the UCI email spam dataset (Spambase)

Your task for this question is to build a spam classifier using the UCI email spam dataset, https://archive.ics.uci.edu/ml/datasets/Spambase. The collection of spam e-mails came from the postmaster and individuals who had filed spam. Please download the data from that website. The collection of non-spam e-mails came from filed work and personal e-mails, and hence the word 'george' and the area code '650' are indicators of non-spam. These are useful when constructing a personalized spam filter. You are free to use any package and any language for this homework.

One would either have to blind such non-spam indicators or get a very wide collection of non-spam to generate a general-purpose spam filter. Load the data. You will see there are a total of 4601 instances and 57 features. Note: there may be some missing values; you can just fill them in with zero.
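As a minimal sketch of the loading step, the snippet below uses only the Python standard library. It assumes the downloaded file is comma-separated, with the 57 feature columns followed by a 0/1 spam label in the last column. The helper names `to_float` and `parse_rows` and the file name `spambase.data` are assumptions for illustration, and a made-up two-row sample stands in for the real file.

```python
import csv
import io
import math

N_FEATURES = 57  # number of features stated in the problem


def to_float(field):
    """Convert one CSV field to float, filling missing or invalid values with 0.0."""
    try:
        value = float(field)
    except ValueError:
        return 0.0
    return 0.0 if math.isnan(value) else value


def parse_rows(lines):
    """Hypothetical helper: split each row into 57 features and a trailing 0/1 label."""
    X, y = [], []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        X.append([to_float(field) for field in row[:-1]])
        y.append(int(float(row[-1])))
    return X, y


# Made-up sample: the first row has an empty (missing) value in its last feature.
sample = "\n".join([
    ",".join(["0.1"] * 56 + [""] + ["1"]),
    ",".join(["0"] * 57 + ["0"]),
])
X, y = parse_rows(io.StringIO(sample))
print(len(X), len(X[0]), y)

# With the real download (file name is an assumption):
# with open("spambase.data", newline="") as f:
#     X, y = parse_rows(f)
```

On the real file, `len(X)` and `len(X[0])` should match the 4601 instances and 57 features quoted above.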

(a) Build a classification tree model (also known as the CART model). In your answer, you should report the fitted tree model, similar to what is shown in the "Random forest" lecture (the tree plot).
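One possible way to fit and display a classification tree is sketched below with scikit-learn, which is an assumption since the question leaves the package open. Synthetic data from `make_classification` stands in for the Spambase features, and `max_depth=3` is an arbitrary choice that keeps the printed tree readable. It is not a setting taken from the lecture.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in with 57 features; replace with the loaded Spambase X, y.
X, y = make_classification(
    n_samples=500, n_features=57, n_informative=10, random_state=0
)

# CART-style tree using the Gini impurity; the depth limit is illustrative only.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# Text rendering of the fitted splits, one line per node.
feature_names = [f"x{j}" for j in range(X.shape[1])]
print(export_text(tree, feature_names=feature_names))
```

For a graphical tree plot, `sklearn.tree.plot_tree(tree)` draws the same structure with matplotlib.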


(b) Also build a random forest model. Recall that in random forest, each decision tree is grown on a bootstrapped dataset, with a random subset of the input variables selected as candidates for splitting. Comment on a rule of thumb for choosing the size of this subset.
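A commonly cited rule of thumb for classification forests is to consider about the square root of the number of input variables at each split (and about one third of them for regression). The standard-library sketch below works through that arithmetic for 57 features and illustrates the two sources of randomness. The variable names are illustrative and not taken from the lecture.

```python
import math
import random

n_rows = 4601      # instances in the dataset, per the problem statement
n_features = 57    # input variables, per the problem statement

# Rule of thumb for classification: floor(sqrt(57)) = 7 candidate variables.
n_candidates = math.isqrt(n_features)

rng = random.Random(0)

# Bootstrap sample: draw n_rows row indices with replacement.
boot_idx = [rng.randrange(n_rows) for _ in range(n_rows)]

# Rows never drawn are "out of bag" (OOB) for this tree.
oob_idx = set(range(n_rows)) - set(boot_idx)

# Candidate variables for one split: distinct indices drawn without replacement.
candidates = rng.sample(range(n_features), n_candidates)

print(n_candidates, round(len(oob_idx) / n_rows, 3), sorted(candidates))
```

Roughly a third of the rows end up out of bag for any one tree, which is what makes the OOB error estimate possible without a separate validation set.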

(c) Now partition the data to use the first 80% for training and the remaining 20% for testing. Your task is to compare and report the test error for your classification tree and random forest models on the testing data, respectively. Plot the curve of test (OOB) error versus the number of trees used in the random forest, similar to our lecture.
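The comparison in part (c) could be sketched as follows, again assuming scikit-learn and using synthetic data in place of Spambase. The forest is grown incrementally with `warm_start=True` so that the OOB and test errors can be recorded at several forest sizes. The grid of tree counts is an arbitrary choice. If the rows of the downloaded file happen to be grouped by class, a plain first-80% split would leave the test set dominated by one class, so it may be worth checking the label order first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with 57 features; replace with the loaded Spambase X, y.
X, y = make_classification(
    n_samples=1000, n_features=57, n_informative=10, random_state=0
)

# First 80% of rows for training, remaining 20% for testing.
n_train = int(0.8 * len(X))
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Single classification tree: test error = 1 - accuracy.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_test_error = 1.0 - tree.score(X_test, y_test)

# Random forest grown in stages; warm_start keeps the trees already fitted.
forest = RandomForestClassifier(
    max_features="sqrt", oob_score=True, warm_start=True, random_state=0
)
tree_counts = list(range(20, 101, 20))  # illustrative grid of forest sizes
oob_errors, test_errors = [], []
for n_trees in tree_counts:
    forest.set_params(n_estimators=n_trees)
    forest.fit(X_train, y_train)
    oob_errors.append(1.0 - forest.oob_score_)
    test_errors.append(1.0 - forest.score(X_test, y_test))

print("tree test error:", round(tree_test_error, 3))
for n_trees, oob, test in zip(tree_counts, oob_errors, test_errors):
    print(n_trees, round(oob, 3), round(test, 3))
```

Plotting `oob_errors` and `test_errors` against `tree_counts` (for example with matplotlib) gives the requested error curve. On the real data, `int(0.8 * 4601)` puts 3680 rows in the training set and 921 in the test set.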
