Question (16 marks): In this question, you will predict orange juice sales with the OJ dataset. Each line in this dataset is a purchase of either Minute Maid (MM) or Citrus Hill (CH) brand orange juice, collected in five different stores over some period of time. We will now fit various tree-based classification algorithms to predict Purchase (i.e. CH or MM) from the remaining columns.
(a) Do some exploratory data analysis to get a feeling for the dataset. Answer the following questions:
Which brand tends to be more expensive?
Is one brand bought more often than the other one?
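A minimal sketch of how the two exploratory questions could be approached. It assumes the OJ data is the one shipped with the ISLR package (the question does not name the package) and uses the listed prices PriceCH and PriceMM as the measure of "expensive"; sale prices would be an equally reasonable choice.

```r
# Assumption: OJ is taken from the ISLR package.
library(ISLR)
data(OJ)

str(OJ)  # column types and a preview of the values

# Which brand tends to be more expensive? Compare the listed prices.
summary(OJ[, c("PriceCH", "PriceMM")])
mean(OJ$PriceMM > OJ$PriceCH)  # share of rows where MM is listed at a higher price

# Is one brand bought more often? Tabulate the response variable.
table(OJ$Purchase)
prop.table(table(OJ$Purchase))
```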
(b) Randomly split the dataset into a training and a test dataset with [...] and [...] relative size. In all of the following tasks, train on the training set. We will use the test set in the last step only.
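One way the random split might look in base R. The relative sizes are missing from the question text, so train_frac below is only a placeholder, and the seed and the object names OJ_train and OJ_test are arbitrary choices made for this sketch.

```r
# Assumption: OJ is taken from the ISLR package.
library(ISLR)
data(OJ)

set.seed(1)        # arbitrary seed, only for reproducibility
train_frac <- 0.7  # placeholder: substitute the fraction given in the assignment

n <- nrow(OJ)
train_idx <- sample(n, size = floor(train_frac * n))  # row indices drawn without replacement

OJ_train <- OJ[train_idx, ]
OJ_test  <- OJ[-train_idx, ]  # all remaining rows
```

Because the indices are sampled without replacement and the test set is their complement, every row ends up in exactly one of the two sets.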
(c) Using the tree package, fit a single classification tree and use [...]-fold cross-validation to prune it to optimal size. Visualise the tree and record its misclassification error on the training set. Since trees allow relatively easy interpretation, give a one-or-two-sentence insight into what can be learned from the tree about the structure of the data.
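A sketch of the fit, cross-validate, prune sequence with the tree package. The number of folds is missing from the question, so K = 10 (the cv.tree default) is an assumption, as are the 0.7 split fraction and the seed. Ties in the cross-validated error are broken in favour of the smallest tree, which is one possible convention.

```r
# Assumptions: OJ from ISLR, a 70% training split, K = 10 folds.
library(ISLR)
library(tree)
data(OJ)

set.seed(1)
train_idx <- sample(nrow(OJ), floor(0.7 * nrow(OJ)))  # placeholder split
OJ_train <- OJ[train_idx, ]

# Fit the full tree on the training set.
fit <- tree(Purchase ~ ., data = OJ_train)

# Cross-validate the cost-complexity sequence, scoring by misclassifications.
cv_fit <- cv.tree(fit, FUN = prune.misclass, K = 10)

# Smallest size achieving the minimum CV error; keep at least two leaves
# so that the pruned tree can still be plotted.
best_size <- min(cv_fit$size[cv_fit$dev == min(cv_fit$dev)])
best_size <- max(best_size, 2)

pruned <- prune.misclass(fit, best = best_size)

# Visualise the pruned tree.
plot(pruned)
text(pruned, pretty = 0)

# Misclassification error on the training set.
train_pred <- predict(pruned, newdata = OJ_train, type = "class")
train_err <- mean(train_pred != OJ_train$Purchase)
train_err
```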
(d) Using the randomForest package, use bagging of trees to predict Purchase. Record the misclassification rate obtained.
(e) Same as the previous task, but use a random forest instead of bagging.
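Bagging and random forests can both be fitted with randomForest(); the only difference in this sketch is mtry, the number of predictors considered at each split. The question does not say which misclassification rate to record, so the out-of-bag (OOB) rate is shown here as one reasonable reading. The split fraction and seed are again assumptions.

```r
# Assumptions: OJ from ISLR, a 70% training split, OOB error as the recorded rate.
library(ISLR)
library(randomForest)
data(OJ)

set.seed(1)
train_idx <- sample(nrow(OJ), floor(0.7 * nrow(OJ)))  # placeholder split
OJ_train <- OJ[train_idx, ]

p <- ncol(OJ_train) - 1  # number of predictors (all columns except Purchase)

# Bagging: every split may choose among all p predictors.
bag <- randomForest(Purchase ~ ., data = OJ_train, mtry = p)

# Random forest: the default mtry for classification is floor(sqrt(p)).
rf <- randomForest(Purchase ~ ., data = OJ_train)

# $predicted holds the out-of-bag predictions for the training rows.
bag_oob_err <- mean(bag$predicted != OJ_train$Purchase, na.rm = TRUE)
rf_oob_err  <- mean(rf$predicted  != OJ_train$Purchase, na.rm = TRUE)

c(bagging = bag_oob_err, random_forest = rf_oob_err)
```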
(f) Same as the previous task, but use boosting via the gbm package. You will need to specify the option distribution = "bernoulli" because we are considering a binary classification problem. You will also need to encode Purchase as a 0/1 variable in order to be able to apply gbm. You can do this by creating a new 0/1-encoded feature and removing the original feature Purchase afterwards.
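A sketch of the recoding and the boosting fit. The name of the new feature is not legible in the question, so Purchase01 is a hypothetical name; coding MM as 1 is likewise an arbitrary choice. The tuning values (n.trees, interaction.depth, shrinkage), the split fraction and the seed are illustrative and not taken from the question.

```r
# Assumptions: OJ from ISLR, a 70% training split, MM coded as 1,
# illustrative tuning values.
library(ISLR)
library(gbm)
data(OJ)

set.seed(1)
train_idx <- sample(nrow(OJ), floor(0.7 * nrow(OJ)))  # placeholder split
OJ_train <- OJ[train_idx, ]

# Encode the response as 0/1 (hypothetical column name) and drop the factor.
OJ_train$Purchase01 <- as.integer(OJ_train$Purchase == "MM")
OJ_train$Purchase <- NULL

boost <- gbm(Purchase01 ~ ., data = OJ_train,
             distribution = "bernoulli",
             n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)

# type = "response" returns the predicted probability of class 1 (MM).
prob_mm <- predict(boost, newdata = OJ_train, n.trees = 1000, type = "response")
pred01 <- ifelse(prob_mm > 0.5, 1L, 0L)

train_err <- mean(pred01 != OJ_train$Purchase01)
train_err
```

The same recoding has to be applied to the test set before predicting on it, since the fitted model expects the same columns.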
(g) Compare all methods so far by predicting with them on the test set and computing the misclassification rate. Which method performs best?
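The misclassification rate used throughout is simply the share of predictions that differ from the true label. A small base-R helper, with the hypothetical name misclass_rate and a toy example whose vectors are made up purely for illustration:

```r
# Hypothetical helper: fraction of predictions that differ from the truth.
# Comparing as character avoids problems with differing factor levels.
misclass_rate <- function(pred, truth) {
  stopifnot(length(pred) == length(truth))
  mean(as.character(pred) != as.character(truth))
}

# Toy vectors (not real OJ results): one of five predictions is wrong.
truth <- factor(c("CH", "CH", "MM", "MM", "CH"))
pred  <- factor(c("CH", "MM", "MM", "MM", "CH"))

misclass_rate(pred, truth)  # 0.2
```

Applying the same helper to each model's predictions on the held-out test rows gives rates that can be compared directly; the method with the smallest value performs best on that particular split.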
