Question: (16 marks) In this question, you will predict Orange Juice sales with the OJ dataset.
Each line in this dataset is a purchase of either Minute Maid (MM) or Citrus Hill (CH)
brand orange juice, collected in five different stores over some period of time. We will
now fit various tree-based classification algorithms to predict Purchase (i.e., CH or MM)
from the remaining columns.
(a) Do some exploratory data analysis to get a feeling for the dataset. Answer the
following questions:
Which brand tends to be more expensive?
Is one brand bought more often than the other one?
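A minimal R sketch for part (a), assuming the OJ data frame shipped with the ISLR package (the question does not say where the data come from). It compares the average list and sale prices of the two brands and tabulates how often each brand was bought; interpreting the printed numbers is left to the reader.

```r
# Assumption: OJ is the data frame from the ISLR package (install.packages("ISLR")).
library(ISLR)

# Average list price per brand
price_means <- colMeans(OJ[, c("PriceCH", "PriceMM")])
print(price_means)

# Average price actually paid, after any discount
print(colMeans(OJ[, c("SalePriceCH", "SalePriceMM")]))

# Share of purchases in which MM's list price exceeded CH's
print(mean(OJ$PriceMM > OJ$PriceCH))

# Number and proportion of purchases per brand
purchase_counts <- table(OJ$Purchase)
print(purchase_counts)
print(prop.table(purchase_counts))
```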
(b) Randomly split the dataset into a training and a test dataset with and
relative size. In all following tasks, train on the training set. We will use the test
set only in the last step.
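One way to do the split in R is to draw row indices with `sample()`. This sketch again assumes the ISLR copy of OJ. The relative sizes are missing from the question text, so `train_frac` below is a hypothetical placeholder to be replaced with the proportion the assignment actually specifies.

```r
library(ISLR)

set.seed(1)          # fixed seed so the split is reproducible
train_frac <- 0.7    # HYPOTHETICAL: the proportion is missing from the source
n <- nrow(OJ)
train_idx <- sample(seq_len(n), size = round(train_frac * n))

OJ_train <- OJ[train_idx, ]   # used for all model fitting
OJ_test  <- OJ[-train_idx, ]  # held out until the final comparison
```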
(c) Using the tree package, fit a single classification tree and use fold cross-validation
to prune it to the optimal size. Visualise the tree and record its misclassification error
on the training set. Since trees allow relatively easy interpretation, give a one- or
two-sentence insight into what can be learned from the tree about the structure of
the data.
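A sketch of part (c) with the tree package, under the same assumptions as above (ISLR's OJ, a hypothetical 0.7 training fraction). The number of folds is missing from the question, so `K = 10` is an assumption. `cv.tree()` with `FUN = prune.misclass` reports cross-validated misclassification counts in `dev` for each candidate size, and the size with the lowest count is kept.

```r
library(ISLR)
library(tree)

set.seed(1)
train_frac <- 0.7   # HYPOTHETICAL split proportion, as in part (b)
train_idx <- sample(nrow(OJ), round(train_frac * nrow(OJ)))
OJ_train <- OJ[train_idx, ]

# Grow a full classification tree on the training set
fit_tree <- tree(Purchase ~ ., data = OJ_train)

# K-fold CV over pruning sizes, scored by misclassification (K = 10 is an assumption)
cv_out <- cv.tree(fit_tree, FUN = prune.misclass, K = 10)
best_size <- cv_out$size[which.min(cv_out$dev)]
pruned <- prune.misclass(fit_tree, best = best_size)

# Visualise the pruned tree (a single-node tree cannot be plotted)
if (best_size > 1) {
  plot(pruned)
  text(pruned, pretty = 0)
}

# Training misclassification error of the pruned tree
train_pred <- predict(pruned, OJ_train, type = "class")
tree_train_err <- mean(train_pred != OJ_train$Purchase)
print(tree_train_err)
```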
(d) Using the randomForest package, use bagging of trees to predict Purchase.
Record the misclassification rate obtained.
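Bagging is the special case of a random forest in which every split may consider all p predictors, so in randomForest it amounts to setting `mtry = p`. The sketch below (same ISLR and split assumptions; `ntree = 500` is an illustrative choice) reads off the out-of-bag misclassification rate after the last tree, which is one reasonable reading of "the misclassification rate obtained".

```r
library(ISLR)
library(randomForest)

set.seed(1)
train_frac <- 0.7   # HYPOTHETICAL split proportion, as in part (b)
train_idx <- sample(nrow(OJ), round(train_frac * nrow(OJ)))
OJ_train <- OJ[train_idx, ]

# Bagging: all p predictors are candidates at every split
p <- ncol(OJ_train) - 1
bag_fit <- randomForest(Purchase ~ ., data = OJ_train, mtry = p,
                        ntree = 500, importance = TRUE)

# Out-of-bag misclassification rate after the final tree
bag_oob_err <- bag_fit$err.rate[bag_fit$ntree, "OOB"]
print(bag_oob_err)
```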
(e) Same as the previous task, but use a random forest instead of bagging.
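The only change from bagging is a smaller `mtry`, so each split sees a random subset of predictors. This sketch uses `floor(sqrt(p))`, which is randomForest's default for classification; the ISLR data source, split fraction and `ntree = 500` are assumptions as before.

```r
library(ISLR)
library(randomForest)

set.seed(1)
train_frac <- 0.7   # HYPOTHETICAL split proportion, as in part (b)
train_idx <- sample(nrow(OJ), round(train_frac * nrow(OJ)))
OJ_train <- OJ[train_idx, ]

# Random forest: only floor(sqrt(p)) randomly chosen predictors per split
p <- ncol(OJ_train) - 1
rf_fit <- randomForest(Purchase ~ ., data = OJ_train, mtry = floor(sqrt(p)),
                       ntree = 500, importance = TRUE)

rf_oob_err <- rf_fit$err.rate[rf_fit$ntree, "OOB"]
print(rf_oob_err)

# Which predictors the forest relies on most
print(importance(rf_fit))
```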
(f) Same as the previous task, but use boosting via the gbm package. You will need
to specify the option distribution = "bernoulli" because we are considering
a binary classification problem. You will also need to encode Purchase as a
variable in order to be able to apply gbm. You can do this by creating a new
feature Purchase and removing the original feature Purchase afterwards.
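gbm's `"bernoulli"` loss expects a numeric 0/1 response, which is why Purchase has to be re-encoded. In this sketch the new column is called `Purchase01` (a hypothetical name), MM is coded as 1, and the original factor is dropped so it cannot leak into the predictors. The tuning values (`n.trees`, `interaction.depth`, `shrinkage`) are illustrative assumptions, not prescribed by the task.

```r
library(ISLR)
library(gbm)

set.seed(1)
train_frac <- 0.7   # HYPOTHETICAL split proportion, as in part (b)
train_idx <- sample(nrow(OJ), round(train_frac * nrow(OJ)))
OJ_train <- OJ[train_idx, ]

# HYPOTHETICAL column name; MM = 1, CH = 0
OJ_train$Purchase01 <- ifelse(OJ_train$Purchase == "MM", 1, 0)
OJ_train$Purchase <- NULL   # remove the original factor response

n_trees <- 1000   # illustrative tuning choice
boost_fit <- gbm(Purchase01 ~ ., data = OJ_train, distribution = "bernoulli",
                 n.trees = n_trees, interaction.depth = 2, shrinkage = 0.01)

# Predicted probability of MM, thresholded at 0.5
boost_prob <- predict(boost_fit, OJ_train, n.trees = n_trees, type = "response")
boost_pred <- ifelse(boost_prob > 0.5, 1, 0)
boost_train_err <- mean(boost_pred != OJ_train$Purchase01)
print(boost_train_err)
```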
(g) Compare all methods so far by predicting on the test set and computing the
misclassification rate. Which method performs best?
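For the final comparison, each model predicts the held-out rows and the share of wrong labels is computed. This self-contained sketch refits all four models under the assumptions used above (ISLR's OJ, a hypothetical 0.7 split, 10 CV folds, illustrative gbm settings, hypothetical `Purchase01` column). Boosting probabilities are mapped back to "MM"/"CH" so that every model is scored against the same labels.

```r
library(ISLR)
library(tree)
library(randomForest)
library(gbm)

set.seed(1)
train_frac <- 0.7   # HYPOTHETICAL split proportion
train_idx <- sample(nrow(OJ), round(train_frac * nrow(OJ)))
OJ_train <- OJ[train_idx, ]
OJ_test  <- OJ[-train_idx, ]
p <- ncol(OJ_train) - 1

# Share of rows whose predicted label differs from the true label
misclass <- function(pred, truth) mean(as.character(pred) != as.character(truth))

# (c) pruned tree, 10-fold CV assumed
fit_tree <- tree(Purchase ~ ., data = OJ_train)
cv_out <- cv.tree(fit_tree, FUN = prune.misclass, K = 10)
pruned <- prune.misclass(fit_tree, best = cv_out$size[which.min(cv_out$dev)])

# (d) bagging and (e) random forest
bag_fit <- randomForest(Purchase ~ ., data = OJ_train, mtry = p)
rf_fit  <- randomForest(Purchase ~ ., data = OJ_train, mtry = floor(sqrt(p)))

# (f) boosting on a 0/1 copy of the response (MM = 1)
gbm_train <- transform(OJ_train, Purchase01 = as.numeric(Purchase == "MM"))
gbm_train$Purchase <- NULL
boost_fit <- gbm(Purchase01 ~ ., data = gbm_train, distribution = "bernoulli",
                 n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)
boost_prob <- predict(boost_fit, OJ_test, n.trees = 1000, type = "response")
boost_pred <- ifelse(boost_prob > 0.5, "MM", "CH")

# Test-set misclassification rates
test_err <- c(
  tree     = misclass(predict(pruned, OJ_test, type = "class"), OJ_test$Purchase),
  bagging  = misclass(predict(bag_fit, OJ_test), OJ_test$Purchase),
  forest   = misclass(predict(rf_fit, OJ_test), OJ_test$Purchase),
  boosting = misclass(boost_pred, OJ_test$Purchase)
)
print(sort(test_err))
best_method <- names(which.min(test_err))
print(best_method)
```

Because the split and the ensembles are random, the ranking can change with the seed; repeating the comparison over a few seeds gives a fairer picture than a single run.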
