Question: risk modelling Question 5 Consider the iris dataset for this question. Note that the response of this dataset (Species) has three levels, i.e., setosa, versicolor,

risk modelling
Question 5 Consider the iris dataset for this question. Note that the response of this dataset (Species) has three levels, i.e., setosa, versicolor, and virginica. However, we will only consider twolevel response here. Thus, we will omit the rows with the response of setosa by running the codes below. Then, we split the dataset into training and validation sets with a ratio of 7:3. \begin{tabular}{|l|} \hline Rdat=iris[51:150, ] \#1-50 is setosa, 51-100 is versicolor, and 101-150 is virginica \\ Spec=rep(0, nrow(Rdat)) \\ Spec[Rdat\$Species=="virginica"] =1#0 is versicolor, 1 is virginica \\ Rdat=data.frame(Rdat, Spec) \\ Rdat=Rdat[, -5] \#remove the original column of Species \\ set.seed(1) \\ train=sample(1:nrow(Rdat), 0.7*nrow(Rdat)) \\ train_set=Rdat[train, ] \\ test_set=Rdat[-train, ] \end{tabular} Tut 7_Page 2 BAS2083 Risk Modelling (a) Develop a binary logistic regression model (Model I) with the training set and obtain the test error rate based on the validation set. (b) Note that not all the predictors are significant in Model I, drop the insignificant predictor(s) and obtain Model II. Then, estimate the test error rate. (c) Develop an unpruned tree (Model III) with the training set and estimate the test error rate. [Hint: we need to use the function as.factor( ) for all the classification trees since the response is now 0 and 1] (d) From Model III, prune the tree by the CV approach and only a pruned tree (Model IV). Hence, estimate the test error rate. (e) Use the training set, develop a bagging tree (Model V), and estimate the test error rate. (f) To this end, which model (Models I - V) appears to be the best? Do the results meet your expectation? Justify
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
