Question: Solving in R, please.

1. (10 points) Let's use the Caravan dataset from the ISLR package.

(a) Create a training set consisting of the first 1000 observations, and a test set consisting of the remaining observations.
(b) Fit a gradient boosting machine model to the training set with Purchase as the response and the other variables as predictors. Use 1000 trees and a shrinkage value of 0.01. Which predictors are the most important?
(c) Use the model to predict the response on the test set. Predict that a person will make a purchase if the estimated probability of purchase is greater than the threshold 10%. Compute the test error rate. Form a confusion matrix. What fraction of the people predicted to make a purchase do in fact make one?
(d) Repeat part (c) but change the predicting threshold from 10% to a set of values: all the estimated probabilities for the test set. Use the result to make an ROC curve. Which threshold is the best?
(e) Fit a support vector classifier to the training data using cost = 1. What are the training and test error rates?
(f) Use the tune() function to select an optimal cost. Consider values in the range 0 to 10.
(g) Compute the training and test error rates using the optimal value for cost.
(h) Repeat parts (e) through (g) using a support vector machine with a radial kernel. Try different values of gamma.
(i) Repeat parts (e) through (g) using a support vector machine with a polynomial kernel. Try different values of the degree.
(j) Compare parts (e) through (i). Which approach gives the best results?
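One way to approach parts (a) through (d) is sketched below. It assumes the ISLR and gbm packages are installed. The seed, the 0/1 recoding of Purchase (gbm's bernoulli loss expects a numeric 0/1 response), and the use of Youden's J statistic to pick a "best" threshold are choices made for this illustration, not requirements stated in the question.

```r
library(ISLR)  # provides the Caravan data
library(gbm)   # gradient boosting machines

set.seed(1)    # arbitrary seed (assumption); gbm subsamples rows by default

# Recode the response as 0/1 for distribution = "bernoulli"
caravan <- Caravan
caravan$Purchase01 <- ifelse(caravan$Purchase == "Yes", 1, 0)
caravan$Purchase <- NULL

# (a) First 1000 observations for training, the rest for testing
train <- 1:1000
train_set <- caravan[train, ]
test_set  <- caravan[-train, ]

# (b) 1000 trees, shrinkage 0.01. gbm may warn about predictors that
# show no variation within the training rows; the fit still proceeds.
boost_fit <- gbm(Purchase01 ~ ., data = train_set,
                 distribution = "bernoulli",
                 n.trees = 1000, shrinkage = 0.01)
importance <- summary(boost_fit, plotit = FALSE)  # sorted by relative influence
print(head(importance, 10))

# (c) Predict "purchase" when the estimated probability exceeds 10%
prob <- predict(boost_fit, newdata = test_set, n.trees = 1000,
                type = "response")
pred <- ifelse(prob > 0.10, 1, 0)
conf <- table(actual = test_set$Purchase01,
              predicted = factor(pred, levels = c(0, 1)))
print(conf)
test_error <- mean(pred != test_set$Purchase01)
# Fraction of predicted purchasers who actually purchase
frac_correct_yes <- conf["1", "1"] / sum(conf[, "1"])

# (d) Use every estimated test probability as a threshold
thresholds <- sort(unique(prob))
is_yes <- test_set$Purchase01 == 1
tpr <- sapply(thresholds, function(t) mean(prob[is_yes] > t))   # true positive rate
fpr <- sapply(thresholds, function(t) mean(prob[!is_yes] > t))  # false positive rate
plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve, boosting on Caravan test set")
abline(0, 1, lty = 2)  # reference line for a random classifier

# One possible notion of "best": maximize TPR - FPR (Youden's J)
best_threshold <- thresholds[which.max(tpr - fpr)]
```

Other definitions of the best threshold, for example one that weighs the cost of a false positive against a missed purchaser, are equally defensible and may give a different answer.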

Step by Step Solution
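For parts (e) through (j), a minimal sketch using svm() and tune() from the e1071 package might look like the following. The grids for cost, gamma, and degree are illustrative assumptions (cost must be strictly positive, so 0 itself is left out), as are the seed and the helper names err and fit_and_tune. Predictors that are constant within the training rows are dropped here so that svm() can still scale the remaining columns; that step is a choice made for this sketch, not part of the question.

```r
library(ISLR)   # provides the Caravan data
library(e1071)  # svm() and tune()

set.seed(1)     # arbitrary seed (assumption); tune() uses random CV folds

train <- 1:1000
train_set <- Caravan[train, ]
test_set  <- Caravan[-train, ]

# Drop predictors with zero variance in the training rows (sketch-level choice)
predictors <- setdiff(names(Caravan), "Purchase")
constant <- predictors[sapply(train_set[predictors], function(x) var(x) == 0)]
train_set <- train_set[, !(names(train_set) %in% constant)]
test_set  <- test_set[, !(names(test_set) %in% constant)]

# Misclassification rate of a fitted model on a data set
err <- function(fit, data) mean(predict(fit, data) != data$Purchase)

# Hypothetical helper: fit with cost = 1, tune over a grid with
# 10-fold cross-validation, and report all four error rates
fit_and_tune <- function(kernel, grid) {
  base  <- svm(Purchase ~ ., data = train_set, kernel = kernel, cost = 1)
  tuned <- tune(svm, Purchase ~ ., data = train_set, kernel = kernel,
                ranges = grid)
  print(tuned$best.parameters)
  best <- tuned$best.model
  c(train_cost1 = err(base, train_set), test_cost1 = err(base, test_set),
    train_tuned = err(best, train_set), test_tuned = err(best, test_set))
}

costs <- c(0.01, 0.1, 1, 5, 10)  # illustrative values in (0, 10]

results <- rbind(
  # (e)-(g) support vector classifier (linear kernel)
  linear     = fit_and_tune("linear", list(cost = costs)),
  # (h) radial kernel, trying several gamma values
  radial     = fit_and_tune("radial",
                            list(cost = costs, gamma = c(0.001, 0.01, 0.1))),
  # (i) polynomial kernel, trying several degrees
  polynomial = fit_and_tune("polynomial",
                            list(cost = costs, degree = c(2, 3, 4)))
)

# (j) Compare the approaches side by side
print(round(results, 4))
```

When reading the comparison, keep in mind that Purchase is heavily imbalanced, so a model that predicts "No" for everyone already achieves a low error rate. It may be worth checking each model's confusion matrix as well as its error rate before declaring a winner.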
