Question: Fit a logistic regression model on the "Caravan" data set from the R package "ISLR". This data set, also analyzed in Sec. 4.6.6 of ISLR, has 85 predictors; the response variable "Purchase" takes the value "Yes" or "No".

We use the first 1000 observations as the test data and the remaining observations as the training data. In the test data, there are 941 "No" and 59 "Yes". For each of the approaches below, report the number of misclassified samples among the 941 "No" and the number of misclassified samples among the 59 "Yes" when 0.25 is used as the predicted-probability cut-off. Also use the R package "pROC" to report the corresponding AUC. For the definitions of AUC and ROC, read pp. 146-149 of ISLR.
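The split described above can be sketched in a few lines of R. This is a minimal sketch, assuming the ISLR package is installed; the object names test_id, test and train are illustrative and not part of the question.

```r
library(ISLR)  # provides the Caravan data frame

data(Caravan)

# First 1000 rows form the test set; the remaining rows form the training set.
test_id <- 1:1000
test    <- Caravan[test_id, ]
train   <- Caravan[-test_id, ]

# Sanity check against the class counts stated in the question.
table(test$Purchase)
```

The table should reproduce the 941 "No" and 59 "Yes" quoted in the question, which is a quick way to confirm the split was done on the right rows.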

Fit a logistic regression model using all 85 predictors, and obtain the predicted probabilities on the test data.

  • If we use 0.25 as the probability cut-off, we misclassify ________[a1] (an integer) samples among the 941 "No" and misclassify ________[b1] (an integer) samples among the 59 "Yes".
  • The AUC for this classifier is ______[c1] (round to 3 digits after the decimal point).
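One possible way to obtain a1, b1 and c1 is sketched below. It assumes the ISLR and pROC packages are installed, treats a predicted probability strictly greater than 0.25 as "Yes" (the question does not say how ties at the cut-off are handled), and uses illustrative object names such as fit_full and conf_full.

```r
library(ISLR)
library(pROC)

data(Caravan)
test_id <- 1:1000
test  <- Caravan[test_id, ]
train <- Caravan[-test_id, ]

# Logistic regression on all 85 predictors.
# glm() may warn about fitted probabilities of 0 or 1; that is a warning, not an error.
fit_full <- glm(Purchase ~ ., data = train, family = binomial)

# Predicted P(Purchase = "Yes") on the test set.
prob_full <- predict(fit_full, newdata = test, type = "response")

# Assumption: classify as "Yes" when the probability exceeds 0.25.
pred_full <- factor(ifelse(prob_full > 0.25, "Yes", "No"), levels = c("No", "Yes"))

# Rows are the truth, columns the prediction.
conf_full <- table(truth = test$Purchase, predicted = pred_full)
a1 <- conf_full["No", "Yes"]   # "No" samples predicted as "Yes"
b1 <- conf_full["Yes", "No"]   # "Yes" samples predicted as "No"

# AUC from pROC, with "No" as the control level and "Yes" as the case level.
roc_full <- roc(test$Purchase, prob_full, levels = c("No", "Yes"), direction = "<")
c1 <- round(as.numeric(auc(roc_full)), 3)

c(a1 = a1, b1 = b1, c1 = c1)
```

Setting levels and direction explicitly in roc() avoids relying on pROC's automatic choice of which class counts as the case.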

Apply forward variable selection using AIC. Use the selected model to obtain the predicted probabilities on the test data.

  • We use a model with ______[d2] (a non-negative integer) non-intercept predictors.
  • If we use 0.25 as the probability cut-off, we misclassify ______ [a2] (an integer) samples among the 941 "No" and misclassify ______ [b2] (an integer) samples among the 59 "Yes".
  • The AUC for this classifier is ______ [c2] (round to 3 digits after the decimal point).

Apply forward variable selection using BIC. Use the selected model to obtain the predicted probabilities on the test data.

  • We use a model with ______ [d3] (a non-negative integer) non-intercept predictors.
  • If we use 0.25 as the probability cut-off, we misclassify ______ [a3] (an integer) samples among the 941 "No" and misclassify ______ [b3] (an integer) samples among the 59 "Yes".
  • The AUC for this classifier is ______ [c3] (round to 3 digits after the decimal point).
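The two forward-selection parts (AIC and BIC) differ only in the penalty per parameter, so both can be sketched with step() from base R: k = 2 gives AIC and k = log(n) gives BIC, with n the number of training observations. This is a sketch under assumptions, not a reference solution: it starts from the intercept-only model, uses a strict "> 0.25" rule, and the helpers forward_fit and evaluate are hypothetical names introduced here for illustration.

```r
library(ISLR)
library(pROC)

data(Caravan)
test_id <- 1:1000
test  <- Caravan[test_id, ]
train <- Caravan[-test_id, ]

# Starting model (intercept only) and the largest model allowed (all 85 predictors).
fit_null <- glm(Purchase ~ 1, data = train, family = binomial)
upper    <- reformulate(setdiff(names(train), "Purchase"), response = "Purchase")

# Hypothetical helper: forward selection with penalty k per parameter.
forward_fit <- function(k) {
  step(fit_null, scope = list(lower = ~ 1, upper = upper),
       direction = "forward", k = k, trace = 0)
}

# Hypothetical helper: model size, error counts at the 0.25 cut-off, and AUC.
evaluate <- function(fit) {
  prob <- predict(fit, newdata = test, type = "response")
  pred <- factor(ifelse(prob > 0.25, "Yes", "No"), levels = c("No", "Yes"))
  conf <- table(truth = test$Purchase, predicted = pred)
  r    <- roc(test$Purchase, prob, levels = c("No", "Yes"), direction = "<")
  list(d = length(attr(terms(fit), "term.labels")),  # non-intercept predictors
       a = conf["No", "Yes"],                        # misclassified "No"
       b = conf["Yes", "No"],                        # misclassified "Yes"
       c = round(as.numeric(auc(r)), 3))
}

res_aic <- evaluate(forward_fit(k = 2))                  # d2, a2, b2, c2
res_bic <- evaluate(forward_fit(k = log(nrow(train))))   # d3, a3, b3, c3
```

Forward selection over 85 candidates refits many models, so this can take a little while to run. Setting trace = 1 prints each step if the selection path is of interest.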

Use an L1 penalty to select a subset of the predictors. Use the glmnet package with lambda = 0.004 and the default options, such as standardize = TRUE and intercept = TRUE. Use the selected model to obtain the predicted probabilities on the test data.

  • We use a model with ______ [d4] (a non-negative integer) non-intercept predictors.
  • If we use 0.25 as the probability cut-off, we misclassify ______ [a4] (an integer) samples among the 941 "No" and misclassify ______ [b4] (an integer) samples among the 59 "Yes".
  • The AUC for this classifier is ______ [c4] (round to 3 digits after the decimal point).
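The L1-penalized fit can be sketched with glmnet as follows. This assumes the ISLR, glmnet and pROC packages are installed, reads "selected model" as the glmnet fit itself at lambda = 0.004 (rather than an unpenalized refit on the selected variables, which the question does not specify), and uses a strict "> 0.25" rule. Object names are illustrative.

```r
library(ISLR)
library(glmnet)
library(pROC)

data(Caravan)
test_id <- 1:1000

# glmnet needs a numeric predictor matrix; all 85 Caravan predictors are numeric.
x <- as.matrix(Caravan[, names(Caravan) != "Purchase"])
y <- Caravan$Purchase

# alpha = 1 is the lasso (L1) penalty; standardize and intercept keep their defaults (TRUE).
fit_l1 <- glmnet(x[-test_id, ], y[-test_id],
                 family = "binomial", alpha = 1, lambda = 0.004)

# Count non-zero coefficients, dropping the intercept in the first row.
beta <- as.matrix(coef(fit_l1))
d4   <- sum(beta[-1, 1] != 0)

# Predicted P(Purchase = "Yes") on the test rows.
prob_l1 <- as.numeric(predict(fit_l1, newx = x[test_id, ], type = "response"))
pred_l1 <- factor(ifelse(prob_l1 > 0.25, "Yes", "No"), levels = c("No", "Yes"))

conf_l1 <- table(truth = y[test_id], predicted = pred_l1)
a4 <- conf_l1["No", "Yes"]   # misclassified "No"
b4 <- conf_l1["Yes", "No"]   # misclassified "Yes"

roc_l1 <- roc(y[test_id], prob_l1, levels = c("No", "Yes"), direction = "<")
c4 <- round(as.numeric(auc(roc_l1)), 3)
```

The glmnet documentation generally recommends fitting a whole lambda path rather than a single value; a single lambda is used here only because the question fixes it.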

Result for:

a1:
b1:
c1:
d2:
a2:
b2:
c2:
d3:
a3:
b3:
c3:
d4:
a4:
b4:
c4:
