Question: Consider the Default dataset available in the ISLR library. The Default dataset provides information on 10,000 customers and contains the four variables described in Table 1.
Table 1: Variable Description

| Variable | Description |
|---|---|
| Default | A factor with levels No and Yes indicating whether the customer defaulted on their debt. |
| Student | A factor with levels No and Yes indicating whether the customer is a student. |
| Balance | The average balance that the customer has remaining on their credit card after making their monthly payment. |
| Income | Income of the customer. |
The objective is to predict whether an individual will default on his/her credit card payment. The data set is divided into two parts: a training set of 5,000 observations and a test set of the remaining 5,000 observations. The following results are available:
Table 2: Logistic Regression Model (Model 1) for predicting default using student based on the Training Data Set

| | Estimate | Standard Error | z value | p-value |
|---|---|---|---|---|
| Intercept | -3.482 | 0.099 | -35.312 | 0.000 |
| studentYes | 0.307 | 0.166 | 1.845 | 0.065 |
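The first question's probabilities can be sketched in Python. Model 1 has the form `log(p / (1 - p)) = -3.482 + 0.307 × studentYes`, where `p` is the probability of default and `studentYes` is the 0/1 dummy. This sketch reads the intercept as negative (-3.482): a positive intercept would imply a baseline default probability of about 0.97, which is implausible for this data. The helper name `model1_prob` is illustrative only.

```python
import math

# Model 1 coefficients from Table 2 (training data).
# ASSUMPTION: the intercept is negative (-3.482); a positive value would imply
# a default probability of roughly 0.97 for non-students.
B0 = -3.482        # intercept (log-odds of default for a non-student)
B_STUDENT = 0.307  # coefficient on the studentYes dummy


def model1_prob(student: bool) -> float:
    """P(default = Yes) = 1 / (1 + exp(-(B0 + B_STUDENT * studentYes)))."""
    eta = B0 + B_STUDENT * (1 if student else 0)  # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-eta))


p_student = model1_prob(True)
p_non_student = model1_prob(False)
print(f"student:     {p_student:.4f}")
print(f"non-student: {p_non_student:.4f}")
```

Run as a script, it prints about 0.0401 for a student and 0.0298 for a non-student, so Model 1 on its own ranks students as the riskier group.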
Note that studentYes is a dummy variable that takes the value 1 if the individual is a student and 0 otherwise. A second logistic regression model, using the variables student and balance, is also fitted to the training data set. Its details are provided below:
Table 3: Logistic Regression Model (Model 2) for predicting default using balance and student based on the Training Data Set

| | Estimate | Standard Error | z value | p-value |
|---|---|---|---|---|
| Intercept | -11.020 | 0.7130 | -15.446 | 0.000 |
| balance | 0.006 | 0.0003 | 17.383 | 0.000 |
| studentYes | -0.822 | 0.3401 | -2.416 | 0.016 |
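A similar sketch for Model 2 shows how the coefficients translate into odds ratios and how the condition "predicted probability below 0.10" can be inverted to give a balance limit. It reads the intercept and the studentYes coefficient as negative (-11.020 and -0.822), consistent with a low baseline default rate, and uses the balance coefficient as printed (0.006, rounded to three decimals), so the resulting limit is approximate. The helper names `model2_prob` and `max_balance` are hypothetical.

```python
import math

# Model 2 coefficients from Table 3 (training data).
# ASSUMPTION: the intercept and the studentYes coefficient are negative;
# the balance coefficient is used as printed (rounded to 0.006).
B0 = -11.020
B_BALANCE = 0.006
B_STUDENT = -0.822


def model2_prob(balance: float, student: bool) -> float:
    """Predicted P(default = Yes) under Model 2 (hypothetical helper)."""
    eta = B0 + B_BALANCE * balance + B_STUDENT * (1 if student else 0)
    return 1.0 / (1.0 + math.exp(-eta))


def max_balance(threshold: float, student: bool) -> float:
    """Balance at which the predicted default probability equals `threshold`.

    Because B_BALANCE > 0, the probability rises with balance, so any balance
    below this value keeps the prediction under the threshold.
    """
    logit = math.log(threshold / (1.0 - threshold))  # log-odds of the cut-off
    return (logit - B0 - B_STUDENT * (1 if student else 0)) / B_BALANCE


# Odds ratios: multiplicative change in the odds of default.
print(f"odds ratio per 100 units of balance: {math.exp(100 * B_BALANCE):.3f}")
print(f"odds ratio, student vs non-student at equal balance: {math.exp(B_STUDENT):.3f}")

limit = max_balance(0.10, student=True)
print(f"maximum balance for a student (p < 0.10): {limit:.2f}")
```

Under these assumptions the limit for a student comes out at roughly 1,607; calling `max_balance(0.10, student=False)` gives about 1,470, because at equal balance Model 2 assigns students lower odds of default (odds ratio about 0.44).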
Based on the fitted logistic regression model (Model 2), a confusion matrix is obtained for the test dataset. The confusion matrix is shown below.
Table 4: Confusion Matrix Based on Test Data Set for Logistic Regression

| Predicted Default Status | True Default Status: No | True Default Status: Yes |
|---|---|---|
| No | 4808 | 120 |
| Yes | 23 | 49 |
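The quantities in the fourth question follow directly from the counts in Table 4 (rows are predicted status, columns are true status). The sketch below treats default = Yes as the positive class, so sensitivity is the share of actual defaulters the model flags and specificity is the share of non-defaulters it correctly clears. The `null_error` line is an extra not asked for in the question: it gives the error rate of a trivial classifier that always predicts No, as a baseline.

```python
# Confusion matrix from Table 4: rows are predicted status, columns true status.
# "Yes" (the customer defaulted) is treated as the positive class.
tn, fn = 4808, 120  # predicted No:  true No, true Yes
fp, tp = 23, 49     # predicted Yes: true No, true Yes

total = tn + fn + fp + tp
sensitivity = tp / (tp + fn)    # share of actual defaulters predicted Yes
specificity = tn / (tn + fp)    # share of non-defaulters predicted No
error_rate = (fp + fn) / total  # overall misclassification rate
null_error = (tp + fn) / total  # error of always predicting No, for reference

print(f"test observations: {total}")
print(f"sensitivity: {sensitivity:.4f}")
print(f"specificity: {specificity:.4f}")
print(f"total error rate: {error_rate:.4f}")
print(f"error rate of always predicting No: {null_error:.4f}")
```

With these counts the sensitivity is about 0.290, the specificity about 0.995 and the total error rate 0.0286, against 0.0338 for always predicting No. The low overall error therefore mostly reflects how rare defaults are, while roughly seven in ten actual defaulters are still missed.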
Based on the above output, answer the following questions (no need to fit any of the models):
- Write the equation of the fitted Model 1 (summary provided in Table 2). Calculate the predicted default probabilities for an individual who is a student and for an individual who is not a student. Who is riskier for the credit card company? [3]
- Consider the fitted Model 2 (summary provided in Table 3). Interpret the coefficients. Discuss the difference between Models 1 and 2. [4]
- Suppose the credit card company wants to issue a credit card only to customers whose predicted default probability is below 0.10. A student has recently approached the company. Based on the fitted Model 2 (summary provided in Table 3), calculate the maximum allowed balance for this individual. [3]
- Based on the confusion matrix shown above (in Table 4), compute the sensitivity, specificity and total error rate for the logistic regression model. [3]
- Discuss how the performance of the logistic regression model can be improved. [3]
- The logistic regression model here uses only a limited number of predictors. Identify at least 5 more predictors that could be useful in this context. [3]
- How will you use such a logistic regression model for decision-making in this context? [3]
- What are the possible downsides of such a model? Discuss in detail.