**PLEASE PROVIDE ANSWERS TO 8,9,10,11**
---
output:
  pdf_document: default
  html_document: default
---

---
title: 'Home Equity Loan Customer Pre-screen and Scoring'
subtitle: 'UMaine BUA684 Module 3'
author:
- FirstName LastName
date: "`r format(Sys.time(), '%d %B %Y')`"
output: pdf_document
---
*Note: you can always ask your ChatGPT programming assistant to explain any of the following R code chunks.*
```{r message=FALSE, warning=FALSE}
# install.packages("Rcpp")
```
# Problem

The aim of this assignment is to use a decision tree model and a logistic regression model to complete two common practices in credit risk management: *customer pre-screening* and *customer scoring*. The two tasks are expected to provide valuable information to support decision-making at the operational and middle management levels.
# Data

```{r message=FALSE, warning=FALSE}
# import data
install.packages("readr")
library(readr)
hmeq <- read_csv("hmeq.csv")
head(hmeq)
```
Below, please write all your answers in **bold** font.
*Problem 1: compared with the `hmeq_profile` dataset used in the Module 1 assignment, which additional attribute is contained in `hmeq` dataset?*
**Your answer: ( The hmeq dataset contains an additional attribute named JOB compared to the hmeq_profile dataset used in Module 1. )**
To perform supervised learning for credit risk management in the banking industry, we **MUST** wrangle the original data in the following way:

```{r message=FALSE, warning=FALSE}
# drop the JOB attribute
install.packages("dplyr")
library(dplyr)
hmeq_rev <- hmeq %>%
  select(-JOB)
head(hmeq_rev)
```
*Problem 2a: Explain what the above data wrangling code does.*
**Your answer: (The above data wrangling code selects all attributes in the hmeq dataset except the JOB attribute and stores the resulting dataset as hmeq_rev. )**
*Problem 2b (Optional): In the business perspective of a bank, why is this data wrangling absolutely necessary?*
**Your answer: (The JOB attribute is a coarse, self-reported category rather than a direct indicator of creditworthiness. A bank would likely exclude it so that credit decisions rest on an applicant's financial behavior instead of occupation, which also reduces the risk that the model treats applicants unfairly based on their job type.)**
```{r message=FALSE, warning=FALSE}
install.packages(c("caret", "mlbench"))
library(caret)
library(mlbench)

set.seed(2020)
inTrain <- createDataPartition(
  y = hmeq_rev$BAD, # the target attribute is used as the stratifying factor
  p = 0.70,         # the proportion of data records labeled as training data
  list = FALSE
)

# select all records labeled as training data to create the training set
training <- hmeq_rev[as.vector(inTrain), ]
# use the remaining records to create the test set
test <- hmeq_rev[-as.vector(inTrain), ]

nrow(training)
table(training$BAD) / nrow(training)
nrow(test)
table(test$BAD) / nrow(test)
```
*Problem 3: Explain what the above data wrangling code does. Why is this data wrangling necessary for a predicting analytical model?*
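**Your answer: (The above code partitions the hmeq_rev dataset into training and test subsets using the createDataPartition() function from the caret package. Because list = FALSE, the inTrain variable stores a matrix of row indices for the records assigned to the training set, about 70% of the data, sampled so that the distribution of BAD is similar in both subsets. The indexed rows are stored in training and the remaining rows in test, and the final lines print the size and class proportions of each subset. This split is necessary because a predictive model must be evaluated on records it did not see during training; otherwise its performance estimate would be overly optimistic and overfitting would go undetected.)**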
# Analysis

## Decision tree model (customer pre-screening)

*Problem 3: In the following chunk, use the business case example for decision tree model as reference to build the "best" tree model to predict the target attribute `BAD_Label` using the `training` and `test` subsets. Show all your R code in the submission. Also include the visualization of the model you build.*

```{r message=FALSE, warning=FALSE}
# install required packages
install.packages(c("rpart", "rpart.plot"))

library(rpart)
library(rpart.plot)

# build decision tree model
dtree <- rpart(BAD ~ ., data = training, method = "class", minbucket = 50)

# plot decision tree
rpart.plot(dtree)
```
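One common way to look for a "best" tree, offered here as an assumption and not as the course's prescribed method, is to grow a tree and then prune it back to the complexity parameter with the lowest cross-validated error. The sketch below uses the `kyphosis` data that ships with `rpart` as a stand-in so it runs on its own; `demo_tree`, `best_cp`, and `demo_pruned` are illustrative names, and the same steps would apply to `dtree`.

```r
library(rpart)

set.seed(2020)  # the cross-validation inside rpart is random

# Stand-in data: the kyphosis data frame bundled with rpart
demo_tree <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class")

# cptable has one row per candidate tree size, including its
# cross-validated error (column "xerror")
cp_table <- demo_tree$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Prune back to the subtree with the lowest cross-validated error
demo_pruned <- prune(demo_tree, cp = best_cp)
print(demo_pruned)
```

The pruned tree can never be larger than the original, so this step trades some training fit for a simpler model that is less likely to overfit.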
*Problem 4: In the following chunk, calculate the importance scores of different predictor attributes. Show all your R code in the submission.*

```{r message=FALSE, warning=FALSE}
# calculate variable importance scores
varImp(dtree)
```
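To read off a top-five list, it helps to sort the scores. The sketch below does this with the `variable.importance` vector that `rpart` stores on the fitted tree, again using the bundled `kyphosis` data as a stand-in; `demo_tree` and `top_predictors` are illustrative names. These scores may differ in scale from what `varImp()` reports, so treat this as one possible ranking.

```r
library(rpart)

# Stand-in tree fitted on the kyphosis data bundled with rpart
demo_tree <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class")

# variable.importance is a named numeric vector on the fitted rpart object
importance <- sort(demo_tree$variable.importance, decreasing = TRUE)

# Keep at most the five highest-scoring predictors
top_predictors <- head(importance, 5)
top_predictors
```

For the assignment tree, the analogous call would be `head(sort(dtree$variable.importance, decreasing = TRUE), 5)`.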
*Problem 5: Based on their importance scores, which are the top five important predictors for predicting `BAD`?*
**Your answer: (The top five important predictors for predicting BAD based on their importance scores are: CLAGE, CLNO, DEBTINC, VALUE, and LOAN.)**
## Logistic regression model (customer scoring)

*Problem 6: In the following chunk, use the business case example for logistic model as reference to build a logistic regression model to predict `BAD` using the five predictors in your answer to Problem 5. Show all your R code in the submission including that for addressing the multicollinearity problem. If the problem appears, you need to update your model to resolve the problem.*

```{r message=FALSE, warning=FALSE}
# Load necessary libraries
library(dplyr)
library(car) # provides vif(); run install.packages("car") first if needed

# Create a subset of the training data with only the top five predictors and the target
predictors <- c("CLAGE", "CLNO", "DEBTINC", "VALUE", "LOAN")
data_subset <- training %>% select(all_of(predictors), BAD)

# Build logistic regression model (glm drops rows with missing values)
model <- glm(BAD ~ ., data = data_subset, family = "binomial")

# Check for multicollinearity using the variance inflation factor (VIF)
vif_values <- vif(model)
vif_values

# While any VIF exceeds 10, remove the predictor with the highest VIF and refit
# (vif() needs at least two predictors, so stop before dropping below that)
while (length(vif_values) > 2 && max(vif_values) > 10) {
  removed_predictor <- names(which.max(vif_values))
  cat("Removed predictor due to multicollinearity:", removed_predictor, "\n")
  data_subset <- data_subset %>% select(-all_of(removed_predictor))
  model <- glm(BAD ~ ., data = data_subset, family = "binomial")
  vif_values <- vif(model)
}

# Print model summary
summary(model)
```
*Problem 7: Based on your final model results for Problem 6, interpret the meaning of the regression coefficient estimate for the most important predictor.*
**Your answer:(Based on the final model results obtained from Problem 6, the interpretation of the regression coefficient estimate for the most important predictor depends on the specific predictor chosen as the most important based on the importance scores. Let's assume for this answer that the most important predictor is CLAGE, which is the first predictor in the list of top five predictors provided in Problem 5.
The regression coefficient estimate for CLAGE represents the change in the log-odds of the BAD outcome variable for a one-unit increase in the CLAGE predictor, holding all other predictors constant. Specifically, if the regression coefficient estimate for CLAGE is positive, it means that as the CLAGE value increases, the log-odds of the BAD event occurring also increases, indicating a higher probability of a bad credit outcome. Conversely, if the regression coefficient estimate for CLAGE is negative, it means that as the CLAGE value increases, the log-odds of the BAD event occurring decreases, indicating a lower probability of a bad credit outcome. )**
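The interpretation above can be written out explicitly. As a sketch, assume the fitted model has the form below, where $p$ is the probability that `BAD` equals 1 and $\beta_1$ is the coefficient on CLAGE (these symbols are introduced here for illustration):

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{CLAGE} + \dots
$$

Raising CLAGE by one unit while holding the other predictors fixed adds $\beta_1$ to the log-odds, which multiplies the odds by a constant factor:

$$
\frac{\text{odds}(\text{CLAGE}+1)}{\text{odds}(\text{CLAGE})} = e^{\beta_1}
$$

The factor $e^{\beta_1}$ is above 1 when $\beta_1 > 0$ (higher odds of a bad outcome) and below 1 when $\beta_1 < 0$ (lower odds).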
*Problem 8: In the following chunk, estimate the possible range of regression coefficient estimate for the most important predictor at 95% confidence level.*

```{r message=FALSE, warning=FALSE}
```
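One possible approach to Problem 8, sketched under stated assumptions, is to compute a 95% confidence interval for the coefficient. The example below fits a small logistic model on simulated stand-in data so it runs on its own; `demo`, `demo_model`, and the simulated coefficients are hypothetical and say nothing about the real `hmeq` results. It uses Wald intervals from `confint.default()`; `confint()` would give profile-likelihood intervals instead.

```r
# Hypothetical stand-in data so the sketch runs on its own
set.seed(2020)
demo <- data.frame(CLAGE = runif(500, min = 0, max = 400))
demo$BAD <- rbinom(500, size = 1, prob = plogis(0.5 - 0.01 * demo$CLAGE))
demo_model <- glm(BAD ~ CLAGE, data = demo, family = "binomial")

# 95% Wald confidence intervals on the log-odds scale
ci_log_odds <- confint.default(demo_model, level = 0.95)

# Exponentiate to express the same intervals as odds ratios
ci_odds_ratio <- exp(ci_log_odds)

ci_log_odds["CLAGE", ]
ci_odds_ratio["CLAGE", ]

# In the assignment, the analogous call would be:
# confint.default(model, level = 0.95)
```

If the whole interval lies on one side of zero on the log-odds scale (equivalently, on one side of 1 as an odds ratio), the direction of the predictor's effect is consistent across the plausible range of the coefficient.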
*Problem 9: Based on the result for Problem 8, interpret the generalized meaning of the regression coefficient estimate for the most important predictor.*
**Your answer: ( )**
*Problem 10: In the following chunk, measure performance of the logistic regression model you build. Show all your R code in the submission.*

```{r message=FALSE, warning=FALSE}
```
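A possible way to approach Problem 10, sketched here as an assumption since the course's business case example is not shown, is to score held-out records and report a confusion matrix, accuracy, and the area under the ROC curve (AUC). The helper `auc_rank()` is hypothetical and computes AUC in base R from the rank-sum formula; the data are simulated stand-ins so the block runs on its own.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) formula
auc_rank <- function(actual, score) {
  ranks <- rank(score)  # average ranks handle tied scores
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(ranks[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical stand-in data with a 70/30 split
set.seed(2020)
n <- 1000
demo <- data.frame(CLAGE = runif(n, min = 0, max = 400))
demo$BAD <- rbinom(n, size = 1, prob = plogis(0.5 - 0.01 * demo$CLAGE))
train_rows <- sample(n, size = 0.7 * n)
demo_model <- glm(BAD ~ CLAGE, data = demo[train_rows, ], family = "binomial")
demo_test <- demo[-train_rows, ]

# Predicted probabilities on held-out data, then a 0.5 cutoff for class labels
pred_prob <- predict(demo_model, newdata = demo_test, type = "response")
pred_class <- ifelse(pred_prob > 0.5, 1, 0)

confusion <- table(Predicted = pred_class, Actual = demo_test$BAD)
accuracy <- mean(pred_class == demo_test$BAD)
auc <- auc_rank(demo_test$BAD, pred_prob)

confusion
accuracy
auc
```

With the assignment objects, the same steps would use `model` and `test`. Test rows with missing predictor values produce `NA` predictions and would need to be dropped before computing accuracy or AUC.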
*Problem 11: According to the model performance measure in Problem 10, is this model a poor/average/good/strong model?*
**Your answer:( )**
# Discussion

*Reflect on the ways in which the decision tree and logistic regression model results can support senior management in a bank in making well-informed decisions related to credit risk management. Although this discussion won't be elaborated here, you will have the opportunity to collaborate with your project team to explore this topic further and collectively develop effective strategies.*
