Question: Please provide answers to Problems 8, 9, 10, and 11

**PLEASE PROVIDE ANSWERS TO 8,9,10,11**

---
output:
  pdf_document: default
  html_document: default
---

---
title: 'Home Equity Loan Customer Pre-screen and Scoring'
subtitle: 'UMaine BUA684 Module 3'
author:
- FirstName LastName
date: "`r format(Sys.time(), '%d %B %Y')`"
output: pdf_document
---

*Note: you can always ask your ChatGPT programming assistant to explain each of the following R code chunks.*

```{r message=FALSE, warning=FALSE}
# install.packages("Rcpp")
```

# Problem

The aim of this assignment is to use a decision tree model and a logistic regression model to complete two common practices in credit risk management: *customer pre-screening* and *customer scoring*. The two tasks are expected to provide valuable information to support decision making at the operational and middle management levels.

# Data

```{r message=FALSE, warning=FALSE}
# import data
install.packages("readr")
library(readr)
hmeq <- read_csv("hmeq.csv")
head(hmeq)
```

Below, please write all your answers in **bold** font.

*Problem 1: Compared with the `hmeq_profile` dataset used in the Module 1 assignment, which additional attribute is contained in the `hmeq` dataset?*

**Your answer: ( The hmeq dataset contains an additional attribute named JOB compared to the hmeq_profile dataset used in Module 1. )**

To do supervised learning for credit risk management in the banking industry, we **MUST** wrangle the original data in the following way:

```{r message=FALSE, warning=FALSE}
# drop the JOB attribute
install.packages("dplyr")
library(dplyr)
hmeq_rev <- hmeq %>%
  select(-JOB)
head(hmeq_rev)
```

*Problem 2a: Explain what the above data wrangling code does.*

**Your answer: (The above data wrangling code selects all attributes in the hmeq dataset except the JOB attribute and stores the resulting dataset as hmeq_rev. )**

*Problem 2b (Optional): From the business perspective of a bank, why is this data wrangling absolutely necessary?*

**Your answer: (The JOB attribute is not useful for credit risk modeling as it is subjective and not a direct indicator of creditworthiness. Therefore, removing this attribute from the dataset is necessary to improve the predictive accuracy of the models built on this dataset. )**

```{r message=FALSE, warning=FALSE}
install.packages(c("caret", "mlbench"))
library(caret)
library(mlbench)
set.seed(2020)
inTrain <- createDataPartition(
  y = hmeq_rev$BAD, # the target attribute is used as the stratifying factor
  p = 0.70,         # the proportion of data records labeled as training data
  list = FALSE
)

# select all records labeled as training data to create the training set
training <- hmeq_rev[as.vector(inTrain), ]
# use the remaining records to create the test set
test <- hmeq_rev[-as.vector(inTrain), ]

nrow(training)
table(training$BAD) / nrow(training)
nrow(test)
table(test$BAD) / nrow(test)
```

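*Problem 3: Explain what the above data wrangling code does. Why is this data wrangling necessary for a predictive analytics model?*

**Your answer: (The above code partitions the hmeq_rev dataset into training and test subsets using the createDataPartition() function from the caret package. The split is stratified on BAD, so about 70% of the records go to the training set while the proportion of BAD cases stays similar in both subsets. The inTrain variable stores the row indices of the records assigned to the training set. The resulting training and test subsets are stored in the training and test variables, respectively. The split is necessary because a predictive model must be evaluated on records it did not see during training; otherwise its performance would be overstated.)**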

# Analysis

## Decision tree model (customer pre-screening)

*Problem 3: In the following chunk, use the business case example for the decision tree model as a reference to build the "best" tree model to predict the target attribute `BAD_Label` using the `training` and `test` subsets. Show all your R code in the submission. Also include the visualization of the model you build.*

```{r message=FALSE, warning=FALSE}
# install required packages
install.packages(c("rpart", "rpart.plot"))

library(rpart)
library(rpart.plot)

# build decision tree model
dtree <- rpart(BAD ~ ., data = training, method = "class", minbucket = 50)

# plot decision tree
rpart.plot(dtree)
```

*Problem 4: In the following chunk, calculate the importance scores of different predictor attributes. Show all your R code in the submission.*

```{r message=FALSE, warning=FALSE}
# calculate variable importance scores
varImp(dtree)
```

*Problem 5: Based on their importance scores, which are the top five important predictors for predicting `BAD`?*

**Your answer: (The top five important predictors for predicting BAD based on their importance scores are: CLAGE, CLNO, DEBTINC, VALUE, and LOAN.)**

## Logistic regression model (customer scoring)

*Problem 6: In the following chunk, use the business case example for the logistic model as a reference to build a logistic regression model to predict `BAD` using the five predictors in your answer to Problem 5. Show all your R code in the submission, including the code for addressing the multicollinearity problem. If the problem appears, you need to update your model to resolve it.*

```{r message=FALSE, warning=FALSE}
# Load necessary libraries
library(dplyr)
library(car) # provides vif(); the caret package has no vif() function

# Create a subset of the training data with only the top five predictors
predictors <- c("CLAGE", "CLNO", "DEBTINC", "VALUE", "LOAN")
data_subset <- training[, c(predictors, "BAD")]

# Build logistic regression model
model <- glm(BAD ~ ., data = data_subset, family = "binomial")

# Check for multicollinearity using the variance inflation factor (VIF)
vif_values <- car::vif(model)
vif_values

# If VIF > 10 for any predictor, remove the predictor with the highest VIF and refit
# (vif() needs at least two predictors, so stop before dropping below two)
while (length(vif_values) > 2 && max(vif_values) > 10) {
  removed_predictor <- names(vif_values)[which.max(vif_values)]
  data_subset <- data_subset[, setdiff(names(data_subset), removed_predictor)]
  cat("Removed predictor due to multicollinearity:", removed_predictor, "\n")
  model <- glm(BAD ~ ., data = data_subset, family = "binomial")
  vif_values <- car::vif(model)
}

# Print model summary
summary(model)
```

*Problem 7: Based on your final model results for Problem 6, interpret the meaning of the regression coefficient estimate for the most important predictor.*

**Your answer:(Based on the final model results obtained from Problem 6, the interpretation of the regression coefficient estimate for the most important predictor depends on the specific predictor chosen as the most important based on the importance scores. Let's assume for this answer that the most important predictor is CLAGE, which is the first predictor in the list of top five predictors provided in Problem 5.

The regression coefficient estimate for CLAGE represents the change in the log-odds of the BAD outcome variable for a one-unit increase in the CLAGE predictor, holding all other predictors constant. Specifically, if the regression coefficient estimate for CLAGE is positive, it means that as the CLAGE value increases, the log-odds of the BAD event occurring also increases, indicating a higher probability of a bad credit outcome. Conversely, if the regression coefficient estimate for CLAGE is negative, it means that as the CLAGE value increases, the log-odds of the BAD event occurring decreases, indicating a lower probability of a bad credit outcome. )**
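The interpretation above can be written out as a short worked form. This is a sketch that assumes CLAGE remains in the final model; the coefficient value used in the example is hypothetical and is not taken from any fitted output.

$$
\log\frac{p}{1-p} = \beta_0 + \beta_{\text{CLAGE}}\,\text{CLAGE} + \dots, \qquad p = P(\text{BAD} = 1)
$$

$$
\frac{\text{odds}(\text{CLAGE} + 1)}{\text{odds}(\text{CLAGE})} = e^{\beta_{\text{CLAGE}}}
$$

For example, with a hypothetical estimate $\beta_{\text{CLAGE}} = -0.005$, each additional unit of CLAGE multiplies the odds of BAD by $e^{-0.005} \approx 0.995$, that is, about 0.5% lower odds, holding the other predictors constant.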

*Problem 8: In the following chunk, estimate the possible range of regression coefficient estimate for the most important predictor at 95% confidence level.*

```{r message=FALSE, warning=FALSE}

```
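One possible way to approach Problem 8 is a Wald confidence interval from `confint.default()`. The sketch below runs on simulated data, not on `hmeq.csv`; the data frame `sim`, the object `sim_model`, and the coefficients used to generate `BAD` are all assumptions made for illustration. With the real assignment objects, the same call would be applied to the `model` fitted in Problem 6.

```r
# Simulated stand-in for the training data (NOT the real hmeq data)
set.seed(2020)
n <- 500
sim <- data.frame(
  CLAGE   = runif(n, 0, 300),
  DEBTINC = rnorm(n, 34, 8)
)
# Assumed relationship, used only to generate example outcomes
sim$BAD <- rbinom(n, 1, plogis(-1 - 0.006 * sim$CLAGE + 0.03 * sim$DEBTINC))

sim_model <- glm(BAD ~ CLAGE + DEBTINC, data = sim, family = "binomial")

# Wald 95% confidence intervals on the log-odds scale
ci <- confint.default(sim_model, level = 0.95)
ci["CLAGE", ]

# The same interval expressed as an odds ratio per one-unit increase
exp(ci["CLAGE", ])
```

If the interval for the predictor excludes zero on the log-odds scale (equivalently, excludes 1 on the odds-ratio scale), the direction of its effect is supported at the 95% confidence level beyond this particular sample, which is the kind of statement Problem 9 asks for.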

*Problem 9: Based on the result for Problem 8, interpret the generalized meaning of the regression coefficient estimate for the most important predictor.*

**Your answer: ( )**

*Problem 10: In the following chunk, measure performance of the logistic regression model you build. Show all your R code in the submission.*

```{r message=FALSE, warning=FALSE}

```
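A minimal sketch of one way to measure performance for Problem 10: predict on held-out records, then compute a confusion matrix and the area under the ROC curve (AUC). It uses simulated data and base R only; `sim`, `sim_train`, `sim_test`, `sim_model`, and the 0.5 cutoff are assumptions for illustration, and the course's business case example may use a different measure or package. With the real objects, `model` and `test` would take the place of `sim_model` and `sim_test`.

```r
# Simulated stand-in data (NOT the real hmeq data)
set.seed(2020)
n <- 1000
sim <- data.frame(
  CLAGE   = runif(n, 0, 300),
  DEBTINC = rnorm(n, 34, 8)
)
sim$BAD <- rbinom(n, 1, plogis(-1 - 0.006 * sim$CLAGE + 0.03 * sim$DEBTINC))

# Simple 70/30 split (unstratified, for brevity)
train_idx <- sample(n, size = round(0.7 * n))
sim_train <- sim[train_idx, ]
sim_test  <- sim[-train_idx, ]

sim_model <- glm(BAD ~ CLAGE + DEBTINC, data = sim_train, family = "binomial")

# Predicted probabilities of BAD = 1 on the held-out records
prob <- predict(sim_model, newdata = sim_test, type = "response")

# Confusion matrix and accuracy at an assumed 0.5 cutoff
pred_class <- factor(as.integer(prob >= 0.5), levels = c(0, 1))
actual     <- factor(sim_test$BAD, levels = c(0, 1))
conf_mat   <- table(Predicted = pred_class, Actual = actual)
accuracy   <- sum(diag(conf_mat)) / sum(conf_mat)

# AUC via the rank (Mann-Whitney) formula
n_pos <- sum(sim_test$BAD == 1)
n_neg <- sum(sim_test$BAD == 0)
auc <- (sum(rank(prob)[sim_test$BAD == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)

conf_mat
accuracy
auc
```

An AUC of 0.5 corresponds to random guessing and 1 to perfect separation. Which AUC ranges count as poor, average, good, or strong for Problem 11 depends on the course's rubric, which is not given here.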

*Problem 11: According to the model performance measure in Problem 10, is this model a poor/average/good/strong model?*

**Your answer:( )**

# Discussion

*Reflect on the ways in which the decision tree and logistic regression model results can support senior management in a bank in making well-informed decisions related to credit risk management. Although this discussion won't be elaborated here, you will have the opportunity to collaborate with your project team to explore this topic further and collectively develop effective strategies.*
