Question: Case Study: German Credit Risk Analysis
Context:
To minimize loss from the bank's perspective, the bank needs a decision rule regarding whom to approve the loan for and whom not to. An applicant's demographic and socioeconomic profiles are considered by loan managers before a decision is taken regarding his/her loan application.
In this dataset, each entry represents a person who takes credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes.
Objective:
The objective is to build predictive models on this data to help the bank decide whether to approve a loan to a prospective applicant.
Considerations:
If a potential customer is misclassified as being at Risk, the bank will not give the loan to that person. This will be a loss of opportunity to earn interest on the potential loan.
If a potential customer is misclassified as NOT being at Risk, they may be given the loan but may default later. This will be a loss of resources.
Based on the above logic, you need to decide whether to maximize Recall, Precision, or F-score. Please give your reasons for choosing a particular metric, and use the metric you chose to evaluate the performance of the models.
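To make the trade-off concrete, the sketch below computes the three candidate metrics on a handful of made-up labels. It assumes Risk is encoded as 1 for "at risk (defaulter)" and 0 for "not at risk"; that encoding and the toy labels are illustrative assumptions, not taken from GermanCredit.csv. Under that encoding, a false negative is a defaulter who gets the loan (loss of resources) and a false positive is a good customer who is refused (lost interest).

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Toy labels (assumed encoding): 1 = at risk (defaulter), 0 = not at risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"False negatives (defaulters approved): {fn}")
print(f"False positives (good customers refused): {fp}")
print(f"Precision={precision:.3f}  Recall={recall:.3f}  F1={f1:.3f}")
```

Recall falls as false negatives grow, Precision falls as false positives grow, and the F-score penalizes both, so the choice depends on which of the two losses described above the bank considers more costly.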
Attribute Information:
The data contains characteristics of the people:
Age (Numeric: Age in years)
Sex (Categories: male, female)
Job (Categories: unskilled and non-resident, unskilled and resident, skilled, highly skilled)
Housing (Categories: own, rent, or free)
Saving accounts (Categories: little, moderate, quite rich, rich)
Checking account (Categories: little, moderate, rich)
Credit amount (Numeric: Amount of credit in DM, Deutsche Mark)
Duration (Numeric: Duration for which the credit is given, in months)
Purpose (Categories: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
Risk (Person is not at risk, Person is at risk (defaulter))
The data set GermanCredit.csv can be downloaded from the Data sets folder in Canvas.
Tasks and rubric:
Explore: points
Examine the data set and carry out EDA, in particular showing how the other variables may be related to the target variable Risk through bar plots, line plots, box plots, etc., to derive initial insights.
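As a starting point for the EDA, the sketch below computes the numbers that a bar plot or box plot against Risk would display. The small DataFrame is made up for illustration and only stands in for GermanCredit.csv; the 0/1 encoding of Risk is likewise an assumption.

```python
import pandas as pd

# Tiny made-up sample standing in for GermanCredit.csv (illustrative values only).
# Assumed encoding: Risk = 1 means at risk (defaulter), 0 means not at risk.
df = pd.DataFrame({
    "Housing": ["own", "rent", "own", "free", "rent", "own", "rent", "own"],
    "Credit amount": [1200, 5000, 800, 7000, 3000, 1500, 6500, 2000],
    "Risk": [0, 1, 0, 1, 0, 0, 1, 1],
})

# Share of at-risk applicants per housing category (the heights of a bar plot).
risk_rate_by_housing = df.groupby("Housing")["Risk"].mean()

# Distribution of credit amount within each Risk class (the content of a box plot).
amount_by_risk = df.groupby("Risk")["Credit amount"].describe()

print(risk_rate_by_housing)
print(amount_by_risk[["mean", "50%"]])
```

With plotting libraries installed, the same summaries can be drawn directly, for example with `risk_rate_by_housing.plot(kind="bar")` or `df.boxplot(column="Credit amount", by="Risk")`.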
Data preparation: points
Check for missing values, and convert string (object) variables to category. Separate the predictor and target variables. Create dummy variables as needed; the final data set should have all variables as numeric.
Note: If a variable is already binary and numeric, dummy variables are not needed. However, if the two values of a binary variable (e.g., gender) are coded as Male and Female, these should either be converted to 0 and 1, or dummy variables should be made. For categorical variables with more than two categories, dummy variables MUST be made.
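The preparation steps above might look roughly like the following. The DataFrame is a made-up stand-in that uses a few of the column names from the attribute list; the 0/1 mapping for Sex and the use of `drop_first=True` are choices made for this sketch, not requirements stated in the task.

```python
import pandas as pd

# Made-up rows standing in for GermanCredit.csv (illustrative values only).
df = pd.DataFrame({
    "Age": [25, 40, 33, 51],
    "Sex": ["male", "female", "female", "male"],
    "Housing": ["own", "rent", "free", "own"],
    "Saving accounts": ["little", None, "rich", "moderate"],
    "Risk": [0, 1, 0, 1],
})

# 1. Count missing values per column.
missing = df.isnull().sum()
print(missing)

# 2. Convert the string columns to the category dtype.
cat_cols = ["Sex", "Housing", "Saving accounts"]
for col in cat_cols:
    df[col] = df[col].astype("category")

# 3. Separate predictors and target.
X = df.drop(columns="Risk")
y = df["Risk"]

# 4. Binary variable: encode as 0/1 (assumed mapping: female = 1, male = 0).
X["Sex"] = (X["Sex"] == "female").astype(int)

# 5. Multi-category variables: dummy variables. drop_first=True removes one
#    redundant level per variable; missing values become all-zero dummy rows.
X = pd.get_dummies(X, columns=["Housing", "Saving accounts"], drop_first=True, dtype=int)

print(X.dtypes)
```

Whether missing categories should instead be imputed or given their own level is a modelling decision the task leaves open.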
Model building: points
Split the data into train and test sets, using a : split. Build a Decision Tree model and a Random Forest model. Compare their performance on three metrics: F-score, Precision, and Recall.
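A minimal sketch of this step follows. It uses synthetic data from `make_classification` in place of the prepared credit data, and the 70:30 split, the stratification, and the `random_state` values are all assumptions made for illustration, since the split ratio is not given above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared predictors and target (about 30% positives).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.7, 0.3], random_state=1)

# Assumed 70:30 split; stratify keeps the class ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    for split, (X_s, y_s) in {"train": (X_train, y_train), "test": (X_test, y_test)}.items():
        pred = model.predict(X_s)
        results[(name, split)] = {
            "precision": precision_score(y_s, pred),
            "recall": recall_score(y_s, pred),
            "f1": f1_score(y_s, pred),
        }

for key, scores in results.items():
    print(key, {k: round(v, 3) for k, v in scores.items()})
```

Reporting both train and test scores side by side makes overfitting visible: untuned trees typically score near-perfectly on the training set and noticeably lower on the test set.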
Tuning and evaluation: points
Improve the performance of these models by tuning the hyperparameters (use GridSearchCV). You can also try different class weights.
Compare the performance of all four models on the training and test data sets. Based on your criteria, choose the best model.
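One way the tuning step could be set up is sketched below for the Decision Tree; the same pattern applies to the Random Forest with a comparably small grid. The data is synthetic, the grid values and the 70:30 split are arbitrary, and `scoring="recall"` is only a placeholder for whichever metric you chose (it can be swapped for `"precision"` or `"f1"`).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared credit data.
X, y = make_classification(n_samples=600, n_features=10, weights=[0.7, 0.3], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Deliberately small grid (3 x 3 x 3 = 27 combinations); values are illustrative.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5, 10],
    "class_weight": [None, "balanced", {0: 0.3, 1: 0.7}],
}

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="recall",  # placeholder metric; replace with your chosen one
    cv=5,
)
grid.fit(X_train, y_train)

best = grid.best_estimator_
print("Best parameters:", grid.best_params_)
print("Train recall:", round(recall_score(y_train, best.predict(X_train)), 3))
print("Test recall:", round(recall_score(y_test, best.predict(X_test)), 3))
```

The `class_weight` entries let the search try penalizing mistakes on the at-risk class more heavily, which is one way to act on the class-weight suggestion above.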
Insights: points
Determine the feature importance in your chosen model
List out the business insights, based on your EDA and chosen model
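For tree-based models in scikit-learn, feature importance can be read from the fitted model's `feature_importances_` attribute. The sketch below uses synthetic data with hypothetical feature names; in the actual notebook the index would be the columns of the prepared predictor DataFrame.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical feature names, standing in for the credit data.
feature_names = [f"feature_{i}" for i in range(6)]
X_arr, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=1)
X = pd.DataFrame(X_arr, columns=feature_names)

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Impurity-based importances; they are normalized to sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

When dummy variables are used, each level of a categorical variable gets its own importance, so it can help to sum the importances of the dummies belonging to one original variable before drawing business conclusions.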
Caution: Tuning the Random Forest is computationally intensive. Therefore, do not specify a very large hyperparameter space in GridSearchCV.
Guidelines for submitting:
Annotate your Jupyter Notebook to explain your procedures, comments, and conclusions.
After completion, run the Jupyter Notebook from start to finish.