Question: Case Study: German Credit Risk Analysis

Case Study: German Credit Risk Analysis:
Context:
To minimize loss from the bank's perspective, the bank needs a decision rule regarding whom to approve for a loan and whom not to. An applicant's demographic and socio-economic profiles are considered by loan managers before a decision is taken regarding his/her loan application.
In this dataset, each entry represents a person who takes credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes.
Objective:
The objective is to build predictive models on this data to help the bank decide whether to approve a loan to a prospective applicant.
Considerations:
If a potential customer is misclassified as being at Risk, the bank will not give the loan to that person. This will be a loss of opportunity to earn interest on the potential loan.
If a potential customer is misclassified as NOT being at Risk, they may be given the loan but may default later. This will be a loss of resources.
Based on the above logic, you need to decide whether to maximize Recall, Precision, or F1-score. Please give your reasons for choosing a particular metric, and use that metric to evaluate the performance of the models.
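To make the trade-off concrete, the sketch below computes the three candidate metrics from confusion-matrix counts, treating Risk = 1 (defaulter) as the positive class. The counts are invented for illustration and do not come from German_Credit.csv; the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class (Risk = 1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented counts (assumption, not real results):
# tp = defaulters correctly flagged as at risk
# fp = good customers wrongly flagged (lost interest income)
# fn = defaulters wrongly approved (lost resources)
precision, recall, f1 = precision_recall_f1(tp=40, fp=20, fn=10)
```

With Risk = 1 as the positive class, wrongly approved defaulters (false negatives) pull Recall down, while wrongly rejected good customers (false positives) pull Precision down. F1-score balances the two.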
Attribute Information:
The data contains characteristics of the people
Age (Numeric: Age in years)
Sex (Categories: male, female)
Job (Categories: unskilled and non-resident, unskilled and resident, skilled, highly skilled)
Housing (Categories: own, rent, or free)
Saving accounts (Categories: little, moderate, quite rich, rich)
Checking account (Categories: little, moderate, rich)
Credit amount (Numeric: Amount of credit in DM - Deutsche Mark)
Duration (Numeric: Duration for which the credit is given in months)
Purpose (Categories: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
Risk (0 - Person is not at risk, 1 - Person is at risk (defaulter))
The data set German_Credit.csv can be downloaded from the Data sets folder in Canvas.
Tasks and rubric:
1. Explore: 3 points
Examine the data set and carry out EDA to derive initial insights, particularly showing how the other variables may be related to the target variable Risk (through barplot/lineplot/boxplot etc.).
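One way to relate a predictor to Risk is to tabulate the share of at-risk customers per category before plotting it. This is a minimal sketch on a toy frame: the rows are invented, and the column names are assumed to match the attribute names listed above (the actual headers in German_Credit.csv may differ).

```python
import pandas as pd

# Toy rows standing in for German_Credit.csv; all values are invented.
df = pd.DataFrame({
    "Housing": ["own", "rent", "own", "free", "rent", "own"],
    "Credit amount": [1200, 5000, 800, 7000, 3000, 1500],
    "Risk": [0, 1, 0, 1, 0, 0],
})

# Share of at-risk customers (Risk = 1) within each housing category
risk_rate_by_housing = df.groupby("Housing")["Risk"].mean()

# Median credit amount for each value of Risk
median_amount_by_risk = df.groupby("Risk")["Credit amount"].median()
```

If matplotlib is installed, `risk_rate_by_housing.plot(kind="bar")` draws the first table as a barplot, and `df.boxplot(column="Credit amount", by="Risk")` gives a boxplot of the numeric variable split by the target.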
2. Data preparation: 2 points
Check for missing values, and convert string (object) variables to category. Separate the predictor and target variables. Create dummy variables as needed (the final data set should have all variables as numeric).
Note: If a variable is already coded as 0 and 1, dummy variables are not needed. However, if the two values of a binary variable (e.g. gender) are coded as Male and Female, they should either be converted to 0 and 1, or dummy variables should be made. For categorical variables with more than two categories, dummy variables MUST be made.
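The preparation steps above can be sketched with pandas as follows. The frame is a toy stand-in with invented rows, and the column names are assumed from the attribute list. Using `drop_first=True` is one possible choice (it turns a two-level variable such as Sex into a single 0/1 column), not a requirement of the assignment.

```python
import pandas as pd

# Toy rows standing in for German_Credit.csv; all values are invented.
df = pd.DataFrame({
    "Age": [25, 40, 33, 51],
    "Sex": ["male", "female", "female", "male"],
    "Housing": ["own", "rent", "free", "own"],
    "Risk": [0, 1, 0, 1],
})

# Count missing values in each column
missing_per_column = df.isnull().sum()

# Convert the string columns to the category dtype
categorical_cols = ["Sex", "Housing"]
df[categorical_cols] = df[categorical_cols].astype("category")

# Separate predictors and target
X = df.drop(columns="Risk")
y = df["Risk"]

# One-hot encode; drop_first avoids a redundant column per variable
X = pd.get_dummies(X, drop_first=True, dtype=int)
```

After this step every column of `X` is numeric, which is what the tree-based models in scikit-learn expect.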
3. Model building: 5 points
Split the data into train and test sets, using a 75:25 split. Build a Decision Tree model and a Random Forest model. Compare their performance on the metrics F1-score, Precision, and Recall.
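A minimal sketch of this step with scikit-learn is shown below. Because the real file is not available here, `make_classification` generates a synthetic stand-in for the prepared predictors and target. The `stratify=y` argument and the `random_state` values are assumptions added for reproducibility, not part of the task statement.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared X and y (Risk); not the real data.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.7, 0.3], random_state=42
)

# 75:25 split; stratify keeps the Risk proportions similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and score it on the test set (positive class = 1)
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[name] = {
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
```

Repeating the scoring on `X_train` as well makes overfitting visible: an untuned decision tree typically scores near-perfectly on training data and noticeably lower on test data.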
4. Tuning and evaluation: 5 points
Improve the performance of these models by tuning the hyperparameters (use GridSearchCV). You can also try to use different class weights.
Compare the performance of all four models on the training and test data sets. Based on your criteria, choose the best model.
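The tuning step might look like the sketch below, shown for the Decision Tree only. The data is again a synthetic stand-in, the grid values and class weights are illustrative guesses, and `scoring="recall"` is an assumption that stands for whichever metric was chosen earlier.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared X and y (Risk); not the real data.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.7, 0.3], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Small, illustrative grid; class_weight options include a manual weighting
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
    "class_weight": [None, "balanced", {0: 1, 1: 3}],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    scoring="recall",  # assumed metric; swap for "precision" or "f1"
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
# With a scoring argument set, score() reports that same metric
train_recall = search.score(X_train, y_train)
test_recall = search.score(X_test, y_test)
```

The same pattern applies to `RandomForestClassifier`; keeping the grid to a handful of values per hyperparameter limits the number of fits, since each combination is trained once per cross-validation fold.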
5. Insights: 5 points
Determine the feature importance in your chosen model
List out the business insights, based on your EDA and chosen model
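For tree-based models in scikit-learn, feature importance can be read from the fitted model's `feature_importances_` attribute. The sketch below uses synthetic data and placeholder feature names (`feature_0` and so on are hypothetical); with the real data the index would be the column names of the prepared predictors.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; feature names are placeholders.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, sorted from most to least important
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)
```

The importances sum to 1, so each value can be read as a feature's share of the model's total impurity reduction. Dummy-encoded variables appear as separate columns, so their shares may need to be added up to judge the original categorical variable.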
Caution: Tuning the Random Forest is computationally intensive. Therefore, do not specify a very large hyperparameter space in GridSearchCV.
Guidelines for submitting:
Annotate your Jupyter Notebook, to explain your procedures, comments and conclusions
After completion, run the Jupyter notebook from start to finish
