Question: Case Study: German Credit Risk Analysis

Case Study: German Credit Risk Analysis

Context:

To minimize loss from the bank's perspective, the bank needs a decision rule regarding whom to approve the loan and whom not to. An applicant's demographic and socio-economic profiles are considered by loan managers before a decision is taken regarding his/her loan application.

In this dataset, each entry represents a person who takes credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes.

Objective:

The objective is to build predictive models on this data to help the bank take a decision on whether to approve a loan to a prospective applicant.

Considerations:

If a potential customer is misclassified as being at Risk, the bank will not give the loan to that person. This will be a loss of opportunity to earn interest on the potential loan.

If a potential customer is misclassified as NOT being at Risk, they may be given the loan but may default later. This will be a loss of resources.

Based on the above logic, you need to decide whether to maximize Recall, Precision, or F1-score. Please give your reasons for choosing a particular metric and use the metric chosen by you to evaluate the performance of the models.
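To make the trade-off concrete, here is a minimal sketch of how the three metrics follow from confusion-matrix counts when Risk = 1 (defaulter) is treated as the positive class. Under that assumption, a false negative is a defaulter who gets the loan (loss of resources) and a false positive is a good customer who is refused (loss of opportunity). The function name and the counts are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for the positive class (Risk = 1)."""
    # Precision: of everyone flagged as at risk, how many really are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real defaulters, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, not results from the data set:
# 60 defaulters caught, 40 good customers wrongly flagged, 20 defaulters missed.
precision, recall, f1 = classification_metrics(tp=60, fp=40, fn=20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall falls as missed defaulters (fn) increase, precision falls as wrongly rejected customers (fp) increase, and F1 penalizes both, so the choice depends on which of the two losses the bank considers more costly.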

Attribute Information:

The data contains characteristics of the people:

Age (Numeric: Age in years)

Sex (Categories: male, female)

Job (Categories: unskilled and non-resident, unskilled and resident, skilled, highly skilled)

Housing (Categories: own, rent, or free)

Saving accounts (Categories: little, moderate, quite rich, rich)

Checking account (Categories: little, moderate, rich)

Credit amount (Numeric: Amount of credit in DM - Deutsche Mark)

Duration (Numeric: Duration for which the credit is given in months)

Purpose (Categories: car, furniture/equipment, radio/, domestic appliances, repairs, education, business, vacation/others)

Risk (0 - Person is not at risk, 1 - Person is at risk (defaulter))

The data set German_Credit.csv can be downloaded from the Data sets folder in CANVAS.

Tasks and rubric:

1. Explore: 3 points

Examine the data set and carry out EDA, particularly showing how the other variables may be related to the target variable Risk (through barplot/lineplot/boxplot etc.), to derive initial insights.
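As a rough sketch of the kind of summary that sits behind a barplot or boxplot against Risk, the snippet below groups by a categorical and a numeric variable. The rows are invented for illustration, and the column names are assumed to match the attribute list above; in the notebook the frame would come from reading German_Credit.csv instead.

```python
import pandas as pd

# Toy rows, invented for illustration only (not taken from the data set).
df = pd.DataFrame({
    "Housing": ["own", "rent", "own", "free", "rent", "own"],
    "Credit amount": [1200, 5000, 2300, 7800, 4100, 900],
    "Risk": [0, 1, 0, 1, 1, 0],
})

# Share of at-risk applicants per category: the quantity a barplot would show.
risk_rate_by_housing = df.groupby("Housing")["Risk"].mean()

# Spread of a numeric variable per class: the quantity a boxplot would show.
amount_by_risk = df.groupby("Risk")["Credit amount"].describe()

print(risk_rate_by_housing)
print(amount_by_risk)
```

Calling `risk_rate_by_housing.plot(kind="bar")` on the first result would draw the corresponding bar chart in a notebook.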

2. Data preparation: 2 points

Check for missing values, convert string (object) variables to category. Separate the predictor and target variable. Create dummy variables as needed (the final data set should have all variables as numeric).

Note: If a variable is binary, dummy variables are not needed. However, if the two values of a binary variable, e.g. gender, are coded as Male and Female, these should either be converted to 0 and 1, or dummy variables made. For categorical variables with more than two categories, dummy variables MUST be made.
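The preparation steps above might look roughly like this in pandas. The rows are made up, the column names are assumed from the attribute list, and two choices are assumptions rather than requirements of the task: filling missing categories with an "unknown" label and dropping the first dummy level.

```python
import pandas as pd

# Toy rows, invented for illustration only.
df = pd.DataFrame({
    "Age": [35, 22, 49, 28],
    "Sex": ["male", "female", "male", "female"],
    "Housing": ["own", "rent", "free", "own"],
    "Saving accounts": ["little", None, "rich", "moderate"],
    "Risk": [0, 1, 0, 1],
})
categorical_cols = ["Sex", "Housing", "Saving accounts"]

# 1. Check for missing values per column.
missing_counts = df.isnull().sum()

# Assumption: treat a missing category as its own "unknown" level.
df["Saving accounts"] = df["Saving accounts"].fillna("unknown")

# 2. Convert string columns to the category dtype.
for col in categorical_cols:
    df[col] = df[col].astype("category")

# 3. Separate predictors and target.
X = df.drop(columns="Risk")
y = df["Risk"]

# 4. Binary variable: map directly to 0/1 instead of making dummies.
X["Sex"] = (X["Sex"] == "male").astype(int)

# 5. Variables with more than two categories: dummy variables.
X = pd.get_dummies(
    X, columns=["Housing", "Saving accounts"], drop_first=True, dtype=int
)

print(missing_counts)
print(X.dtypes)
```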

3. Model building: 5 points

Split the data in train and test sets, using a 75/25 split. Build a Decision Tree model and a Random Forest model. Compare the performance on metrics: F1 score, Precision and Recall.
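A minimal sketch of this step with scikit-learn follows. Synthetic data from `make_classification` stands in for the prepared German credit features, so the printed scores say nothing about the real data set; the `random_state` values and the stratified split are assumptions made for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared predictors X and target y (Risk).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.7, 0.3], random_state=1
)

# 75/25 split; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Metrics are computed for the positive class (label 1).
    scores[name] = {
        "f1": f1_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
    }
    print(name, scores[name])
```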

4. Tuning and evaluation: 5 points

Improve the performance of these models by tuning the hyperparameters (use GridSearchCV). You can also try to use different class weights.

Compare the performance of all four models on training and test data set. Based on your criteria, choose the best model.
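One way the tuning step could be set up is sketched below for the Decision Tree, again on synthetic stand-in data. The grid values and the `{0: 1, 1: 3}` class weight are illustrative assumptions, and `scoring="recall"` is only a placeholder for whichever metric was chosen earlier (`"precision"` and `"f1"` are also valid scorer names).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared data.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.7, 0.3], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# A deliberately small grid (27 combinations) to keep the search cheap.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5, 10],
    "class_weight": [None, "balanced", {0: 1, 1: 3}],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="recall",  # placeholder: use the metric chosen for the bank
    cv=5,
)
search.fit(X_train, y_train)

# Compare training and test performance to spot overfitting.
best_tree = search.best_estimator_
train_recall = recall_score(y_train, best_tree.predict(X_train))
test_recall = recall_score(y_test, best_tree.predict(X_test))

print(search.best_params_)
print(f"train recall={train_recall:.2f} test recall={test_recall:.2f}")
```

The same pattern applies to `RandomForestClassifier`, with forest-specific entries such as `n_estimators` added to the grid.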

5. Insights: 5 points

Determine the feature importance in your chosen model

List out the business insights, based on your EDA and chosen model

Caution: Tuning the Random Forest is computationally intensive. Therefore, do not specify a very large hyperparameter space in GridSearchCV.
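For the feature importance asked for in task 5, tree-based scikit-learn models expose a `feature_importances_` attribute after fitting. The sketch below ranks the features of a Random Forest trained on synthetic data; the `feature_i` names are hypothetical placeholders for the real column names of the prepared data set.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 3 informative features and 2 noise features.
X, y = make_classification(
    n_samples=400, n_features=5, n_informative=3, n_redundant=0, random_state=1
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```

In a notebook, `importances.plot(kind="barh")` gives a quick visual ranking to support the business insights.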

Guidelines for submitting:

Annotate your Jupyter Notebook, to explain your procedures, comments and conclusions

After completion, run the Jupyter notebook from start to finish

