Question:
Loan Data set Description:
We will be working with the loans_df data frame in this project. This data set contains information on over 4,000 individuals who secured a personal loan in 2017 from a national bank. The description of this data and the variables contained in it are provided below.

The objective of this project is to explore the factors that lead to loan default and develop a machine learning algorithm that will predict the likelihood of an applicant defaulting on their loan in the future.

The loans_df data frame contains information on 3- and 5-year loans that were originated in 2017 by a national bank for customers residing in the Middle Atlantic and Northeast regions of the United States.

The company is looking to see if it can determine the factors that lead to loan default and whether it can predict if a customer will eventually default on their loan. The bank has experienced record levels of customers defaulting on their loans in the past couple of years, and this is leading to large financial losses. The goal is to become better at identifying customers at risk of defaulting on their loans to minimize financial losses.

Specifically, the broad questions that the bank is trying to answer include:
- What are the factors that are associated with customers defaulting on their loans?
- Is it possible to predict whether a customer will default on their loan? If so, how accurate are the predictions?
- How many costly errors is the model expected to produce (customers classified as not defaulting who eventually do)?
- Are there any actions or policies the bank can implement to reduce the risk of loan default?

The data set contains a mixture of applicant financial information (income, debt ratios, etc.) and applicant behavior (number of open accounts, historical engagement with the bank's products, number of missed payments, etc.).

The response variable in this data is loan_default.
This variable records whether an applicant eventually defaulted on their loan and indicates a financial loss to the bank.
Note: The response variable has been coded as a factor with 'yes' as the first level. This is the format that tidymodels expects for calculating model performance metrics. There is no need to recode this variable in your machine learning process.

Variable Information

| Variable | Definition | Data Type |
|---|---|---|
| loan_default | Did the borrower default on their loan (yes/no) | Factor |
| loan_amount | Loan amount | Integer |
| installment | Monthly payment amount | Numeric |
| interest_rate | Interest rate | Numeric |
| loan_purpose | Purpose of the loan | Factor |
| application_type | Loan application type (individual or joint) | Factor |
| term | Loan term (three/five year) | Factor |
| homeownership | Borrower(s) homeownership status | Factor |
| annual_income | Annual income | Numeric |
| current_job_years | Years employed at current job | Numeric |
| debt_to_income | Debt-to-income ratio at application time | Numeric |
| total_credit_lines | Total number of open credit lines | Integer |
| years_credit_history | Years of credit history | Numeric |
| missed_payment_2_yr | History of missed payments in the last 2 years (yes/no) | Factor |
| history_bankruptcy | History of bankruptcy (yes/no) | Factor |
| history_tax_liens | History of tax liens (yes/no) | Factor |
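The note above about 'yes' being the first factor level can be checked directly before modeling. The following is a minimal base R sketch, assuming a small made-up data frame (`loans_toy`, a hypothetical stand-in, since the real loans_df is not shown here) with the same coding for loan_default:

```r
# Hypothetical toy data standing in for loans_df (values are illustrative only).
loans_toy <- data.frame(
  loan_default = factor(c("yes", "no", "no", "yes", "no"),
                        levels = c("yes", "no")),
  loan_amount  = c(12000L, 5000L, 30000L, 8000L, 15000L)
)

# The first level is the one treated as the event of interest ("yes" = default).
default_levels <- levels(loans_toy$loan_default)
print(default_levels)

# Class balance: how many borrowers defaulted versus did not.
default_counts <- table(loans_toy$loan_default)
print(default_counts)
```

Running the same two checks on the real loans_df should confirm the level order described in the note and show how imbalanced the two classes are.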
Write a summary of overall findings and recommendations to the executives at the bank. Think of this section as the closing remarks of a presentation, where you summarize the key findings and model performance, and make recommendations to improve loan processes at the bank.
The executive summary must be written in a business tone, with minimal grammatical errors, and should include the following sections:
1. An introduction where you explain the business problem and the goals of your data analysis
- What problem(s) is this company trying to solve? Why are they important to its future success?
- What was the goal of your analysis? What questions were you trying to answer and why do they matter?
2. Highlights and key findings from your Exploratory Data Analysis section
- What were the interesting findings from your analysis and **why are they important for the business**?
- This section is meant to **establish the need for your recommendations** in the following section
3. Your "best" classification model and an analysis of its performance
- In this section you should talk about the expected error of your model on future data
- To estimate future performance, you can use your model performance results on the **test data**
- You should discuss at least one performance metric, such as F1 or ROC AUC, for the model. However, you must explain the results in an **intuitive, non-technical manner**. The audience in this case is executives at a bank with limited knowledge of machine learning.
4. Your recommendations to the company on how to reduce loan default rates
- Each recommendation must be supported by data analysis results
- You must clearly explain why you are making each recommendation and which results from your data analysis support it
- You must also describe the potential business impact of each recommendation:
  - Why is this a good recommendation?
  - What benefits will the business achieve?
5. Conclusion
Wrap up the report with concluding remarks that summarize the results and the recommendations in two or three paragraphs.
6. Appendix/Appendices
Include all the R code, tables, and plots in this section.
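For item 3 above, the count of costly errors (defaults predicted as non-defaults) and the F1 score can both be read off a confusion matrix of test-set predictions. The following is a minimal base R sketch, assuming made-up `truth` and `pred` vectors (hypothetical, not results from the actual data); in a tidymodels workflow the same quantities would typically come from the yardstick package.

```r
# Hypothetical test-set outcomes and predicted classes (illustrative values only).
truth <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "yes", "no", "no"),
                levels = c("yes", "no"))
pred  <- factor(c("yes", "no", "yes", "no", "no", "yes", "no", "yes", "no", "no"),
                levels = c("yes", "no"))

# Rows are predictions, columns are actual outcomes.
conf_mat <- table(predicted = pred, actual = truth)
print(conf_mat)

tp <- conf_mat["yes", "yes"]  # defaults correctly flagged
fn <- conf_mat["no", "yes"]   # costly errors: defaults the model missed
fp <- conf_mat["yes", "no"]   # non-defaulters flagged as risky

precision <- tp / (tp + fp)   # share of flagged loans that actually defaulted
recall    <- tp / (tp + fn)   # share of actual defaults the model caught
f1        <- 2 * precision * recall / (precision + recall)

cat("Missed defaults:", fn, "\n")
cat("F1 score:", round(f1, 3), "\n")
```

For an executive audience, the recall figure translates naturally into a statement such as "out of every 100 customers who eventually default, the model is expected to flag about this many in advance."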
