Question: Your final project will require you to leverage several of the techniques we have discussed throughout this course. The project scope requires that you predict the likelihood of loan default using a publicly available dataset on loan defaults. The original dataset comes from Lending Club (see Course Resources). The dataset contains all loans approved from to. Please review the data dictionary for a detailed explanation of all of the variables included in the dataset.
The purpose of this project will be as follows:
Bring the data into Python
Conduct exploratory data analysis (EDA) on the dataset
These include tasks such as the following:
Correlation analysis
Missing data
Relationship of features to the target (chargeoff)
Dimension reduction, if necessary
Create a series of models in Python, which should include logistic regression, decision tree, random forest, and gradient boosted machine, along with any others you feel would be appropriate/relevant
The results of the predictive models you build should include a review of feature importance, predictive accuracy, lift/ROC charts, and confusion matrices
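The EDA tasks listed above (missing data, correlation analysis, relationship of features to the target) could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the actual Lending Club file: the column names `loan_amnt`, `int_rate`, `annual_inc`, and `chargeoff` are assumptions chosen for the example, and the real names should be taken from the data dictionary.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loan data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "loan_amnt": rng.integers(1_000, 35_000, n).astype(float),
    "int_rate": rng.uniform(5, 25, n),
    "annual_inc": rng.lognormal(11, 0.5, n),
})
# Make charge-off more likely at higher interest rates so there is signal to find.
p_default = 1 / (1 + np.exp(-(df["int_rate"] - 18) / 3))
df["chargeoff"] = (rng.uniform(size=n) < p_default).astype(int)
# Blank out 10% of incomes to mimic missing data.
df.loc[rng.choice(n, size=50, replace=False), "annual_inc"] = np.nan

# Missing data: share of missing values per column, worst first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Correlation analysis: pairwise Pearson correlation (all columns here are numeric).
corr = df.corr()

# Relationship of a feature to the target: charge-off rate by interest-rate quartile.
rate_by_bucket = df.groupby(pd.qcut(df["int_rate"], 4), observed=True)["chargeoff"].mean()

print(missing_share, corr["chargeoff"], rate_by_bucket, sep="\n\n")
```

On the real data, the same three summaries would typically guide which columns to impute or drop and which highly correlated features might be candidates for dimension reduction.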
Upon completing the project, you will submit your Jupyter notebook with an appropriate writeup in markdown format along with results, recommendations, and the final model you would select going forward.
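One possible way to compare the four required model types and collect the requested outputs is sketched below. It assumes scikit-learn and uses `make_classification` to generate a synthetic, imbalanced dataset in place of the prepared loan features. The hyperparameters and the helper `top_decile_lift` are illustrative choices, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the prepared loan features (assumption).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

def top_decile_lift(y_true, scores):
    """Positive rate among the 10% highest-scored rows divided by the overall rate."""
    order = np.argsort(scores)[::-1]
    k = max(1, len(scores) // 10)
    return y_true[order[:k]].mean() / y_true.mean()

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
    results[name] = {
        "auc": roc_auc_score(y_test, proba),
        "lift": top_decile_lift(y_test, proba),
        "confusion_matrix": confusion_matrix(y_test, model.predict(X_test)),
    }
    print(f"{name}: AUC={results[name]['auc']:.3f}, top-decile lift={results[name]['lift']:.2f}")
    print(results[name]["confusion_matrix"])

# Impurity-based feature importances from the random forest (they sum to 1).
importances = models["random_forest"].feature_importances_
```

With an imbalanced target such as charge-offs, AUC and lift tend to be more informative than raw accuracy, which is why the sketch reports them alongside the confusion matrix.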
Several important points:
The target is not defined in the raw data file. You will need to create it from an existing feature.
The dataset includes features that were available at the time the lending decision was made, as well as features that were tracked after the loan was made. It is crucially important that you do not include any of the future information in your predictive model. A sign of this, referred to as target leakage, is extremely accurate model results, such as implausibly high accuracy.
Please make sure that you divide your data into a testing and validation set at a minimum. Please feel free to do additional types of validation (e.g., k-fold cross-validation).
Please also feel free to use other techniques from the course to create new features to add as predictors (e.g., kNN or cluster analysis). Although this is unnecessary, please do not feel limited to only what is included.
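The first three points above (creating the target, avoiding leakage, and validating on held-out data) can be illustrated together in a small sketch. Everything here is an assumption made for the example: the toy data is random, and the column names `loan_status` and `recoveries` and the label `"Charged Off"` are hypothetical placeholders for whatever the data dictionary actually specifies.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 600
# Toy frame; every column name and status label is a hypothetical placeholder.
raw = pd.DataFrame({
    "loan_amnt": rng.integers(1_000, 35_000, n).astype(float),
    "int_rate": rng.uniform(5, 25, n),
    "dti": rng.uniform(0, 40, n),
    "loan_status": rng.choice(["Fully Paid", "Charged Off"], size=n, p=[0.8, 0.2]),
})
is_bad = raw["loan_status"] == "Charged Off"
# A post-origination column: nonzero only once a loan has already gone bad.
raw["recoveries"] = np.where(is_bad, rng.uniform(500, 5_000, n), 0.0)

# 1. Create the binary target from an existing feature.
raw["chargeoff"] = is_bad.astype(int)

# 2. Separate decision-time features from information tracked afterwards.
decision_time_cols = ["loan_amnt", "int_rate", "dti"]
post_origination_cols = ["recoveries"]

# 3. Hold out a stratified test set; cross-validate on the training part only.
train, test = train_test_split(
    raw, test_size=0.25, stratify=raw["chargeoff"], random_state=0)

def cv_auc(columns):
    """Mean 5-fold cross-validated ROC AUC of a scaled logistic regression."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, train[columns], train["chargeoff"],
                             cv=5, scoring="roc_auc")
    return scores.mean()

clean_auc = cv_auc(decision_time_cols)
leaky_auc = cv_auc(decision_time_cols + post_origination_cols)
print(f"decision-time features only: AUC={clean_auc:.3f}")
print(f"with post-origination column: AUC={leaky_auc:.3f}")
```

Because the toy target is random with respect to the decision-time features, the first score should sit near chance, while adding the post-origination column pushes it close to perfect. A jump like that on real data is the warning sign of target leakage described above.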
