Question:

Your final project will require you to apply several of the techniques we have discussed throughout this course. You will use a publicly available dataset to predict the likelihood of loan default. The original dataset comes from Lending Club (see Course Resources) and contains all loans approved from 2007 to 2018. Please review the data dictionary for a detailed explanation of all of the variables included in the dataset.
The purpose of this project will be as follows:
1. Bring the data into Python
2. Conduct exploratory data analysis (EDA) on the dataset
   These include tasks such as the following:
   1. Correlation analysis
   2. Missing data
   3. Relationship of features to target (charge-off)
   4. Dimension reduction (if necessary)
3. Create a series of models in Python which should include logistic regression, decision tree, random forest, and gradient boosted machine (along with any others you feel would be appropriate/relevant)
4. The results of the predictive model you build should include a review of feature importance, predictive accuracy, lift/ROC charts, and confusion matrices
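As a rough illustration of the EDA tasks listed under step 2, the sketch below computes the share of missing values per column, a correlation matrix, and the charge-off rate by category. The toy rows, the column names (`int_rate`, `annual_inc`, `grade`), and the binary `charged_off` target are all assumptions made for illustration; check the data dictionary for the real names.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the loan data (values and column names are assumed).
df = pd.DataFrame({
    "int_rate": [7.5, 13.2, 18.9, 9.1, 22.4, 11.0, 16.3, 8.2],
    "annual_inc": [85000, 52000, np.nan, 70000, 31000, 64000, np.nan, 91000],
    "grade": ["A", "C", "E", "B", "F", "B", "D", "A"],
    "charged_off": [0, 0, 1, 0, 1, 0, 1, 0],  # hypothetical binary target
})

# Missing data: fraction of missing values per column, worst first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Correlation analysis on the numeric columns only.
corr = df.select_dtypes("number").corr()

# Relationship of features to target: correlation of each numeric feature
# with the target, and charge-off rate within each category.
target_corr = corr["charged_off"].drop("charged_off")
rate_by_grade = df.groupby("grade")["charged_off"].mean()

print(missing_share)
print(target_corr)
print(rate_by_grade)
```

On the full dataset the same calls apply unchanged, although a heatmap of `corr` is usually easier to read than the printed matrix.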
Upon completing the project, you will submit your Jupyter notebook with an appropriate write-up (in markdown format) along with results, recommendations, and the final model you would select going forward.
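For steps 3 and 4, one possible shape of the model comparison is sketched below with scikit-learn. It uses synthetic data from `make_classification` in place of the cleaned loan features, so the class balance, hyperparameters, and resulting numbers are assumptions and say nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned loan data (assumed ~20% positive class).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The four model families named in the assignment; settings are illustrative.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # predicted default probability
    # Top-decile lift: default rate among the 10% highest scores
    # divided by the overall default rate in the test set.
    top = np.argsort(proba)[::-1][: len(proba) // 10]
    results[name] = {
        "auc": roc_auc_score(y_test, proba),
        "confusion": confusion_matrix(y_test, model.predict(X_test)),
        "top_decile_lift": y_test[top].mean() / y_test.mean(),
    }
    print(name, results[name]["auc"], results[name]["top_decile_lift"])

# Feature importance from the random forest (impurity-based, sums to 1).
importances = models["random_forest"].feature_importances_
ranking = np.argsort(importances)[::-1]
print("Features ranked by importance:", ranking)
```

The logistic regression is wrapped in a scaling pipeline because its fit is sensitive to feature scale, while the tree-based models are not. ROC curves can be drawn from the same `proba` values with `sklearn.metrics.roc_curve`.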
Several important points:
1. The target is not defined in the raw data file. You will need to create it from an existing feature.
2. There are features in the dataset that were available at the time the lending decision was made, as well as features that were tracked after the loan was made. It is crucial that you do not include any of this future information in your predictive model. A sign of this problem (referred to as target leakage) is extremely accurate model results (e.g., > 90% accuracy).
3. Please make sure that you divide your data into a testing and validation set at a minimum. Please feel free to do additional types of validation (e.g., k-fold cross-validation).
Please also feel free to use other techniques covered in the course to create new features to add as predictors (e.g., k-NN or cluster analysis). This is optional, but please do not feel limited to what is listed here.
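The three points above can be sketched together as follows. This is a minimal illustration, not the required approach: it assumes the target is derived from a `loan_status` column with values such as "Fully Paid" and "Charged Off", that loans without a final outcome are dropped, and that `total_pymnt` and `recoveries` are examples of post-origination columns. The target name `charged_off` is hypothetical, and the leakage list is deliberately incomplete; use the data dictionary to build the real one.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy rows standing in for the raw file (values and column names are assumed).
raw = pd.DataFrame({
    "loan_amnt": [5000, 12000, 8000, 20000, 15000, 6000, 10000,
                  25000, 7000, 9000, 18000, 11000, 14000, 4000],
    "int_rate": [7.5, 18.9, 9.1, 22.4, 11.0, 8.2, 16.3,
                 24.1, 10.5, 12.8, 13.2, 9.9, 14.0, 6.9],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Charged Off",
                    "Fully Paid", "Fully Paid", "Charged Off", "Charged Off",
                    "Fully Paid", "Fully Paid", "Fully Paid", "Fully Paid",
                    "Current", "Current"],
    "total_pymnt": [5400, 3000, 8700, 4100, 16500, 6400, 2500,
                    6000, 7600, 9900, 19800, 11900, 3000, 900],
    "recoveries": [0, 450, 0, 900, 0, 0, 300, 1200, 0, 0, 0, 0, 0, 0],
})

# Point 1: create the target from an existing feature.
# Assumption: keep only loans whose outcome is final.
final = raw[raw["loan_status"].isin(["Fully Paid", "Charged Off"])].copy()
final["charged_off"] = (final["loan_status"] == "Charged Off").astype(int)

# Point 2: drop columns only known after the loan was made (illustrative list).
leakage_cols = ["total_pymnt", "recoveries"]
X = final.drop(columns=leakage_cols + ["loan_status", "charged_off"])
y = final["charged_off"]

# Point 3: hold out part of the data, stratified to preserve the default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape, y.mean())
```

For k-fold cross-validation, `sklearn.model_selection.cross_val_score` can be applied to the training portion while the held-out rows stay untouched until the final evaluation.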
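One way to turn cluster analysis into a predictor is to fit k-means on the training features and append each row's cluster label as an extra column. The sketch below assumes synthetic data from `make_blobs`, three clusters, and a hypothetical helper named `add_cluster_feature`; none of these come from the assignment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features standing in for borrower attributes (assumed).
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X_train, X_test = X[:200], X[200:]

# Fit the scaler and the clustering on the training rows only,
# so no information from the held-out rows leaks into the new feature.
scaler = StandardScaler().fit(X_train)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(scaler.transform(X_train))


def add_cluster_feature(features):
    """Append the k-means cluster label as one extra column (hypothetical helper)."""
    labels = kmeans.predict(scaler.transform(features))
    return np.column_stack([features, labels])


X_train_aug = add_cluster_feature(X_train)
X_test_aug = add_cluster_feature(X_test)
print(X_train_aug.shape, X_test_aug.shape)
```

Because the label is a category rather than a quantity, it may need one-hot encoding before being passed to a linear model such as logistic regression.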
