Question:
LAB EXHIBIT 2-3A
Attribute          Description
id                 Loan identification number
member_id          Membership ID
loan_amnt          Requested loan amount
emp_length         Employment length
issue_d            Date of loan issue
loan_status        Fully paid or charged off
pymnt_plan         Payment plan: yes or no
purpose            Loan purpose: e.g., wedding, medical, debt_consolidation, car
zip_code           The first three digits of the applicant's zip code
addr_state         State
dti                Debt-to-income ratio
delinq_2y          Late payments within the past 2 years
earliest_cr_line   Oldest credit account
inq_last_6mnths    Credit inquiries in the past 6 months
open_acc           Number of open credit accounts
revol_bal          Total balance of all credit accounts
revol_util         Percentage of available credit in use
total_acc          Total number of credit accounts
application_type   Individual or joint application

Source: LoanStatsXXXX.csv
Q2. Given this list of attributes, what types of questions do you think you could answer regarding approved loans? (What concerns do you have with the data's ability to predict answers to the questions you identified earlier?) Take a moment and explore the data.
Q3. Is there anything in the data that you think will make analysis difficult? For example, are there any special symbols, nonstandard data, or numbers that look out of place?
Q4. What would you do to clean the data in this file?
Let's identify some issues with the data.
- Many attributes contain no data and may not be necessary.
- The [int_rate] values are written as percentages (##.##%), but analysis requires decimal values (#.####); for example, 10.65% becomes 0.1065.
- The [term] values include the word "months", which should be removed for numerical analysis.
- The [emp_length] values include "n/a", "<", "+", "year", and "years", all of which should be removed for numerical analysis.
- Dates, including [issue_d], can be more useful if we expand them to show the day, month, and year as separate attributes. Dates cause issues in general because different systems use different date formats (e.g., 1/9/2009, Jan-2009, 9/1/2009 for European dates, etc.), so typically some conversion is necessary.
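The cleaning steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's prescribed method: the function names are hypothetical, the sample values (such as " 36 months" and "10+ years") are guesses at how the raw file looks, coding "< 1 year" as 0 is one possible choice, and the date format "Jan-2009" is taken from the examples in the text.

```python
import re
from datetime import datetime


def clean_int_rate(value):
    """Convert a percent string such as '10.65%' to a decimal such as 0.1065."""
    return round(float(value.strip().rstrip("%")) / 100, 6)


def clean_term(value):
    """Drop the word 'months' from a value such as ' 36 months' and return an int."""
    return int(value.replace("months", "").strip())


def clean_emp_length(value):
    """Reduce '10+ years', '3 years', or '< 1 year' to an int; 'n/a' becomes None."""
    match = re.search(r"\d+", value)
    if match is None:
        return None  # no digits at all, e.g. 'n/a'
    # Assumption: anything marked '<' (less than 1 year) is coded as 0.
    return 0 if "<" in value else int(match.group())


def split_issue_date(value, fmt="%b-%Y"):
    """Parse a date string and return (year, month, day) as separate values.

    Assumption: the default format matches strings like 'Jan-2009'.
    When the format carries no day, strptime fills in day 1.
    """
    parsed = datetime.strptime(value, fmt)
    return parsed.year, parsed.month, parsed.day


if __name__ == "__main__":
    print(clean_int_rate("10.65%"))
    print(clean_term(" 36 months"))
    print(clean_emp_length("10+ years"))
    print(split_issue_date("Jan-2009"))
```

Passing the format explicitly (for example fmt="%d/%m/%Y" for European-style dates) makes the day/month ambiguity mentioned above a deliberate choice instead of something the system guesses.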
Q5. Why do you think it is useful to reformat and extract parts of the dates before you conduct your analysis? What do you think would happen if you didn't?
Q6. Did you run into any major issues when you attempted to clean the data? How did you resolve those?
