Question: R Programming
You are a data scientist at Great Loan Company (GLC). GLC is a public company traded on the stock exchange and is one of the best lenders in the country. You are asked to analyze the loan data for all loans issued from 2007 to 2015. Your manager gives you a data set, loan.csv, which contains the following columns:
id: A unique LC-assigned ID for the loan listing.
loan_amnt: The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
term: The number of payments on the loan. Values are in months and can be either 36 or 60.
int_rate: Interest Rate on the loan.
installment: The monthly payment owed by the borrower if the loan originates.
Grade: LC-assigned loan grade.
emp_length: Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
home_ownership: The homeownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.
annual_inc: The self-reported annual income provided by the borrower during registration.
verification_status: Indicates whether the income was verified by LC, was not verified, or whether the income source was verified.
loan_status: Current status of the loan.
How would you code the following steps?
Read the dataset in loan.csv into R. Name the loaded data frame loan. Make sure the working directory is set to the folder that contains the data.
Which variables (columns) are continuous/numerical variables? Which columns are factors (categorical variables)?
Calculate the minimum, maximum, mean, median, standard deviation, and three quartiles (25th, 50th and 75th percentiles) of loan_amnt.
Calculate the minimum, maximum, mean, median, standard deviation and three quartiles (25th, 50th and 75th percentiles) of int_rate.
Calculate the correlation coefficient of the two variables: int_rate and installment. Do they have a strong relationship?
Calculate the frequency table of term. What is the mode of the term variable?
Calculate the proportion table of loan_status. What is the mode of the loan_status variable?
Calculate the cross table of term and loan_status. Then produce proportions by row and column respectively.
The data is stored in the data frame, loan. Please summarize all the variables using one command.
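For the first, second, and last steps (reading the file, identifying variable types, and summarizing everything in one command), a minimal sketch follows. The real loan.csv is not available here, so the code first writes a small toy file with made-up values to a temporary folder; the toy rows and the `csv_path` name are assumptions for illustration only. With the real file you would call `setwd()` on its folder and then `read.csv("loan.csv")`.

```r
# Hypothetical toy rows standing in for the real loan.csv (all values are made up).
toy <- data.frame(
  id             = 1:6,
  loan_amnt      = c(5000, 12000, 8000, 20000, 15000, 10000),
  term           = c(36, 60, 36, 36, 60, 36),
  int_rate       = c(7.5, 13.2, 10.1, 9.8, 16.4, 11.0),
  installment    = c(155.5, 274.2, 258.4, 643.5, 368.0, 327.4),
  Grade          = c("A", "C", "B", "B", "D", "B"),
  emp_length     = c(10, 2, 0, 5, 7, 3),
  home_ownership = c("RENT", "MORTGAGE", "OWN", "MORTGAGE", "RENT", "RENT"),
  annual_inc     = c(55000, 72000, 40000, 98000, 61000, 50000),
  verification_status = c("Verified", "Not Verified", "Source Verified",
                          "Verified", "Not Verified", "Verified"),
  loan_status    = c("Fully Paid", "Charged Off", "Fully Paid",
                     "Current", "Current", "Fully Paid")
)
csv_path <- file.path(tempdir(), "loan.csv")  # assumed location for this demo
write.csv(toy, csv_path, row.names = FALSE)

# Read the file. stringsAsFactors = TRUE turns text columns into factors
# (since R 4.0 the default is FALSE, which would leave them as character).
loan <- read.csv(csv_path, stringsAsFactors = TRUE)

# Inspect the column types, then list numeric and factor columns by name.
str(loan)
numeric_cols <- names(loan)[sapply(loan, is.numeric)]
factor_cols  <- names(loan)[sapply(loan, is.factor)]
print(numeric_cols)
print(factor_cols)

# id is stored as a number but is only an identifier, and term takes just two
# values (36 or 60), so term can reasonably be treated as categorical.
loan$term <- factor(loan$term)

# Summarize every variable with a single command.
summary(loan)
```

For numeric columns `summary()` reports the minimum, quartiles, median, mean, and maximum; for factor columns it reports counts per level.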
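The descriptive statistics for loan_amnt and int_rate, and the correlation between int_rate and installment, could be computed as sketched below. The small data frame uses made-up values so the code runs on its own, and `describe()` is a hypothetical helper name introduced here, not part of base R.

```r
# Hypothetical toy values; replace with the loan data frame read from loan.csv.
loan <- data.frame(
  loan_amnt   = c(5000, 12000, 8000, 20000, 15000, 10000),
  int_rate    = c(7.5, 13.2, 10.1, 9.8, 16.4, 11.0),
  installment = c(155.5, 274.2, 258.4, 643.5, 368.0, 327.4)
)

# Helper (hypothetical name): min, max, mean, median, sd, and the three quartiles.
# na.rm = TRUE skips missing values, which real loan data may contain.
describe <- function(x) {
  c(min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    quantile(x, probs = c(0.25, 0.50, 0.75), na.rm = TRUE))
}

amnt_stats <- describe(loan$loan_amnt)
rate_stats <- describe(loan$int_rate)
print(amnt_stats)
print(rate_stats)

# Pearson correlation between interest rate and installment,
# using only rows where both values are present.
r <- cor(loan$int_rate, loan$installment, use = "complete.obs")
print(r)
```

Whether the relationship counts as strong depends on the value obtained from the real data. A common rule of thumb treats an absolute correlation above roughly 0.7 as strong and below roughly 0.3 as weak, but these cutoffs are conventions rather than fixed rules.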
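The frequency table, proportion table, modes, and cross table could be produced as in the following sketch. The values are made up so the example is runnable in isolation. Base R has no built-in function for the statistical mode (`mode()` returns the storage type), so the mode is read off the table as the most frequent level.

```r
# Hypothetical toy values; replace with the loan data frame read from loan.csv.
loan <- data.frame(
  term        = c(36, 60, 36, 36, 60, 36),
  loan_status = c("Fully Paid", "Charged Off", "Fully Paid",
                  "Current", "Current", "Fully Paid")
)

# Frequency table of term; the mode is the level with the highest count.
# If several levels tie, all of them are returned.
term_freq <- table(loan$term)
term_mode <- names(term_freq)[term_freq == max(term_freq)]
print(term_freq)
print(term_mode)

# Proportion table of loan_status and its mode.
status_prop <- prop.table(table(loan$loan_status))
status_mode <- names(status_prop)[status_prop == max(status_prop)]
print(status_prop)
print(status_mode)

# Cross table of term (rows) by loan_status (columns).
cross <- table(loan$term, loan$loan_status, dnn = c("term", "loan_status"))
print(cross)

# margin = 1 gives proportions within each row (each row sums to 1);
# margin = 2 gives proportions within each column (each column sums to 1).
row_prop <- prop.table(cross, margin = 1)
col_prop <- prop.table(cross, margin = 2)
print(row_prop)
print(col_prop)
```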
