3. (16 points) In this question we will explore the correlation between the features in the dataset credit_risk_dataset.csv. Load this dataset from shared/data/credit_risk_dataset.csv. More information about the data can be found here: https://www.kaggle.com/datasets/laotse/credit-risk-dataset/data

(a) (2 points) Check whether there are any missing values, i.e. NAs, in the data. For this, explore the dataframe.isna() function.
    i. Report the names of the columns that have NAs.
    ii. Drop all rows that have NAs.

(b) (2 points) Now we will analyze only a subset of the dataframe. Create a subset of the dataframe containing only the columns person_age, person_income, loan_amnt, loan_percent_income, cb_person_cred_hist_length.

(c) (4 points) Find the correlation between the columns in the data using dataframe.corr(). Pick a pair of covariates and interpret their correlation. Which two predictors are the most highly correlated? The least? Do these correlations make sense in context?

(d) (1 point) Using matplotlib.pyplot, draw a scatter plot with person_income on the X-axis and loan_amnt on the Y-axis.

(e) (3 points) Study the plot from Q.3(d).
    i. Do you identify any outliers?
    ii. If yes, then suggest a transformation of the data that would reduce the influence of those outliers
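The steps in parts (a) through (e) can be sketched roughly as follows. This is a minimal illustration, not the expert solution the page refers to. It assumes the column names are written with underscores, as on the Kaggle page. The helper names (columns_with_na, analyze, extreme_pairs, plot_income_vs_loan) are hypothetical, and the log transformation in the plotting helper is only one common choice for part (e)(ii).

```python
import numpy as np
import pandas as pd

# Columns requested in part (b); names assumed to match the Kaggle CSV header.
SUBSET_COLS = [
    "person_age",
    "person_income",
    "loan_amnt",
    "loan_percent_income",
    "cb_person_cred_hist_length",
]


def columns_with_na(df):
    """(a)(i) Names of the columns that contain at least one NA."""
    has_na = df.isna().any()  # boolean Series indexed by column name
    return has_na[has_na].index.tolist()


def analyze(df):
    """(a)-(c) Report NA columns, drop NA rows, subset, then correlate."""
    na_cols = columns_with_na(df)
    sub = df.dropna()[SUBSET_COLS]  # (a)(ii) followed by (b)
    corr = sub.corr()               # Pearson correlation by default
    return na_cols, sub, corr


def extreme_pairs(corr):
    """(c) Column pairs with the largest and smallest absolute correlation."""
    # Keep the strict upper triangle so each pair appears once and the
    # diagonal of 1.0 values is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna().abs()
    return pairs.idxmax(), pairs.idxmin()


def plot_income_vs_loan(df, log_scale=False):
    """(d)/(e) Scatter plot of loan_amnt against person_income."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    x, y = df["person_income"], df["loan_amnt"]
    if log_scale:
        # Assumed transformation for (e)(ii): log1p compresses the long
        # right tail, so a few very large incomes dominate the plot less.
        x, y = np.log1p(x), np.log1p(y)
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=5, alpha=0.4)
    ax.set_xlabel("log(1 + person_income)" if log_scale else "person_income")
    ax.set_ylabel("log(1 + loan_amnt)" if log_scale else "loan_amnt")
    return ax
```

To use it, one could load the file with pd.read_csv("shared/data/credit_risk_dataset.csv"), pass the result to analyze, and then call plot_income_vs_loan on the returned subset with and without log_scale=True to compare how much the extreme incomes stretch the axes. Interpreting the correlations and judging which points are outliers still has to be done by reading the actual output.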
