PYTHON QUESTION:
#since ~86% of values are No, we can assume missing values are also No.
#fill the missing values of Self_Employed with 'No'
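One way this imputation step might look is sketched below. The toy dataframe is made up purely for illustration and stands in for the lab's `df`; the column name `Self_Employed` is taken from the prompt, and the real file may spell it differently.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the lab's loan dataframe (values are invented for illustration).
df = pd.DataFrame({"Self_Employed": ["No", "Yes", np.nan, "No", np.nan, "No"]})

# Inspect the share of each category among the non-missing values before imputing.
print(df["Self_Employed"].value_counts(normalize=True))

# Replace missing entries with the majority class 'No'.
df["Self_Employed"] = df["Self_Employed"].fillna("No")
```

Assigning the result back to the column avoids relying on `inplace=True`, which can behave unexpectedly on column selections.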
#include your plots for above here
#Below define a new column in your dataframe that uses numpy to calculate the log of every value in Loan_Amount
#now plot this new column of your dataframe in a histogram. (20 bins)
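A minimal sketch of the log transform and histogram, assuming the column is named `Loan_Amount` as in the prompt. The new column name `Loan_Amount_log` is a hypothetical choice, and the toy values are invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the lab's dataframe (values are invented for illustration).
df = pd.DataFrame(
    {"Loan_Amount": [100.0, 120.0, 66.0, np.nan, 250.0, 141.0, 95.0, 700.0]}
)

# np.log is applied element-wise; missing values stay NaN.
df["Loan_Amount_log"] = np.log(df["Loan_Amount"])  # hypothetical column name

# Histogram with 20 bins; pandas drops NaN values before plotting.
fig, ax = plt.subplots()
df["Loan_Amount_log"].hist(bins=20, ax=ax)
ax.set_xlabel("log(Loan_Amount)")
ax.set_ylabel("Count")
```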
#define a dataframe column called Total_Income that is the sum of Applicant_Income and CoapplicantIncome
#take the log transform of your new column
#plot the histogram of the data (20 bins)
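The three steps above could be sketched as follows. The column names `Applicant_Income` and `CoapplicantIncome` are copied from the prompt as written, so they may need adjusting to match the actual file. `Total_Income_log` is a hypothetical name, and the toy values are invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the lab's dataframe (values are invented for illustration).
df = pd.DataFrame(
    {
        "Applicant_Income": [5849, 4583, 3000, 2583, 6000],
        "CoapplicantIncome": [0.0, 1508.0, 0.0, 2358.0, 0.0],
    }
)

# Row-wise sum of the two income columns.
df["Total_Income"] = df["Applicant_Income"] + df["CoapplicantIncome"]

# Log transform of the combined income (hypothetical column name).
df["Total_Income_log"] = np.log(df["Total_Income"])

# Histogram of the transformed column with 20 bins.
fig, ax = plt.subplots()
df["Total_Income_log"].hist(bins=20, ax=ax)
ax.set_xlabel("log(Total_Income)")
ax.set_ylabel("Count")
```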
#As an example I have saved the selection we made in Lab 3
df2 = df.loc[(df["Gender"]=="Female") & (df["Education"]=="Not Graduate") & (df["Loan_Status"]=="Y"), ["Gender", "Education", "Loan_Status"]]
df2
#notice that the dataframe only contains the columns Gender, Education, and Loan_Status
#that is because when we indexed we included the list after the comma (", ["Gender", "Education", "Loan_Status"]")
#create a new selection of all Males who are Graduates, saved as a variable df3
#for this selection retain all data columns
#use locIndexer 4 more times below
#Create 4 subsets of the main datasheet
#2 selections must be based off 3 criteria (like we did in Lab 3)
#2 selections must be based off 5 criteria
#In 2 of your selections save only the criteria columns (like we did with Gender, Education, and Loan_Status above)
#the other 2 selections should save all data columns
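A sketch of the `df3` selection, assuming the category labels are spelled `"Male"` and `"Graduate"` in the data. Leaving out the column list after the comma in `.loc` keeps every column. The toy dataframe is invented for illustration.

```python
import pandas as pd

# Toy stand-in for the lab's dataframe (values are invented for illustration).
df = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Male", "Male", "Female"],
        "Education": ["Graduate", "Not Graduate", "Not Graduate", "Graduate", "Graduate"],
        "Loan_Status": ["Y", "Y", "N", "N", "Y"],
        "Loan_Amount": [128.0, 66.0, 120.0, 141.0, 95.0],
    }
)

# Boolean mask with two criteria; no column list, so all columns are retained.
df3 = df.loc[(df["Gender"] == "Male") & (df["Education"] == "Graduate")]
print(df3)
```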
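The pattern for the remaining selections might look like the sketch below, which shows one 3-criteria selection that keeps only the criteria columns and one 5-criteria selection that keeps all columns. The specific criteria, the threshold of 100, and the variable names `sel3` and `sel5` are hypothetical choices, and the toy dataframe is invented for illustration.

```python
import pandas as pd

# Toy stand-in for the lab's dataframe (values are invented for illustration).
df = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
        "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate", "Graduate", "Graduate"],
        "Loan_Status": ["Y", "Y", "N", "N", "Y", "Y"],
        "Self_Employed": ["No", "No", "Yes", "No", "No", "Yes"],
        "Loan_Amount": [128.0, 66.0, 120.0, 141.0, 95.0, 250.0],
    }
)

# 3 criteria, keeping only the criteria columns (column list after the comma).
cols3 = ["Gender", "Education", "Loan_Status"]
sel3 = df.loc[
    (df["Gender"] == "Female")
    & (df["Education"] == "Graduate")
    & (df["Loan_Status"] == "Y"),
    cols3,
]

# 5 criteria, keeping all columns (no column list).
sel5 = df.loc[
    (df["Gender"] == "Male")
    & (df["Education"] == "Graduate")
    & (df["Loan_Status"] == "Y")
    & (df["Self_Employed"] == "No")
    & (df["Loan_Amount"] > 100)
]
```

Each condition needs its own parentheses because `&` binds more tightly than `==` and `>` in Python.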
