Exploratory Data Analysis: This exercise relates to the household income and expense dataset available on Blackboard as Inc Exp Data.csv. The data was taken from Kaggle and has 7 variables describing the income and expenses of households. The following table defines the variables in the data:
| Variable Name | Description |
| --- | --- |
| Mthly HH Income | Monthly household income |
| Mthly HH Expense | Monthly household expenses |
| No of Fly Members | Number of family members |
| Emi or Rent Amt | Rent or mortgage installment amount |
| Annual HH Income | Annual household income |
| Highest Qualified Member | Academic qualification of highest qualified family member |
| No of Earning Members | Number of earning family members |
Load the dataset into R and answer the following questions:
How many rows and columns are in the dataset?
Convert the variable Highest Qualified Member to a factor variable. Print the summary of the dataset and explain the key points of the summary for Mthly HH Income and Highest Qualified Member.
Calculate the mean and standard deviation of all numeric columns.
Hint: Use the dplyr package to select only the numeric columns with is.numeric, and then generate the summary statistics.
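The hint above can be sketched as follows. This is a minimal illustration, not the expected solution: the data frame `hh` is made-up toy data standing in for the real CSV, and the underscore column names are an assumption about how the file's headers are spelled. It assumes dplyr 1.0.0 or later, which provides `where()` and `across()`.

```r
library(dplyr)

# Toy stand-in for the real dataset (values invented for illustration only).
# Column names with underscores are an assumption about the CSV headers.
hh <- data.frame(
  Mthly_HH_Income = c(5000, 12000, 20000, 35000),
  Mthly_HH_Expense = c(4000, 9000, 15000, 20000),
  Highest_Qualified_Member = c("Illiterate", "Graduate", "Graduate", "Professional")
)

# Keep only numeric columns, then compute mean and sd for each of them.
# The result has one row, with columns named <column>_mean and <column>_sd.
numeric_stats <- hh %>%
  select(where(is.numeric)) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

print(numeric_stats)
```

The non-numeric column is dropped by `select(where(is.numeric))`, so the summary functions never see character data.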
Calculate disposable income of households as the difference between monthly income and expenses.
Plot a histogram of disposable income with 10 breaks.
Hint: Use the hist function and look at the help file for the breaks argument.
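A minimal sketch of the two steps above, using base R only. The `income` and `expense` vectors are invented toy values, not taken from the dataset. Note that when `breaks` is a single number, `hist` treats it as a suggestion and picks "pretty" cut points, so the plot may not show exactly 10 bins.

```r
# Toy vectors standing in for the monthly income and expense columns
# (values invented for illustration only).
income  <- c(5000, 12000, 20000, 35000, 50000, 8000)
expense <- c(4000,  9000, 15000, 20000, 30000, 8500)

# Disposable income is monthly income minus monthly expenses.
disposable <- income - expense

# breaks = 10 is a suggested number of cells, not a guarantee.
h <- hist(disposable, breaks = 10,
          main = "Disposable income",
          xlab = "Monthly income minus monthly expense")
```

The returned object `h` holds the actual break points in `h$breaks`, which is a quick way to see how many bins R really used.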
Construct a boxplot for monthly household income against the highest qualified member in a household. Your boxplots should be in the sequence illiterate, undergraduate, professional, graduate, post-graduate.
Hint: You may need to redefine the levels of the factor variable Highest Qualified Member. Use the levels argument in the factor command. Use the boxplot function. You should get 5 box plots in the same chart.
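The releveling described in the hint might look like the sketch below. The data frame is toy data, and the label spellings ("Under-Graduate", "Post-Graduate", and so on) are assumptions; check `unique()` on the real column first, because any value that does not match an entry in `levels` is silently turned into `NA`.

```r
# Toy stand-in for the real dataset (values invented for illustration only).
# The qualification labels are assumed spellings, not confirmed by the source.
hh <- data.frame(
  Mthly_HH_Income = c(5000, 7000, 15000, 18000, 40000,
                      45000, 25000, 30000, 35000, 50000),
  Highest_Qualified_Member = c("Illiterate", "Illiterate",
                               "Under-Graduate", "Under-Graduate",
                               "Professional", "Professional",
                               "Graduate", "Graduate",
                               "Post-Graduate", "Post-Graduate")
)

# The order required by the question.
edu_order <- c("Illiterate", "Under-Graduate", "Professional",
               "Graduate", "Post-Graduate")

# Redefine the factor so its levels follow that order.
hh$Highest_Qualified_Member <- factor(hh$Highest_Qualified_Member,
                                      levels = edu_order)

# The formula interface draws one box per factor level, in level order.
bp <- boxplot(Mthly_HH_Income ~ Highest_Qualified_Member, data = hh,
              xlab = "Highest qualified member",
              ylab = "Monthly household income")
```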
For families with no more than 4 family members, calculate average monthly household income by highest qualified member using dplyr. Then, create a bar chart using ggplot2 demonstrating the same information.
Hint: Use chaining for dplyr's filter, group_by, and summarize, and pass the result to the ggplot function.
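One way the chain in the hint could be written is sketched below, again on invented toy data with assumed underscore column names. The summary is stored in `avg_income` first so it can be inspected, and then piped into `ggplot`; "no more than 4" is read as `<= 4`.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the real dataset (values invented for illustration only).
hh <- data.frame(
  Mthly_HH_Income = c(10000, 20000, 30000, 50000, 90000),
  No_of_Fly_Members = c(3, 4, 2, 4, 6),
  Highest_Qualified_Member = c("Graduate", "Graduate", "Professional",
                               "Professional", "Professional")
)

# Keep households with at most 4 members, then average income per group.
avg_income <- hh %>%
  filter(No_of_Fly_Members <= 4) %>%
  group_by(Highest_Qualified_Member) %>%
  summarise(avg_income = mean(Mthly_HH_Income), .groups = "drop")

# Pipe the summary into ggplot; geom_col draws bars at the given heights.
p <- avg_income %>%
  ggplot(aes(x = Highest_Qualified_Member, y = avg_income)) +
  geom_col() +
  labs(x = "Highest qualified member",
       y = "Average monthly household income")

print(p)
```

`geom_col` is used rather than `geom_bar` because the bar heights are already computed; `geom_bar` would count rows by default.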
