Question: Exploratory Data Analysis

Exploratory Data Analysis: This exercise relates to the household income and expense dataset available on Blackboard as Inc Exp Data.csv. The data was taken from Kaggle and has 7 variables related to the income and expense details of households. The following table defines the variables in the data:

Variable Name            | Description
Mthly HH Income          | Monthly household income
Mthly HH Expense         | Monthly household expenses
No of Fly Members        | Number of family members
Emi or Rent Amt          | Rent or mortgage installment amount
Annual HH Income         | Annual household income
Highest Qualified Member | Academic qualification of highest qualified family member
No of Earning Members    | Number of earning family members

Load the dataset into R and answer the following questions:

How many rows and columns are in the dataset?
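
A minimal sketch of how the row and column counts could be obtained. The column names with underscores and the three data rows are assumptions made up for illustration; the rows are written to a temporary file so the snippet runs without the Blackboard file. With the real data, point read.csv at the downloaded CSV instead.

```r
# Made-up stand-in for the real CSV (assumed column names, invented values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  paste("Mthly_HH_Income", "Mthly_HH_Expense", "No_of_Fly_Members",
        "Emi_or_Rent_Amt", "Annual_HH_Income", "Highest_Qualified_Member",
        "No_of_Earning_Members", sep = ","),
  "5000,8000,3,2000,64200,Under-Graduate,1",
  "20000,12000,4,3000,250000,Graduate,1",
  "45000,25000,6,0,590000,Professional,2"
), csv_path)

# Load the data; keep text columns as character for now.
hh <- read.csv(csv_path, stringsAsFactors = FALSE)

# dim() returns c(number of rows, number of columns).
dims <- dim(hh)
print(dims)

# The same information, one value at a time.
n_rows <- nrow(hh)
n_cols <- ncol(hh)
```

str(hh) is another quick way to see the dimensions together with each column's type.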

Convert the variable Highest Qualified Member to a factor variable. Print the summary of the dataset and explain the key points of the summary for Mthly HH Income and Highest Qualified Member.
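
One way to approach this, sketched on a small made-up data frame (the underscored column names and the qualification labels are assumptions, not taken from the real file). For a numeric column, summary() reports the minimum, quartiles, median, mean and maximum; for a factor, it reports the count per level.

```r
# Hypothetical stand-in data; replace with the data frame read from the CSV.
hh <- data.frame(
  Mthly_HH_Income = c(5000, 20000, 45000, 30000, 100000),
  Highest_Qualified_Member = c("Illiterate", "Under-Graduate", "Graduate",
                               "Graduate", "Professional"),
  stringsAsFactors = FALSE
)

# Convert the qualification column from character to factor.
hh$Highest_Qualified_Member <- factor(hh$Highest_Qualified_Member)

# Summary of the whole dataset.
print(summary(hh))

# Per-column summaries, kept in variables for closer inspection.
income_summary <- summary(hh$Mthly_HH_Income)          # five-number summary plus mean
qual_counts    <- summary(hh$Highest_Qualified_Member) # count of households per level
```

When interpreting the income summary, comparing the mean with the median is a useful first check: a mean well above the median suggests a right-skewed distribution with a few high-income households.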

Calculate the mean and standard deviation of all numeric columns.

Hint: Use the dplyr package to keep only the numeric columns using the is.numeric filter and then generate summary statistics.
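
A possible sketch of the hint, assuming dplyr 1.0 or later (for where() and across()) and using made-up data with assumed column names. The older select_if(is.numeric) form should give the same column selection.

```r
library(dplyr)

# Hypothetical stand-in data; replace with the data frame read from the CSV.
hh <- data.frame(
  Mthly_HH_Income = c(10000, 20000, 30000),
  No_of_Fly_Members = c(2, 4, 6),
  Highest_Qualified_Member = c("Graduate", "Illiterate", "Professional"),
  stringsAsFactors = FALSE
)

# Keep only numeric columns, then compute mean and sd for each of them.
# Result columns are named <column>_mean and <column>_sd.
numeric_stats <- hh %>%
  select(where(is.numeric)) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

print(numeric_stats)
```

If the real data contains missing values, the functions would need na.rm = TRUE, for example list(mean = ~ mean(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE)).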

Calculate disposable income of households as the difference between monthly income and expenses.

Plot a histogram of disposable income with 10 breaks.

Hint: Use the hist function and look at the help file for the breaks argument.
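
The two steps above could be sketched as follows, on made-up data with assumed column names; Disposable_Income is a new column name chosen here for illustration. Note that when breaks is a single number, hist treats it as a suggestion and picks "pretty" cut points, so the plot may not show exactly 10 bins.

```r
# Hypothetical stand-in data; replace with the data frame read from the CSV.
hh <- data.frame(
  Mthly_HH_Income  = c(5000, 20000, 45000, 30000),
  Mthly_HH_Expense = c(8000, 12000, 25000, 30000)
)

# Disposable income = monthly income minus monthly expenses (can be negative).
hh$Disposable_Income <- hh$Mthly_HH_Income - hh$Mthly_HH_Expense

# Histogram with roughly 10 breaks; the return value holds the bins and counts.
h <- hist(hh$Disposable_Income,
          breaks = 10,
          main = "Disposable income of households",
          xlab = "Disposable income (monthly)")
```

To force exactly 10 equal-width bins, one option is to pass explicit cut points, for example breaks = seq(min(x), max(x), length.out = 11) where x is the disposable income vector.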

Construct a boxplot for monthly household income against the highest qualified member in a household. Your boxplots should be in the sequence illiterate, undergraduate, professional, graduate, post-graduate.

Hint: You may need to redefine the levels of the factor variable Highest Qualified Member. Use the levels argument in the factor command. Use the boxplot function. You should get 5 box plots in the same chart.
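
A sketch of the reordering described in the hint. The exact level labels in qual_order are an assumption; they must match the spellings in the real column (check with unique() first), because any value not listed in levels becomes NA.

```r
# Hypothetical stand-in data; replace with the data frame read from the CSV.
hh <- data.frame(
  Mthly_HH_Income = c(5000, 9000, 15000, 22000, 80000,
                      95000, 30000, 42000, 50000, 65000),
  Highest_Qualified_Member = c("Illiterate", "Illiterate",
                               "Under-Graduate", "Under-Graduate",
                               "Professional", "Professional",
                               "Graduate", "Graduate",
                               "Post-Graduate", "Post-Graduate"),
  stringsAsFactors = FALSE
)

# Assumed labels, in the order requested by the question.
qual_order <- c("Illiterate", "Under-Graduate", "Professional",
                "Graduate", "Post-Graduate")

# Redefine the factor so its levels follow the requested order.
hh$Highest_Qualified_Member <- factor(hh$Highest_Qualified_Member,
                                      levels = qual_order)

# One box per level, drawn left to right in level order.
bp <- boxplot(Mthly_HH_Income ~ Highest_Qualified_Member,
              data = hh,
              xlab = "Highest qualified member",
              ylab = "Monthly household income",
              main = "Monthly income by highest qualification")
```

The order of the boxes follows levels(hh$Highest_Qualified_Member), which is why redefining the levels is enough to control the sequence.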

For families with no more than 4 family members, calculate average monthly household income by highest qualified member using dplyr. Then, create a bar chart using ggplot2 demonstrating the same information.

Hint: Use chaining for dplyr filter, group_by and summarize and pass it to the ggplot function.
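
A minimal sketch of the chained approach, assuming dplyr 1.0 or later and ggplot2, with made-up data and assumed column names. Avg_Mthly_HH_Income is a name introduced here for the summary column. The summary is stored first so it can be inspected, then piped into ggplot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data; replace with the data frame read from the CSV.
hh <- data.frame(
  Mthly_HH_Income = c(20000, 40000, 90000, 10000, 80000),
  No_of_Fly_Members = c(3, 4, 6, 2, 5),
  Highest_Qualified_Member = c("Graduate", "Graduate", "Graduate",
                               "Illiterate", "Professional"),
  stringsAsFactors = FALSE
)

# Keep families with at most 4 members, then average income per qualification.
avg_income <- hh %>%
  filter(No_of_Fly_Members <= 4) %>%
  group_by(Highest_Qualified_Member) %>%
  summarise(Avg_Mthly_HH_Income = mean(Mthly_HH_Income), .groups = "drop")

# Pass the summarised data to ggplot; geom_col draws bar heights from the y values.
p <- avg_income %>%
  ggplot(aes(x = Highest_Qualified_Member, y = Avg_Mthly_HH_Income)) +
  geom_col() +
  labs(x = "Highest qualified member",
       y = "Average monthly household income",
       title = "Average income, families with at most 4 members")

print(p)
```

geom_col is used rather than geom_bar because the bar heights are already computed; geom_bar(stat = "identity") would be an equivalent choice.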
