Homework outline

This homework is designed to give you practice with calculating error bars (confidence intervals) with ddply and using ggplot2 graphics to produce insightful plots of the results.

library(plyr)
library(dplyr)
library(ggplot2)

You will continue using the adult data set that you first encountered on Homework 3. This data set is loaded below.

adult.data <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
                       header=FALSE, fill=FALSE, strip.white=TRUE,
                       col.names=c("age", "type_employer", "fnlwgt", "education",
                                   "education_num", "marital", "occupation", "relationship",
                                   "race", "sex", "capital_gain", "capital_loss",
                                   "hr_per_week", "country", "income"))
adult.data <- mutate(adult.data, high.income = as.numeric(income == ">50K"))

Problem 3: Two-sample t-test error bars.

(a) [3 points] Using ddply and 2-sample t-testing, construct a table that shows the difference in the proportion of men and women earning above 50K across different employer types. E.g., if 20% of men and 15% of women in a group earn above 50K, the difference in proportion is 0.2 - 0.15 = 0.05. Your table should also use the 2-sample t-test to calculate the lower and upper endpoints of a 95% confidence interval. (While a t-test isn't appropriate for binary data when the number of observations is small, we'll ignore this issue for now.) Your table should look something like:

  type_employer  prop.diff     lower     upper
1             ? 0.07743971 0.0504165 0.1044629
2   Federal-gov 0.31059432 0.2532462 0.3679424
3     Local-gov 0.18361338 0.1461258 0.2211009
...

# Edit me
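One possible shape for part (a) is sketched below on a small made-up data frame rather than on adult.data, so it runs without a download. The data frame `toy`, its values, and the helper `gap.ci` are hypothetical names introduced only for illustration; the column names and the "Male"/"Female" labels are assumed to match adult.data. The helper is applied once per employer type by ddply and returns the men-minus-women difference together with the Welch t-test interval.

```r
library(plyr)

# Hypothetical toy data standing in for adult.data (values are made up).
toy <- data.frame(
  type_employer = rep(c("Private", "Local-gov"), times = c(40, 20)),
  sex = c(rep(c("Male", "Female"), each = 20),
          rep(c("Male", "Female"), each = 10)),
  high.income = c(rep(c(1, 0), c(8, 12)),   # Private men: 8 of 20
                  rep(c(1, 0), c(4, 16)),   # Private women: 4 of 20
                  rep(c(1, 0), c(5, 5)),    # Local-gov men: 5 of 10
                  rep(c(1, 0), c(2, 8)))    # Local-gov women: 2 of 10
)

# Hypothetical helper: one row of output per employer type.
gap.ci <- function(piece) {
  men   <- piece$high.income[piece$sex == "Male"]
  women <- piece$high.income[piece$sex == "Female"]
  tt <- t.test(men, women)  # Welch 2-sample t-test, 95% level by default
  data.frame(prop.diff = mean(men) - mean(women),
             lower = tt$conf.int[1],
             upper = tt$conf.int[2])
}

gap.summary <- ddply(toy, ~ type_employer, gap.ci)
print(gap.summary)
```

Passing the men's values first means the interval is for (men - women), which matches the sign convention of prop.diff in the example table. The same helper should apply to adult.data unchanged if the column names and sex labels are as assumed.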

(b) Your table will have some fields that have the value NaN for the error bar limits. Explain why this is happening.

Your answer goes here!
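A small experiment may help when reasoning about part (b). The sketch below uses made-up vectors (`men`, `women`) for a hypothetical employer type in which nobody earns above 50K; whether any group in adult.data actually looks like this is an assumption worth checking, for example with a count table of high.income by type_employer and sex. With both samples constant at zero, the standard error is 0, so the t statistic and degrees of freedom become 0/0.

```r
# Hypothetical group where no one earns above 50K (values are made up).
men   <- c(0, 0, 0, 0)
women <- c(0, 0, 0)

# The difference in proportions is still well defined...
prop.diff <- mean(men) - mean(women)

# ...but the interval endpoints depend on 0/0 terms inside t.test.
ci <- t.test(men, women)$conf.int

print(prop.diff)
print(is.nan(ci))
```

If the endpoints come back as NaN here while prop.diff does not, that points to zero within-group variance as one candidate explanation for the NaN rows.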

(c) Subset your summary table to include just those rows for which you have valid calculated values of the difference in high earning proportion and of the lower and upper endpoints of the confidence interval. You will find the is.nan function useful here.

# Edit me 
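For part (c), a minimal sketch of the filtering step is shown below. The table `summary.tbl` and its numbers are hypothetical stand-ins for the output of part (a); only the column names are taken from the example table. The idea is to keep a row only when none of the three computed columns is NaN.

```r
# Hypothetical summary table (illustrative values, not real results).
summary.tbl <- data.frame(
  type_employer = c("Federal-gov", "Never-worked", "Private"),
  prop.diff = c(0.31, 0, 0.20),
  lower = c(0.25, NaN, 0.10),
  upper = c(0.37, NaN, 0.30)
)

# Keep rows where the difference and both interval endpoints are valid.
valid.tbl <- subset(summary.tbl,
                    !is.nan(prop.diff) & !is.nan(lower) & !is.nan(upper))
print(valid.tbl)
```

Note that is.nan differs from is.na: is.na is also TRUE for NA values, so it is the broader filter if missing values of either kind should be dropped.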
