Question: Spam Email: Using the email dataset from the openintro package, you are going to write a function that calculates the sampling distribution for the mean

Spam Email: Using the email dataset from the openintro package, you are going to write a function that calculates the sampling distribution for the mean of the number of line breaks in the email using the variable line_breaks. We will treat the email data as a complete census, so you will be subsampling from the population using the subsample.R function. On the Assignments page of the website, download the file subsample.R and move the file into a folder in your project named R. The function will take the values of the data.frame, the number of samples n, and the number of replicates of the experiment B. The function will return the sample mean for each of the B samples. For example, a single sample from the email data.frame is

    library(openintro)
    data("email")
    source(here("R", "subsample.R"))
    ## sample size n 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ to_multiple  1, 0, 0, 0, 0, 0, 1, 0, 1, 0
    ## $ from         1, 1, 1, 1, 1, 1, 1, 1, 1, 1
    ## $ cc           0, 1, 0, 4, 0, 0, 0, 2, 0, 1
    ## $ sent_email   1, 0, 0, 0, 0, 0, 0, 1, 0, 1
    ## $ time         2012-01-22 14:04:42, 2012-03-23 14:23:55, 2012-03-20 ...
    ## $ image        0, 0, 0, 0, 0, 0, 0, 0, 1, 0
    ## $ attach       0, 0, 0, 2, 0, 0, 0, 0, 2, 0
    ## $ dollar       0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ winner       no, no, no, no, no, no, no, no, no, no
    ## $ inherit      0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ viagra       0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ password     0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ num_char     5.106, 5.355, 0.765, 0.341, 24.317, 38.071, 0.559, 4.9...
    ## $ line_breaks  198, 141, 16, 18, 620, 727, 15, 108, 141, 46
    ## $ format       1, 1, 0, 0, 1, 1, 0, 1, 1, 0
    ## $ re_subj      0, 1, 0, 0, 0, 0, 0, 1, 0, 1
    ## $ exclaim_subj 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ urgent_subj  0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    ## $ exclaim_mess 1, 4, 0, 0, 1, 29, 0, 0, 1, 1
    ## $ number       big, big, small, none, small, small, small, big, small...

a) Write a function that returns the sample mean for B samples of size n. Hint: write a for loop first, then put the loop in a function. The function inputs should be a data.frame (e.g. email), the number of replicates B, and the sample size n.

b) Using your function, create three datasets with B = 10000 replicates of size n = 10, n = 50, and n = 200. For each of the three sample sizes, create a histogram of the sample means.

c) Describe what you see. What are the shapes of the histograms? Are there any trends in the shape as n increases?
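One possible way to approach parts a) and b) is sketched below. It is a minimal illustration under stated assumptions, not the course's reference solution. The function name sampling_dist_mean is hypothetical. Base R's sample() stands in for the course-provided subsample(), whose signature is not shown in the question. A synthetic right-skewed data.frame called population stands in for email so that the block runs without the openintro package; with the package loaded, email could be passed in its place.

```r
# Hypothetical helper: sample means of `line_breaks` for B subsamples of size n.
# Base R's sample() is used here as a stand-in for the course's subsample().
sampling_dist_mean <- function(df, B, n) {
  means <- numeric(B)                    # preallocate one slot per replicate
  for (b in seq_len(B)) {
    rows <- sample(nrow(df), size = n)   # draw n row indices without replacement
    means[b] <- mean(df$line_breaks[rows])
  }
  means
}

# Synthetic stand-in population (an assumption, NOT the real email data):
# right-skewed counts with one column named line_breaks.
set.seed(1)
population <- data.frame(
  line_breaks = rpois(4000, lambda = rexp(4000, rate = 1 / 200))
)

# Part b): B = 10000 replicates at each of the three sample sizes.
B <- 10000
sizes <- c(10, 50, 200)
results <- lapply(sizes, function(n) sampling_dist_mean(population, B = B, n = n))
names(results) <- paste0("n = ", sizes)

# One histogram per sample size, side by side.
op <- par(mfrow = c(1, 3))
for (label in names(results)) {
  hist(results[[label]], main = label, xlab = "Sample mean of line_breaks")
}
par(op)
```

For part c), the things to compare across the three panels are the skewness and the spread. When the population is right-skewed, the histogram for n = 10 would be expected to keep some of that skew, while the histograms for n = 50 and n = 200 should look narrower and more symmetric around the population mean, consistent with the central limit theorem.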
