Open RStudio (or RStudio Cloud) to get started

We will be using the diamonds dataset, which ships with ggplot2 and is loaded as part of the tidyverse. Start by running library(tidyverse).

Open the diamonds data by running the code: View(diamonds). Each row represents one diamond from a collection of almost 54,000 (53,940 rows).

Take a look at the documentation for diamonds by running the code: ?diamonds
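The setup steps above can be sketched as follows. This is a minimal example that assumes ggplot2 is installed; it loads ggplot2 directly (library(tidyverse) attaches it as well) and uses dim() and head() as non-interactive stand-ins for View(diamonds), which needs an interactive RStudio session.

```r
library(ggplot2)  # provides diamonds; library(tidyverse) attaches it too

# Quick non-interactive look at the data
dim(diamonds)          # number of rows and columns
head(diamonds)         # first few diamonds
names(diamonds)        # variable names, including price

# In an interactive RStudio session you would also run:
# View(diamonds)
# ?diamonds
```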

Question 1: Create a histogram of the price variable (For all histograms in this assignment, use the base R function hist). Also calculate the mean and standard deviation of this variable.
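One way to approach Question 1 is sketched below. The names true_mean and true_sd are illustrative choices, not part of the assignment. Note that sd() divides by n - 1; with this many rows the difference from the population formula is negligible.

```r
library(ggplot2)  # provides diamonds; library(tidyverse) attaches it too

# Histogram of every diamond price, using base R hist()
hist(diamonds$price,
     main = "Diamond prices",
     xlab = "Price (US dollars)")

# Mean and standard deviation of the full price variable
true_mean <- mean(diamonds$price)
true_sd   <- sd(diamonds$price)

true_mean
true_sd
```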

Question 2: Take a random sample of 50 diamond prices from this dataset and name this vector fifty_diam (if saved properly, you will see this vector of length 50 in your global environment). Sample without replacement (this is the default option). Create a histogram of your sample, and then calculate the mean and standard deviation of this sample.

Include the image of your histogram in your report

Include the mean and standard deviation values

How much (absolute) error is there in your sample mean as an estimate of the true mean?

How much (absolute) error is there in your sample standard deviation (SD) as an estimate of the true SD?
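A possible sketch for Question 2 follows. The seed value and the names sample_mean, sample_sd, err_mean, and err_sd are assumptions added only so the example is reproducible and readable; your own sample, and therefore your errors, will differ.

```r
library(ggplot2)  # provides diamonds

set.seed(1)  # hypothetical seed, only to make this sketch reproducible

# sample() draws without replacement by default (replace = FALSE)
fifty_diam <- sample(diamonds$price, size = 50)

hist(fifty_diam,
     main = "Sample of 50 diamond prices",
     xlab = "Price (US dollars)")

sample_mean <- mean(fifty_diam)
sample_sd   <- sd(fifty_diam)

# Absolute error of each sample statistic relative to the full dataset
err_mean <- abs(sample_mean - mean(diamonds$price))
err_sd   <- abs(sample_sd - sd(diamonds$price))

c(mean = sample_mean, sd = sample_sd)
c(error_in_mean = err_mean, error_in_sd = err_sd)
```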

Question 3: Next, set up a for loop to simulate taking a sample of size 50 at least 10,000 times. Inside your loop, calculate the mean price and save it to a vector called means. Here are two tips:

Before the loop, define means = NULL so that your loop has a vector in which to save the means.

Inside the loop, index into your means vector (for example, means[i]) so that each iteration fills the next element.

Try running the loop 10 times to ensure it works. This should be instantaneous. Then try running it 10,000 times. The loop should only take a few seconds to complete at 10,000 simulations, so if you wait more than a minute, click the stop button and see if something is defined incorrectly.

After successfully running your simulation, create a histogram of your means vector. Just use the hist() function rather than ggplot.

Include the image of your histogram in your report

Include the R code you used to generate this loop

Briefly describe the shape of your histogram. Is this a symmetric distribution or would you say it’s skewed? How does this relate to the Central Limit Theorem we learned in class?
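The loop described in Question 3 might look like the sketch below. It follows the two tips (means starts as NULL and is filled by index); the seed and the name n_sims are assumptions introduced for illustration.

```r
library(ggplot2)  # provides diamonds

set.seed(1)       # hypothetical seed for reproducibility
n_sims <- 10000   # try 10 first to confirm the loop works

means <- NULL     # empty container that the loop fills by index

for (i in 1:n_sims) {
  samp <- sample(diamonds$price, size = 50)  # one sample of 50 prices
  means[i] <- mean(samp)                     # store this sample's mean
}

hist(means,
     main = "Sample means, n = 50",
     xlab = "Mean price (US dollars)")
```

Growing a vector from NULL is fine at this scale; preallocating with numeric(n_sims) is a common alternative for longer simulations.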

Question 4: As you should notice from your histogram, our sample means will vary with each sample we take. Calculate the standard deviation of the means vector.

Report the standard deviation of the simulated means

We completed a finite number of simulations (10,000), but what is this value approximating? Report the name of this measure and calculate the true value for this measure too.
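As a hedged sketch for Question 4: the standard deviation of the simulated means is usually read as an approximation of the standard error of the sample mean, whose textbook value is the population SD divided by the square root of n. The simulation is repeated here so the block runs on its own; se_sim and se_true are illustrative names.

```r
library(ggplot2)  # provides diamonds

set.seed(1)  # hypothetical seed for reproducibility
means <- NULL
for (i in 1:10000) {
  means[i] <- mean(sample(diamonds$price, size = 50))
}

# Spread of the simulated sample means
se_sim <- sd(means)

# Theoretical standard error of the mean: sigma / sqrt(n)
se_true <- sd(diamonds$price) / sqrt(50)

c(simulated = se_sim, theoretical = se_true)
```

Because the 50 prices are drawn without replacement from 53,940 rows, the finite population correction is very close to 1 and is ignored in this sketch.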

Question 5: Repeat question 3, but with a sample size of 10 instead of 50. Call your vector of sample means means_ten. After successfully running your simulation, create a histogram of your means_ten vector using the hist function again.

Include the image of your histogram in your report

Include the R code you used to generate this loop

Briefly describe the shape of your histogram. Is this a symmetric distribution or would you say it’s skewed? How does this relate to the Central Limit Theorem we learned in class?

Is the standard deviation of the simulated means higher or lower than it was for n = 50?
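Question 5 can reuse the same loop with a smaller sample size. In this sketch, the seed and the comparison against the theoretical n = 50 standard error are assumptions added for illustration; in your report you would compare against your own result from the n = 50 simulation.

```r
library(ggplot2)  # provides diamonds

set.seed(1)  # hypothetical seed for reproducibility
means_ten <- NULL

for (i in 1:10000) {
  means_ten[i] <- mean(sample(diamonds$price, size = 10))
}

hist(means_ten,
     main = "Sample means, n = 10",
     xlab = "Mean price (US dollars)")

# Spread of the means for n = 10, next to the theoretical value for n = 50
sd(means_ten)
sd(diamonds$price) / sqrt(50)
```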

Question 6: We spent some time exploring the behavior of the sample mean, but now let’s look at the sample median! Redo question 3 with a sample size of 50, but now calculate the sample median inside your loop. Call your vector of sample medians medians_fifty. After successfully running your simulation, create a histogram of your medians_fifty vector using the hist function again.

Include the image of your histogram in your report

Include the R code you used to generate this loop

Briefly describe the shape of your histogram. Is this a symmetric distribution or would you say it’s skewed? Do you have any predictions for what would happen if we repeated this simulation again, but with a much larger sample size?

Calculate and report the standard deviation of the medians_fifty vector. This is the expected error in a randomly generated sample median as an estimate of the true median.
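For Question 6, only the statistic inside the loop changes. A minimal sketch, with a seed assumed for reproducibility:

```r
library(ggplot2)  # provides diamonds

set.seed(1)  # hypothetical seed for reproducibility
medians_fifty <- NULL

for (i in 1:10000) {
  samp <- sample(diamonds$price, size = 50)
  medians_fifty[i] <- median(samp)  # median instead of mean
}

hist(medians_fifty,
     main = "Sample medians, n = 50",
     xlab = "Median price (US dollars)")

# Spread of the simulated sample medians
sd(medians_fifty)
```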

Question 7: Repeat question 6, but with a sample size of 500.

Include the image of your histogram in your report

Include the R code you used to generate this loop

Briefly describe the shape of your histogram. How has the shape changed in comparison to the distribution of sample medians when we took samples of size 50?

Calculate and report the standard deviation of your newest vector of medians. How does this expected error compare to when we had samples of size 50? Is this expected or surprising to you?
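Question 7 repeats the median simulation with samples of 500. The assignment does not name this vector, so medians_five_hundred below is a hypothetical name, and the seed is likewise an assumption.

```r
library(ggplot2)  # provides diamonds

set.seed(1)  # hypothetical seed for reproducibility
medians_five_hundred <- NULL  # hypothetical name; the assignment does not specify one

for (i in 1:10000) {
  samp <- sample(diamonds$price, size = 500)
  medians_five_hundred[i] <- median(samp)
}

hist(medians_five_hundred,
     main = "Sample medians, n = 500",
     xlab = "Median price (US dollars)")

# Spread of the simulated sample medians at the larger sample size
sd(medians_five_hundred)
```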
