For this question, we are going to create data, and then estimate models on this simulated data. This allows us to effectively know the population parameters that we are trying to estimate. Consequently, we can reason about how well our models are doing.

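library(dplyr)

create_homoskedastic_data <- function(n = 100) {
  # Simulate n observations from a known population model
  d <- data.frame(id = 1:n) %>%
    mutate(
      x1 = runif(n = n, min = 0, max = 10),
      x2 = rnorm(n = n, mean = 10, sd = 2),
      x3 = rnorm(n = n, mean = 0, sd = 2),
      # x3 enters squared; x2 has a true coefficient of zero
      y  = .5 + 1*x1 + 0*x2 + .25*x3^2 + rnorm(n = n, mean = 0, sd = 1)
    )
  return(d)
}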

d <- create_homoskedastic_data(n=100)
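For reference, the simulation code can be read as the following population model. This is a restatement of the data-generating function, assuming the `x3` term in the outcome is squared (which matches the squared term in models 3 and 4).

```latex
\begin{align*}
Y &= 0.5 + 1 \cdot x_1 + 0 \cdot x_2 + 0.25 \cdot x_3^2 + \epsilon, \\
x_1 &\sim \mathrm{Uniform}(0, 10), \quad
x_2 \sim \mathcal{N}(10, 2^2), \quad
x_3 \sim \mathcal{N}(0, 2^2), \quad
\epsilon \sim \mathcal{N}(0, 1).
\end{align*}
```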

Produce a plot of the distribution of the outcome data. This could be a histogram, a boxplot, a density plot, or whatever you think best communicates the distribution of the data. What do you note about this distribution?

library(ggplot2)

outcome_histogram <- d %>%
  ggplot()
  # fill in the rest of this chunk to plot
  # you will need aes layers (to map data into the plot)
  # and geom_* layers to draw the plot. You can delete these
  # comments if you like.
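One possible way to complete the chunk is sketched below. It is only an illustration: the seed, the bin count, and the axis labels are arbitrary choices that are not part of the assignment, and the data are rebuilt in base R (with the `x3` term assumed to be squared) so that the block runs on its own.

```r
library(ggplot2)

set.seed(203)  # arbitrary seed, assumed here only for reproducibility
n <- 100
d <- data.frame(
  x1 = runif(n, min = 0, max = 10),
  x2 = rnorm(n, mean = 10, sd = 2),
  x3 = rnorm(n, mean = 0, sd = 2)
)
# Outcome built from the population model (x3 term assumed squared)
d$y <- 0.5 + 1 * d$x1 + 0 * d$x2 + 0.25 * d$x3^2 + rnorm(n, mean = 0, sd = 1)

# Map y to the x-axis and draw a histogram of the outcome
outcome_histogram <- ggplot(d, aes(x = y)) +
  geom_histogram(bins = 20) +
  labs(x = "Outcome (y)", y = "Count", title = "Distribution of the outcome")

outcome_histogram
```

A density plot (`geom_density()`) or a boxplot (`geom_boxplot()`) could be swapped in for the histogram layer if that communicates the shape better.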

"Fill in here: What do you notice about this distribution?"

Are the assumptions of the large-sample model met so that you can use an OLS regression to produce consistent estimates?

"Fill in here: Are the large-sample assumptions satisfied?"

Estimate four models, called model_1, model_2, model_3 and model_4 that have the following form:

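\begin{align}
Y &= \beta_0 + \beta_1 x_1 + 0 \cdot x_2 + \beta_3 x_3 + \epsilon \tag{1} \\
Y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon \tag{2} \\
Y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^2 + \epsilon \tag{3} \\
Y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_3^2 + \epsilon \tag{4}
\end{align}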

# If you want to read about specifying statistical models, you can read
# here: https://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models
# note, using the I() function is preferred over using poly()
model_1 <- 'fill this in'
model_2 <- 'fill this in'
model_3 <- 'fill this in'
model_4 <- 'fill this in'
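As a hedged sketch, the four specifications could be written with `lm()` formulas along the following lines. The data are regenerated in base R with an arbitrary seed so the block is self-contained. Reading model 1 as omitting `x2` (a coefficient fixed at zero), and reading the squared term as `x3^2`, are assumptions about the garbled equations.

```r
set.seed(203)  # arbitrary seed, assumed here only for reproducibility
n <- 100
d <- data.frame(
  x1 = runif(n, min = 0, max = 10),
  x2 = rnorm(n, mean = 10, sd = 2),
  x3 = rnorm(n, mean = 0, sd = 2)
)
d$y <- 0.5 + 1 * d$x1 + 0 * d$x2 + 0.25 * d$x3^2 + rnorm(n, mean = 0, sd = 1)

# Model 1: x2 is left out, which fixes its coefficient at zero
model_1 <- lm(y ~ x1 + x3, data = d)
# Model 2: adds x2
model_2 <- lm(y ~ x1 + x2 + x3, data = d)
# Model 3: x3 enters only as a squared term; I() protects ^ inside a formula
model_3 <- lm(y ~ x1 + x2 + I(x3^2), data = d)
# Model 4: both the linear and the squared x3 terms
model_4 <- lm(y ~ x1 + x2 + x3 + I(x3^2), data = d)

coef(model_4)
```

Without `I()`, the `^` operator in a formula is interpreted as a crossing operator rather than arithmetic exponentiation, which is why the squared term is wrapped.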

calculate_msr <- function(model) {
  # This function takes a model, and uses the `resid` function
  # together with the definition of the msr to produce
  # the MEAN of the squared residuals
  msr <- mean(resid(model)^2)
  return(msr)
}

model_1_msr <- 'fill this in'
model_2_msr <- 'fill this in'
model_3_msr <- 'fill this in'
model_4_msr <- 'fill this in'
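The sketch below shows one way the mean squared residual could be computed for each model. It rebuilds the data and models in base R with an arbitrary seed, and the model formulas are the assumed readings of equations (1) through (4), so the printed values are illustrative only.

```r
set.seed(203)  # arbitrary seed, assumed here only for reproducibility
n <- 100
d <- data.frame(
  x1 = runif(n, min = 0, max = 10),
  x2 = rnorm(n, mean = 10, sd = 2),
  x3 = rnorm(n, mean = 0, sd = 2)
)
d$y <- 0.5 + 1 * d$x1 + 0 * d$x2 + 0.25 * d$x3^2 + rnorm(n, mean = 0, sd = 1)

models <- list(
  model_1 = lm(y ~ x1 + x3, data = d),
  model_2 = lm(y ~ x1 + x2 + x3, data = d),
  model_3 = lm(y ~ x1 + x2 + I(x3^2), data = d),
  model_4 = lm(y ~ x1 + x2 + x3 + I(x3^2), data = d)
)

# Mean of the squared residuals for a fitted model
calculate_msr <- function(model) {
  mean(resid(model)^2)
}

# Apply the function to every model and collect a named vector
msr <- sapply(models, calculate_msr)
print(msr)
```

Because model 1 is nested in model 2, and models 2 and 3 are both nested in model 4, the in-sample mean squared residual cannot increase when moving from a nested model to the model that contains it.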

Consider, for a moment, only the first model. Is it possible to select coefficients in this model that would produce a lower mean squared residual? Why or why not?
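A useful starting point for this question is the definition of the OLS estimates themselves. Written for model 1 (with $b_0, b_1, b_3$ denoting candidate coefficient values, a notation introduced here for illustration), the fitted coefficients are the values that minimize the mean squared residual over the sample:

```latex
\[
(\hat\beta_0, \hat\beta_1, \hat\beta_3)
  = \operatorname*{arg\,min}_{b_0,\, b_1,\, b_3}
    \frac{1}{n} \sum_{i=1}^{n}
    \left( y_i - b_0 - b_1 x_{i1} - b_3 x_{i3} \right)^2
\]
```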

Which of these models does the best job, in terms of mean squared residuals, at estimating the population coefficients?

Is there any evidence that the additional parameter that you have estimated in model_2 makes this second model more fully represent the true population? Conduct an F-test with the null hypothesis that model_1 is the correct population model, and evaluate whether you should reject the null to instead conclude that model_2 is more appropriate.

## anova(model_2, model_1, test = 'F')
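For nested linear models, the F statistic reported by `anova()` compares the residual sums of squares of the restricted and unrestricted fits. In the notation below (introduced here for illustration), $\mathrm{RSS}_1$ and $\mathrm{RSS}_2$ are the residual sums of squares of model_1 and model_2, $q$ is the number of restrictions, and $k_2$ is the number of coefficients in model_2 including the intercept.

```latex
\[
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2) / q}{\mathrm{RSS}_2 / (n - k_2)},
\qquad q = 1, \quad k_2 = 4, \quad n = 100,
\]
```

Under the null hypothesis and the classical homoskedastic-normal error assumptions, this statistic is compared against an $F_{1,\,96}$ distribution.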

Explain why the p-values for the tests that you have conducted in parts (a) and (b) are the same. Are these tests merely different ways of asking the same question of a model?
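The relationship between the two tests can be checked numerically. The sketch below assumes the two tests in question are the t-test on the `x2` coefficient in model_2 and the F-test comparing model_1 with model_2; it regenerates the data in base R with an arbitrary seed so that it runs on its own.

```r
set.seed(203)  # arbitrary seed, assumed here only for reproducibility
n <- 100
d <- data.frame(
  x1 = runif(n, min = 0, max = 10),
  x2 = rnorm(n, mean = 10, sd = 2),
  x3 = rnorm(n, mean = 0, sd = 2)
)
d$y <- 0.5 + 1 * d$x1 + 0 * d$x2 + 0.25 * d$x3^2 + rnorm(n, mean = 0, sd = 1)

model_1 <- lm(y ~ x1 + x3, data = d)
model_2 <- lm(y ~ x1 + x2 + x3, data = d)

# F-test of the single restriction beta_2 = 0
f_test <- anova(model_1, model_2, test = "F")
f_stat <- f_test$F[2]
p_f    <- f_test$`Pr(>F)`[2]

# t-test on the x2 coefficient in the unrestricted model
t_row  <- summary(model_2)$coefficients["x2", ]
t_stat <- unname(t_row["t value"])
p_t    <- unname(t_row["Pr(>|t|)"])

# With one restriction, F equals t squared and the p-values coincide
print(c(F = f_stat, t_squared = t_stat^2))
print(c(p_F = p_f, p_t = p_t))
```

With a single restriction, the squared t statistic equals the F statistic, and an $F_{1,\,\nu}$ distribution is the distribution of a squared $t_\nu$ variable, so the two p-values agree up to floating-point error.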
