Question: Use the following code to generate a data set:

```{r, include = TRUE}
set.seed(20210301)
n <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
mydat <- data.frame(X1 = z1, X2 = 0.1 * z1 + 0.3 * z2, Y = 1 + z1 + z2 + rnorm(n, sd = 2))
```

The data set `mydat` is of sample size `r n` and has three variables: $Y$, $X_1$, and $X_2$.
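As a side note, the generating code pins down the conditional mean of $Y$ given $X_1$ and $X_2$. A short derivation, writing $\varepsilon$ (notation assumed here, not in the question) for the `rnorm(n, sd = 2)` noise term, which is independent of $z_1$ and $z_2$: since $X_1 = z_1$ and $X_2 = 0.1 z_1 + 0.3 z_2$, solve for $z_2$ and substitute into $Y = 1 + z_1 + z_2 + \varepsilon$.

$$ z_2 = \frac{X_2 - 0.1 X_1}{0.3}, \qquad E(Y|X_1, X_2) = 1 + X_1 + \frac{X_2 - 0.1 X_1}{0.3} = 1 + \tfrac{2}{3} X_1 + \tfrac{10}{3} X_2. $$

So in this simulation the population coefficients are $1$, $2/3$, and $10/3$, and the noise variance is $2^2 = 4$. Estimates from any finite sample should scatter around these values rather than match them exactly.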
(a) Divide the data into 2 subsets. The first subset consists of the first 120 observations in `mydat` and the second contains the remaining observations. Fit two separate multiple regressions of $Y$ on $X_1$ and $X_2$ using the two subsets. Report the estimated coefficients.
(b) Obtain predictions from the model fit to the first subset for the values of `Y` in the second subset. Compute and report the average squared residual. The square root of this quantity is an estimate of the average prediction error.
(c) Fit the multiple regression of $Y$ on $X_1$ and $X_2$ using the entire data set `mydat`: $$ E(Y|X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2. $$
Use R to answer this question.
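One possible way to work through parts (a) to (c) in base R is sketched below. It is a minimal outline rather than an official solution: the object names (`sub1`, `sub2`, `fit1`, `fit2`, `coef_ab`, `pred2`, `mspe`, `rmspe`, `fit_full`) are illustrative choices that do not appear in the question, and the data are regenerated at the top so the chunk runs on its own.

```{r, include = TRUE}
# Regenerate the data exactly as in the question so this chunk is self-contained
set.seed(20210301)
n <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
mydat <- data.frame(X1 = z1, X2 = 0.1 * z1 + 0.3 * z2,
                    Y = 1 + z1 + z2 + rnorm(n, sd = 2))

# (a) Split into the first 120 rows and the remaining rows, then fit each subset
sub1 <- mydat[1:120, ]
sub2 <- mydat[121:n, ]
fit1 <- lm(Y ~ X1 + X2, data = sub1)
fit2 <- lm(Y ~ X1 + X2, data = sub2)

# Put the two sets of estimated coefficients side by side for reporting
coef_ab <- cbind(subset1 = coef(fit1), subset2 = coef(fit2))
print(round(coef_ab, 3))

# (b) Predict Y in the second subset using the model fit to the first subset
pred2 <- predict(fit1, newdata = sub2)
mspe  <- mean((sub2$Y - pred2)^2)  # average squared prediction residual
rmspe <- sqrt(mspe)                # estimate of the average prediction error
cat("Average squared residual:", round(mspe, 3), "\n")
cat("Square root:", round(rmspe, 3), "\n")

# (c) Fit the same regression on the full data set
fit_full <- lm(Y ~ X1 + X2, data = mydat)
print(round(summary(fit_full)$coefficients, 3))
```

The call `predict(fit1, newdata = sub2)` evaluates the first-subset fit at the second subset's `X1` and `X2`, so the residuals in (b) are out-of-sample. Because the noise term has `sd = 2`, an average squared residual somewhere near 4 would be unsurprising, though the exact value depends on the seed.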
