Question: 4. [Simulation: Variable Selection]
![4. [Simulation: Variable Selection] In this problem we will conduct a](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/11/672b061719285_854672b0616eced8.jpg)
4. [Simulation: Variable Selection] In this problem we will conduct a simulation study to explore variable selection. The 'true' model we will assume for this problem is

$$y_i = \beta_0 + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2), \qquad \text{for } i = 1, \dots, n.$$

Important: Since this question involves simulation, your first line of R code for this problem must set the random seed with this command: set.seed(123), where you replace 123 with your student number.

(A) Conduct a simulation study as follows:
(i) Randomly generate n = 100 observations of p = 50 independent standard normally-distributed covariates. That is, generate $x_1, \dots, x_{50}$, where $x_j$ is of length 100.
(ii) Randomly generate n = 100 outcomes, $y$, according to the true model above, with $\beta_0 = 1$ and $\sigma = 2$.
(iii) Regress the simulated outcomes $y$ on the first 5 covariates; i.e. fit the following model:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2), \qquad \text{for } i = 1, \dots, n.$$

(iv) Now test for any association between these covariates and the outcome; i.e. test whether all coefficients (except the intercept) are equal to 0 at the 5% level.
(v) Repeat steps (ii) through (iv) 1000 times. Plot a histogram of the p-values from (iv). In what proportion of datasets do we reject the null hypothesis?

(B) Repeat simulation (A), replacing step (iii) with the following: Fit five different models as follows: Regress $y$ on the first 5 covariates; call this model 1. Regress $y$ on the 6th to 10th covariates; call this model 2. ... Regress $y$ on the 21st to 25th covariates; call this model 5. Compute the adjusted $R^2$ for each of these models, and choose the model which fits best according to this criterion. Proceed to step (iv) using the model you select.

(C) Summarize your results. Explain what we can learn from this.

(D) What if, instead of manually comparing models as in (B), we used automatic selection? Repeat simulation (A), this time replacing step (iii) with: Use forward selection to pick a model (based on AIC), considering all 50 covariates. Proceed to step (iv) using the model you select.

(E) Explain your findings. Why are the results so extreme?
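The steps of simulation (A) can be sketched in R roughly as follows. This is a minimal sketch, not a definitive solution. It assumes the true model is the intercept-only model y_i = beta_0 + e_i with e_i ~ N(0, sigma^2), beta_0 = 1 and sigma = 2, which is a reading of the garbled problem text. The names `overall_p`, `n_sims` and `reject_rate` are illustrative and do not come from the problem.

```r
set.seed(123)  # replace 123 with your student number, as the problem requires

n <- 100        # observations per dataset
p <- 50         # number of covariates
n_sims <- 1000  # number of simulated datasets (illustrative name)
beta0 <- 1      # assumed reading of the garbled intercept value
sigma <- 2      # assumed reading of the garbled error standard deviation

# (i) Generate the covariates once: an n x p matrix of independent N(0, 1) draws.
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Hypothetical helper: p-value of the overall F-test (all slopes equal to 0)
# for a regression of y on the chosen columns of X.
overall_p <- function(y, cols) {
  fit <- lm(y ~ X[, cols])
  f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# (ii) to (v): simulate outcomes from the assumed true model, fit the
# regression on the first 5 covariates, and keep the overall p-value.
p_values <- replicate(n_sims, {
  y <- beta0 + rnorm(n, mean = 0, sd = sigma)
  overall_p(y, 1:5)
})

hist(p_values, breaks = 20,
     main = "Overall F-test p-values, simulation (A)", xlab = "p-value")

# Proportion of datasets in which the null hypothesis is rejected at the 5% level.
reject_rate <- mean(p_values < 0.05)
reject_rate
```

Parts (B) and (D) change only step (iii), so under the same assumptions they could reuse this loop and change only how the columns passed to the test are chosen.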