Question: In R, Python, or your favorite programming language, write a function that takes as arguments a matrix X and a column vector y and returns a list containing a column vector of OLS coefficient estimates, the standard estimate of the covariance matrix of the parameters, and an estimate of the error variance. Then use the file on Canvas titled MonteCarloShell.R to run the following experiments (if you are using a language other than R, you will need to write your own shell):
1. In the section of your Monte Carlo shell that computes the data for each trial, create a column vector of standard normal random numbers of length N (the value of N is specified below). Call it e. Then create an N x 2 matrix X whose first column is all ones and whose second column is made up of standard normal random variates. Next, create a 2 x 1 matrix b containing the values 1 and 2. Finally, compute the column vector y as the sum of X times b and e, that is, y = Xb + e.
a. Now set N to 3 and run 10,000 Monte Carlo trials. Recover each simulated estimate of the coefficient on the second column of X and each simulated estimate of the variance of that coefficient estimate. What is the mean of the simulated estimates of b[2]? Create a histogram of the simulated coefficient estimates with 101 cells over the range 0 to 4. Compare the variance of the 10,000 coefficient estimates to the average of the estimated variances across the 10,000 trials. How do they compare? How accurate are the estimates of the coefficient? What fraction of the time are the estimates greater than zero?
b. Do the same as in a. with N = 5, 10, 50, 100, 1,000, and 10,000. How does the histogram change? How do the estimated variances compare to the actual variance of the estimates? How might you relate what you are seeing in the histograms to the concept of a probability limit (plim)?
c. Compute the t-statistic for the two-tailed test of the hypothesis b[2] = 2 for each of the 10,000 estimates of b[2] for sample sizes 5, 50, and 100. Find the .05 critical value of the t distribution for this model at each sample size. What fraction of the time do you reject the hypothesis that b[2] = 2? What would you expect to happen if you used a .01 critical value?
d. (Extra credit) Use some distribution for the errors other than the normal distribution and repeat c. for sample sizes 3, 10, and 50. How important is the assumption of normality of e for the t-test? Can you find a distribution of the errors that produces 20% more or fewer rejections than should happen with normal errors for a sample size of 15?
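The function described at the start of the question, together with the data-generating step 1, could be sketched in Python with NumPy roughly as follows. This is a minimal sketch rather than a reference solution: the names `ols` and `simulate_trial`, the dictionary keys, and the seed are assumptions made for illustration, and a dictionary stands in for the R list the question asks for.

```python
import numpy as np

def ols(X, y):
    """OLS fit: returns coefficients, their estimated covariance, and s^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)  # force a column vector
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ y)                 # (X'X)^{-1} X'y, shape (k, 1)
    resid = y - X @ coef
    s2 = float(np.sum(resid ** 2)) / (n - k)   # error-variance estimate
    cov = s2 * XtX_inv                         # estimated Cov(coef), shape (k, k)
    return {"coef": coef, "cov": cov, "s2": s2}

def simulate_trial(N, rng):
    """Draw one data set as in step 1: y = Xb + e with b = (1, 2)'."""
    e = rng.standard_normal((N, 1))
    X = np.column_stack([np.ones(N), rng.standard_normal(N)])
    b = np.array([[1.0], [2.0]])
    y = X @ b + e
    return X, y

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
X, y = simulate_trial(10, rng)
fit = ols(X, y)
print(fit["coef"].ravel(), fit["s2"])
```

The variance estimate divides the residual sum of squares by n - k, so the function needs more rows than columns in X.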
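For parts a. and b., one possible shape for the Monte Carlo loop is sketched below. It stands in for MonteCarloShell.R, which is not shown here, so its structure is an assumption; `slope_fit` and `monte_carlo` are hypothetical helper names, and the histogram is returned as cell counts rather than drawn.

```python
import numpy as np

def slope_fit(X, y):
    """Return the OLS estimate of b[2] and its estimated variance."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ y)
    resid = y - X @ coef
    s2 = float(np.sum(resid ** 2)) / (n - k)
    return coef[1, 0], s2 * XtX_inv[1, 1]

def monte_carlo(N, trials, rng):
    """Simulate y = Xb + e with b = (1, 2)' and collect the slope results."""
    b = np.array([[1.0], [2.0]])
    b2_hat = np.empty(trials)
    var_hat = np.empty(trials)
    for t in range(trials):
        e = rng.standard_normal((N, 1))
        X = np.column_stack([np.ones(N), rng.standard_normal(N)])
        y = X @ b + e
        b2_hat[t], var_hat[t] = slope_fit(X, y)
    return b2_hat, var_hat

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only

# Part a: N = 3, 10,000 trials
b2_hat, var_hat = monte_carlo(N=3, trials=10_000, rng=rng)
counts, edges = np.histogram(b2_hat, bins=101, range=(0.0, 4.0))
print("mean of estimates:      ", b2_hat.mean())
print("variance of estimates:  ", b2_hat.var(ddof=1))
print("mean estimated variance:", var_hat.mean())
print("fraction above zero:    ", np.mean(b2_hat > 0))

# Part b: the same summaries at larger N
# (N = 1,000 and 10,000 also work with this loop, but run much longer)
for N in (5, 10, 50, 100):
    b2, v = monte_carlo(N, 10_000, rng)
    print(N, b2.mean(), b2.var(ddof=1), v.mean())
```

Estimates that fall outside the range 0 to 4 are left out of `counts` by `np.histogram`, so the cell counts can sum to fewer than 10,000.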
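For parts c. and d., the rejection frequency of the two-tailed t-test could be estimated along the following lines. This is a sketch under stated assumptions: it uses `scipy.stats.t.ppf` for the critical value with N - 2 degrees of freedom, the helper names are hypothetical, and the centered exponential is only one arbitrary example of a non-normal error distribution with mean 0 and variance 1, not a claimed answer to the extra-credit question.

```python
import numpy as np
from scipy import stats

def t_stat(X, y, b2_null=2.0):
    """t-statistic for the hypothesis b[2] = b2_null."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ y)
    resid = y - X @ coef
    s2 = float(np.sum(resid ** 2)) / (n - k)
    return (coef[1, 0] - b2_null) / np.sqrt(s2 * XtX_inv[1, 1])

def rejection_rate(N, draw_errors, rng, trials=10_000, alpha=0.05):
    """Fraction of trials in which |t| exceeds the two-tailed critical value."""
    crit = stats.t.ppf(1 - alpha / 2, df=N - 2)   # two columns in X
    b = np.array([[1.0], [2.0]])
    rejections = 0
    for _ in range(trials):
        e = draw_errors(rng, N).reshape(-1, 1)
        X = np.column_stack([np.ones(N), rng.standard_normal(N)])
        y = X @ b + e
        if abs(t_stat(X, y)) > crit:
            rejections += 1
    return rejections / trials

def normal_errors(rng, N):
    return rng.standard_normal(N)

def centered_exponential_errors(rng, N):
    # Illustrative non-normal choice: skewed, mean 0, variance 1
    return rng.exponential(1.0, N) - 1.0

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
normal_rates = {N: rejection_rate(N, normal_errors, rng) for N in (5, 50, 100)}
skewed_rates = {N: rejection_rate(N, centered_exponential_errors, rng)
                for N in (3, 10, 50)}
print(normal_rates)
print(skewed_rates)
```

Passing `alpha=0.01` reruns the same experiment with the .01 critical value, and swapping in another `draw_errors` function is enough to try other error distributions at N = 15.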
