Question:

Q3. Consider the multiple linear regression model given by $Y = X\beta + \varepsilon$, where $Y$ is the $n \times 1$ vector of the dependent variables, $X$ is the $n \times (p+1)$ design matrix of full rank, $\beta$ is the $(p+1) \times 1$ vector of regression coefficients, $\varepsilon$ is the $n \times 1$ vector of random errors satisfying $\varepsilon \sim N(0, I_n \sigma^2)$, and $I_n$ is the $n \times n$ identity matrix. Given the vector $y$ of the $n$ observations, the least squares estimator of $\beta$ is given by $\hat{\beta} = (X^T X)^{-1} X^T y$, leading to the fitted model $\hat{y} = X\hat{\beta}$.

Now consider removing observation $i$ from the data. Let $X_{(i)}$ be the $(n-1) \times (p+1)$ matrix $X$ with row $i$ deleted. Let $y_{(i)}$ be the $(n-1) \times 1$ vector $y$ with observation $y_i$ deleted. Let $\hat{\beta}_{(i)}$ be the estimate of $\beta$ with observation $i$ deleted, and let $x_i^T$ be the $i$th row of $X$. Thus, $X_{(i)}^T X_{(i)} = X^T X - x_i x_i^T$ is of order $(p+1) \times (p+1)$, and $X_{(i)}^T y_{(i)} = X^T y - x_i y_i$ is of order $(p+1) \times 1$. It can be shown that (you do not need to prove this)

$$\hat{\beta} - \hat{\beta}_{(i)} = \frac{(X^T X)^{-1} x_i (y_i - \hat{y}_i)}{1 - h_{ii}},$$

where $h_{ii}$ is the $i$th diagonal element of $H = X(X^T X)^{-1} X^T$.

Let $SSE = y^T \{I_n - X(X^T X)^{-1} X^T\} y$ denote the residual sum of squares based on all $n$ data points. Further, let $SSE_{(i)} = y_{(i)}^T \{I_{n-1} - X_{(i)}(X_{(i)}^T X_{(i)})^{-1} X_{(i)}^T\} y_{(i)}$ denote the residual sum of squares when the $i$th data point is deleted.

(b) In a study of the effects of cystic fibrosis, data were collected from 25 patients on variables related to body size and lung function.

$Y$ = maximal static expiratory pressure (cm H2O), a measure of malnutrition;
$X_1$ = body mass (weight/height$^2$) as a percentage of the age-specific median in normal individuals;
$X_2$ = weight (kg);
$X_3$ = residual volume;
$X_4$ = forced expiratory volume in one second.

Using the statistical computing package R, a multiple linear regression model containing only the four variables, $X_1, X_2, X_3, X_4$, has been fitted to the cystic fibrosis data and has been followed by an analysis of the effect of deleting the $i$th observation in turn from the data for $i = 1, \ldots, 25$. The results are presented on the next page.

Here, "hat" and "resid" contain the values $h_{ii}$ from the matrix $H$, and the residuals, $e_i$ ($i = 1, \ldots, n$), from the multiple linear regression model based on all $n = 25$ observations. Also, "sigmahat" is the residual standard error for the model with the $i$th observation deleted. The remaining five columns contain the change in $\hat{\beta}_j$ effected by deleting the $i$th observation for a model containing $X_1, X_2, X_3, X_4$. At the bottom of the table are $\hat{\beta}_j$ ($j = 0, 1, \ldots, 4$), together with their respective standard errors, and finally, the residual standard error, "sigmahat", for the model based on all $n$ observations.

Comment on the hat values and the residuals for the individual observations. Which observations have most influence, that is, effect most change, on the residual standard error and the $\hat{\beta}_j$ ($j = 0, 1, \ldots, 4$), and why? [8 marks]

(c) For any ONE row in the table below, show numerically how the "hat", "resid" and "sigmahat" values are related to "sigma_hat for all data". [3 marks]
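
A minimal sketch of the kind of check part (c) asks for, under stated assumptions. It relies on the standard deletion identity SSE_(i) = SSE - e_i^2 / (1 - h_ii), which gives sigmahat_(i)^2 = {(n - p - 1) sigmahat^2 - e_i^2 / (1 - h_ii)} / (n - p - 2). The data below are made up purely for illustration (they are not the cystic fibrosis data, whose table is not reproduced here), and the helper names `solve`, `gram`, and `fit` are hypothetical. The same run also checks the stated formula for the change in the coefficient estimates.

```python
# Illustrative check of the leave-one-out identities on MADE-UP data
# (assumption: small synthetic data set, NOT the cystic fibrosis data).
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gram(X):
    """Return X^T X as a list of lists."""
    q = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(q)] for a in range(q)]

def fit(X, y):
    """Return (beta_hat, residuals, sigma_hat) for least squares of y on X."""
    q = len(X[0])  # q = p + 1 columns, including the intercept
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(q)]
    beta = solve(gram(X), Xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    return beta, resid, math.sqrt(sse / (len(y) - q))

# Made-up data: n = 8 observations, p = 2 predictors plus an intercept.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 9.0, 7.0]
y = [3.1, 3.9, 7.2, 7.8, 11.5, 11.9, 17.0, 15.2]
X = [[1.0, a, b] for a, b in zip(x1, x2)]

n, q = len(y), len(X[0])
beta, resid, sigma = fit(X, y)

i = 6                                   # observation to delete (0-based index)
v = solve(gram(X), X[i])                # (X^T X)^{-1} x_i
h = sum(a * b for a, b in zip(X[i], v)) # hat value h_ii = x_i^T (X^T X)^{-1} x_i

# Shortcut using only "hat", "resid" and the full-data sigma_hat.
sigma_i_formula = math.sqrt(((n - q) * sigma**2 - resid[i]**2 / (1 - h)) / (n - q - 1))
change_formula = [vj * resid[i] / (1 - h) for vj in v]  # beta_hat - beta_hat_(i)

# Direct refit with observation i removed, for comparison.
beta_del, _, sigma_i_refit = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])

print("sigmahat_(i) via formula:", sigma_i_formula)
print("sigmahat_(i) via refit:  ", sigma_i_refit)
```

For a row of the actual table, the same arithmetic would use n = 25 and p = 4, so the shortcut reads sigmahat_(i)^2 = {20 sigmahat^2 - e_i^2 / (1 - h_ii)} / 19, with e_i taken from "resid" and h_ii from "hat".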
