Question: STAT3503 A, Assignment 5. Due Monday, Dec 5, 2016, in class.

Table 1: Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5            13196    2639.10665     40.84   < .0001
Error             23       1486.39502      64.62587
Corrected Total   28            14682

Table 2: Output Statistics (a "?" marks a cell to be completed in part (1))

Obs  Dependent  Predicted  Residual  Std Error   Student  Cook's D  RStudent  Hat Diag H  DFFITS
      Variable      Value             Residual  Residual
  1        272    267.571         ?      4.363     0.969     0.375     0.968       0.706   1.498
  2        264    263.015     0.986      7.627         ?         0     0.126         0.1   0.042
  3        239     234.51      4.29      7.636     0.562     0.006     0.553       0.098   0.182
  4        231    213.304    17.396      6.011     2.894         ?      3.55       0.441   3.153
  5        252    253.098    -1.498      7.294    -0.205     0.002         ?       0.177  -0.093
  6        258    255.936     1.964      7.381     0.266     0.002     0.261       0.157   0.113
  7        264    267.927    -4.027      7.097    -0.568     0.015    -0.559       0.221       ?
  8        267    278.089   -11.589       7.08    -1.637     0.129    -1.703       0.224  -0.916
  9        229    240.204   -11.104      7.328    -1.515     0.078    -1.562       0.169  -0.705
 10        239    246.479    -7.179      7.531    -0.953     0.021    -0.951       0.122  -0.355
 11        258    257.664     0.336      7.733     0.043         0     0.043       0.075   0.012
 12        258    266.269    -8.669      7.778    -1.115     0.014    -1.121       0.064  -0.293
 13        267    269.392    -2.092      7.671    -0.273     0.001    -0.267       0.089  -0.084
 14        267    266.373     0.627      7.038     0.089         0     0.087       0.234   0.048
 15        260    262.271    -2.671      7.612    -0.351     0.002    -0.344       0.103  -0.117
 16        240    235.109     5.291      6.169     0.858     0.086     0.853       0.411   0.712
 17        227     226.57      0.63      7.363     0.086         0     0.084       0.161   0.037
 18        196    209.685   -13.685       7.18    -1.906     0.153    -2.031       0.202  -1.023
 19        279    273.119     5.581      6.977       0.8     0.035     0.794       0.247   0.454
 20        272    275.069    -2.769      7.295     -0.38     0.005    -0.372       0.176  -0.172
 21        267    263.515     3.885      7.496     0.518     0.007      0.51        0.13   0.198
 22        255    240.144    14.356      6.865     2.091     0.271     2.273       0.271   1.385
 23        225    221.136     3.564      7.419      0.48     0.007     0.472       0.148   0.197
 24        182    193.495   -11.995      6.665      -1.8     0.246    -1.899       0.313  -1.281
 25        228    227.965    -0.465      6.487    -0.072         0     -0.07       0.349  -0.051
 26        254    249.793     3.807      7.297     0.522      0.01     0.513       0.176   0.237
 27        263    259.083     3.917      7.401     0.529     0.008     0.521       0.152   0.221
 28        266    266.935    -1.135      7.515    -0.151     0.001    -0.148       0.126  -0.056
 29        264    255.783     8.017      7.379     1.086     0.037     1.091       0.157   0.472

You are given the output above (Tables 1 and 2), obtained by fitting a linear regression model to a set of data. Using α = 0.1, answer the following questions:

(1) [10 Marks] Please complete Table 2.

Throughout, n = 29 and MSE = 64.62587. Table 1 gives n - p = 23 error DF, so p = 6 (intercept plus five predictors) and n - p - 1 = 22.

Residual, case 1: $e_1 = y_1 - \hat{y}_1 = 4.229$. The printed values 272 - 267.571 give 4.429, but 4.229 is the figure consistent with the rest of the row, since Std Error Residual × Student Residual = 4.363 × 0.969 ≈ 4.228.

Student Residual, case 2: $r_2 = e_2 / s\{e_2\} = 0.986 / 7.627 = 0.129$.

RStudent, case 5: $t_i = e_i / \sqrt{MSE_{(i)}(1 - h_{ii})}$, where $MSE_{(i)} = [(n-p)\,MSE - e_i^2/(1 - h_{ii})] / (n - p - 1)$.
$MSE_{(5)} = [23(64.62587) - (-1.498)^2/0.823] / 22 = 67.439$, so $t_5 = -1.498 / \sqrt{67.439(0.823)} = -0.201$.

DFFITS, case 7: $DFFITS_i = t_i \sqrt{h_{ii}/(1 - h_{ii})}$.
$MSE_{(7)} = [23(64.62587) - (-4.027)^2/0.779] / 22 = 66.617$, so $t_7 = -4.027 / \sqrt{66.617(0.779)} = -0.559$, which matches the table, and $DFFITS_7 = -0.559 \sqrt{0.221/0.779} = -0.298$.

Cook's D, case 4: $D_i = (r_i^2 / p)\, h_{ii}/(1 - h_{ii})$, so $D_4 = (2.894^2/6)(0.441/0.559) = 1.101$. The same formula reproduces the printed values, for example case 8: $(1.637^2/6)(0.224/0.776) = 0.129$.

(2) [2 Marks] Which cases are most likely to be outliers with respect to their X values among the cases shown in Table 2? Explain why.

The largest leverage value is $h_{11} = 0.706$ (case 1). It exceeds the criterion of twice the mean leverage, 2p/n = 2(6)/29 = 0.414. The next two largest leverage values belong to case 4 (0.441, just above the criterion) and case 16 (0.411, just below it), and both are much smaller than the value for case 1. Case 1 is therefore identified as the outlying X observation.

(3) [3 Marks] Which cases are most likely to be outliers with respect to their Y values among the cases shown in Table 2? Use the studentized residual. Explain why.

The largest absolute value in the Student Residual column is 2.894 (case 4). With α = 0.1 the two-sided cutoff is the 0.95 quantile of the t distribution with 23 degrees of freedom:

    > qt(0.95, 23)
    [1] 1.713872

Since 2.894 > 1.714, case 4 is flagged as an outlying Y observation. Cases 18 (-1.906), 22 (2.091) and 24 (-1.8) also exceed the cutoff in absolute value.

(4) [2 Marks] Identify any outlying Y observations among the cases shown in Table 2. Use the Bonferroni outlier test procedure with α = 0.05.

The largest absolute value in the RStudent column (studentized deleted residuals) is 3.55, for case 4, so case 4 is tested first at α = 0.05.
The Bonferroni procedure compares each $|t_i|$ with $t(1 - \alpha/(2n);\, n - p - 1)$, where n = 29 and n - p - 1 = 22 (Table 1 gives n - p = 23 error DF, so p = 6). In R this quantile is qt(1 - 0.05/(2*29), 22), approximately 3.57. The unadjusted quantile qt(0.95, 23) = 1.713872 is not a Bonferroni cutoff, because it ignores that 29 cases are screened at once. Since 3.55 < 3.57, case 4 falls just short of being declared an outlier at the 0.05 family level. Cases 8, 18, 22 and 24, whose absolute RStudent values are at most 2.273, are well below the cutoff.

(5) [5 Marks] Which cases are most likely to be influential cases among the cases shown in Table 2? Explain why.

Compare the absolute DFFITS values with $2\sqrt{p/n} = 2\sqrt{6/29} = 0.9097$. Cases whose absolute DFFITS exceeds this value have a large influence on their own fitted values: cases 1 (1.498), 4 (3.153), 8 (-0.916), 18 (-1.023), 22 (1.385) and 24 (-1.281). Case 4 stands out by a wide margin. Cook's D gives a cross-check: each $D_i$ is compared with the F(p, n - p) = F(6, 23) distribution, and a value near or above its median signals major influence.
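The blank cells of Table 2 and the cutoffs for parts (2), (4) and (5) can be recomputed from the printed output. The sketch below is one way to do it, written in Python with only the standard library rather than the R used in the answer. The function and variable names are hypothetical. It assumes p = 6, inferred from n = 29 and the 23 error DF in Table 1. The tail probability uses the finite series for Student's t with an even number of degrees of freedom, a standard identity brought in here so that no statistics package is needed.

```python
import math

# Values read from Tables 1 and 2. P is inferred, not stated in the question:
# n = 29 cases and 23 error DF imply p = 6 (intercept plus five predictors).
N = 29
DF_ERROR = 23
P = N - DF_ERROR
MSE = 64.62587


def mse_deleted(e, h):
    """Error mean square with case i deleted, MSE_(i)."""
    return (DF_ERROR * MSE - e**2 / (1 - h)) / (DF_ERROR - 1)


def rstudent(e, h):
    """Studentized deleted residual t_i from residual e and leverage h."""
    return e / math.sqrt(mse_deleted(e, h) * (1 - h))


def dffits(e, h):
    """DFFITS_i = t_i * sqrt(h / (1 - h))."""
    return rstudent(e, h) * math.sqrt(h / (1 - h))


def cooks_d(r, h):
    """Cook's D from the internally studentized residual r and leverage h."""
    return r**2 / P * h / (1 - h)


def two_sided_t_tail(t, df):
    """P(|T| > t) for Student's t with even df, via a finite series."""
    if df < 2 or df % 2:
        raise ValueError("this series only covers even df >= 2")
    sin_theta = abs(t) / math.sqrt(df + t * t)
    cos_sq = df / (df + t * t)
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos_sq
        total += term
    return 1.0 - sin_theta * total


# The five blank cells of Table 2 (argument values copied from the table).
blanks = {
    "Residual, case 1": 0.969 * 4.363,  # Student Residual x Std Error Residual
    "Student Residual, case 2": 0.986 / 7.627,
    "Cook's D, case 4": cooks_d(2.894, 0.441),
    "RStudent, case 5": rstudent(-1.498, 0.177),
    "DFFITS, case 7": dffits(-4.027, 0.221),
}

leverage_cutoff = 2 * P / N           # twice the mean leverage
dffits_cutoff = 2 * math.sqrt(P / N)  # guideline for |DFFITS|
# Bonferroni: compare the two-sided p-value of the largest |RStudent|
# (case 4, 3.55) with alpha / n instead of looking up a t quantile.
p_case4 = two_sided_t_tail(3.55, DF_ERROR - 1)
bonferroni_level = 0.05 / N

for name, value in blanks.items():
    print(f"{name}: {value:.3f}")
print(f"leverage cutoff {leverage_cutoff:.3f}, DFFITS cutoff {dffits_cutoff:.4f}")
print(f"case 4 p-value {p_case4:.6f} vs alpha/n {bonferroni_level:.6f}")
```

Comparing a two-sided p-value with α/n is equivalent to comparing |t_i| with the Bonferroni critical value, so the script needs no t quantile function. The series applies here only because the deleted-residual degrees of freedom, 22, happen to be even.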
