Stat 3500 Section 9 Exam 3    Instructor: Cheng Dong

STAT 3500 Fall 2014 - Section 9
Introduction to Probability and Statistics II

Exam III (100 points)

Instructions:
- You have 75 minutes available.
- Read through all problems carefully.
- You may use your calculator and your formula packet only.
- Partial credit may be awarded on full work problems, but work must be clearly provided in order to earn credit on any such problem. No work, no credit.

Name: KEY
Student ID:
Score (out of 100):

https://www.coursehero.com/file/14028232/Coursehero-exam-3-solpdf/

Question 1. True or False Questions (20 points)

For the following statements circle (T)rue or (F)alse.

F  In order to fit a second-order model, we need at least two different observed X-values.

T  In multiple linear regression, it is always true that $0 \le R_a^2 < R^2 \le 1$ for the same model.

T  For a model with quantitative and qualitative variables, separate prediction equations for each level of the qualitative variable will be the same if the baseline level changes.

F  In multiple linear regression, when we need to compare two models, we prefer the model with larger $R^2$.

T  $E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$ is a linear model.

T  If we build a model for estimation/prediction, multicollinearity is not a problem.

F  Suppose we have a qualitative variable with $k$ levels ($k \ge 3$). We can use $k-1$ dummy variables to code it; denote these dummy variables as $x_1, \ldots, x_{k-1}$. Then for some data points, we may have $x_1 \cdots x_{k-1} \neq 0$.

F  For the interaction model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, if the test on the interaction term is significant but the tests on the individual variables included in the interaction term are nonsignificant, we only need to keep the interaction term.

F  For the following two models, we can use the t-test $H_0: \beta_4 = 0$ vs. $H_1: \beta_4 \neq 0$ to compare them.
   Model 1: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
   Model 2: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_2^2$

T  In multiple linear regression, multicollinearity may cause inflated standard errors of the $\hat{\beta}$s.

Question 2. (15 points)

Data on the monthly sales ($y$) and the advertising expense ($x_1$) are collected for three different advertising media (newspaper, radio, and TV). The scatter plots for the three media are given below.

[Figure: three scatter plots of monthly sales against expense, left to right: (a) newspaper (monthly sales y1), (b) radio (monthly sales y2), (c) TV (monthly sales y3).]

(a). Based on plot (a) (the first plot on the left) only, use $y$ as the response variable and $x_1$ as the independent variable, and write down an appropriate model. Explain why you choose this model.

Answer: Curvature in the plot suggests the need for a quadratic model.
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$

(b). Part (a) is for newspaper only. Now, we are interested in all three media. Based on the three plots of the data, with response variable $y$, write down an appropriate model for this data. Be sure to define the dummy variable(s) clearly and use TV as the baseline level. (Hint: The full model can be built on the simple model in part (a).)

Answer:
$x_2 = 1$ if newspaper, $0$ otherwise
$x_3 = 1$ if radio, $0$ otherwise

$E(y) = (\beta_0 + \beta_1 x_2 + \beta_2 x_3) + (\beta_3 + \beta_4 x_2 + \beta_5 x_3)x_1 + (\beta_6 + \beta_7 x_2 + \beta_8 x_3)x_1^2$
$\phantom{E(y)} = \beta_0 + \beta_1 x_2 + \beta_2 x_3 + \beta_3 x_1 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_1^2 + \beta_7 x_2 x_1^2 + \beta_8 x_3 x_1^2$

(c). Based on your model in part (b), write down the separate lines (quadratic curves) for each of the three media.
If you cannot build the model in part (b), you can use the following model instead (you need to define the dummy variable(s)):
$E(y) = \beta_0 + \beta_1 x_2 + \beta_2 x_3 + \beta_3 x_1 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$

Answer:
For newspaper, $E(y) = (\beta_0 + \beta_1) + (\beta_3 + \beta_4)x_1 + (\beta_6 + \beta_7)x_1^2$
For radio, $E(y) = (\beta_0 + \beta_2) + (\beta_3 + \beta_5)x_1 + (\beta_6 + \beta_8)x_1^2$
For TV, $E(y) = \beta_0 + \beta_3 x_1 + \beta_6 x_1^2$

Question 3. (55 points)

A researcher hopes to study the relation of amount of body fat ($y$) to several possible predictor variables, based on a sample of 20 healthy females aged 20-35. The possible predictor variables are triceps skinfold thickness ($x_1$), thigh circumference ($x_2$), and midarm circumference ($x_3$). The amount of body fat in the table below for each of the 20 persons was obtained by a cumbersome and expensive procedure requiring the immersion of the person in water. It would therefore be helpful if a regression model with some or all of these predictor variables could provide estimates of the amount of body fat.

Subject   x1     x2     x3     y
1         19.5   43.1   29.1   11.9
2         24.7   49.8   28.2   22.8
3         30.7   51.9   37     18.7
4         29.8   54.3   31.1   20.1
5         19.1   42.2   30.9   12.9
6         25.6   53.9   23.7   21.7
7         31.4   58.5   27.6   27.1
8         27.9   52.1   30.6   25.4
9         22.1   49.9   23.2   21.3
10        25.5   53.5   24.8   19.3
11        31.1   56.6   30     25.4
12        30.4   56.7   28.3   27.2
13        18.7   46.5   23     11.7
14        19.7   44.2   28.6   17.8
15        14.6   42.7   21.3   12.8
16        29.5   54.4   30.1   23.9
17        27.7   55.3   25.7   22.6
18        30.2   58.6   24.6   25.4
19        22.7   48.2   27.1   14.8
20        25.2   51     27.5   21.1

At first we only consider predictor variables $x_1$ and $x_2$. An interaction model (Model 1) is used and the following output is obtained from Minitab.

Model 1:

Regression Analysis: y versus x1, x2, x1x2

The regression equation is
y = -18.3 + 0.19 x1 + 0.641 x2 + 0.0007 x1x2

Predictor   Coef      SE Coef   T   P
Constant    -18.31    35.82     ?   ?
x1          0.188     1.415     ?   ?
x2          0.6414    0.7884    ?   ?
x1x2        0.00070   0.02843   ?   ?

S = 2.62139   R-Sq = 77.8%   R-Sq(adj) = 73.6%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       3    385.44   128.48   18.70   0.000
Residual Error   16   109.95   6.87
Total            19   495.39

(a). (3 points) Write down Model 1 based on the given information and the Minitab output above.

Answer: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

(b). (9 points) At significance level of 0.05, is there sufficient evidence to indicate that the effect of $x_1$ depends on $x_2$? (Clearly state all steps)

Answer: $n = 20$, $p = k + 1 = 4$, so $n - p = 16$.
(1) $H_0: \beta_3 = 0$, $H_A: \beta_3 \neq 0$
(2) $\alpha = 0.05$
(3) $t = 0.0007 / 0.02843 = 0.02462$
(4) Reject $H_0$ if $|t| > t_{(0.025,16)} = 2.12$
(5) Conclusion: $|t| = 0.02462 < 2.12$, so we fail to reject $H_0$. There is not sufficient evidence to conclude that $x_1$ and $x_2$ interact.

Then we consider models without interaction terms (Model 2 and Model 3). Minitab outputs are shown below:

Model 2:

Regression Analysis: y versus x1, x2

The regression equation is
y = -19.2 + 0.222 x1 + 0.659 x2

Predictor   Coef      SE Coef   T       P
Constant    -19.174   8.361     -2.29   0.035
x1          0.2224    0.3034    0.73    0.474
x2          0.6594    0.2912    2.26    0.037

S = 2.54317   R-Sq = ?   R-Sq(adj) = ?

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    385.44   192.72   29.80   0.000
Residual Error   17   109.95   6.47
Total            19   495.39

Model 3:

Regression Analysis: y versus x1, x2, x3

The regression equation is
y = 117 + 4.33 x1 - 2.86 x2 - 2.19 x3

Predictor   Coef     SE Coef   T       P
Constant    117.08   99.78     1.17    0.258
x1          4.334    3.016     1.44    0.170
x2          -2.857   2.582     -1.11   0.285
x3          -2.186   1.595     -1.37   0.190

S = 2.47998   R-Sq = 80.1%   R-Sq(adj) = 76.4%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       3    396.98   132.33   21.52   0.000
Residual Error   16   98.40    6.15
Total            19   495.39

(c). (4 points) Based on the Minitab outputs, write down Model 2 and Model 3.

Answer:
Model 2: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Model 3: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

(d). (5 points) Calculate $R_a^2$ of Model 2.

Answer: $n = 20$, $k = 2$, so $p = k + 1 = 3$.
$R_a^2 = 1 - \left[\frac{n-1}{n-(k+1)}\right]\left(\frac{SSE}{SS_{yy}}\right) = 1 - \left(\frac{20-1}{20-3}\right)\left(\frac{109.95}{495.39}\right) = 1 - 1.1176 \times 0.2219 = 0.752 = 75.2\%$

(e). (2 points) Are Model 2 and Model 3 nested? Explain.

Answer: Yes. If $\beta_3 = 0$, then Model 3 reduces to Model 2.

(f). (10 points) Test the global utility of Model 2 and Model 3. Are these two models useful?

Answer:
Model 2
(1) $H_0: \beta_1 = \beta_2 = 0$, $H_A$: at least one $\beta_i \neq 0$
(2) $\alpha = 0.05$
(3) p-value = 0.000
(4) Conclusion: since the p-value is less than 0.05, reject $H_0$.

Model 3
(1) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$, $H_A$: at least one $\beta_i \neq 0$
(2) $\alpha = 0.05$
(3) p-value = 0.000
(4) Conclusion: since the p-value is less than 0.05, reject $H_0$.

Both models are useful.

(g). (12 points) Compare Model 2 and Model 3 at the level of 0.05. Which model is better? Explain. (Clearly state all steps)

Answer: First, by question (f), both models are useful.
(a) $H_0: \beta_3 = 0$ vs. $H_A: \beta_3 \neq 0$
(b) $\alpha = 0.05$
(c) $k = 3$, $g = 2$
$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(109.95 - 98.4)/(3 - 2)}{98.4/[20 - (3 + 1)]} = \frac{11.55}{6.15} = 1.878$
(d) The rejection region: $\{F > F_{(0.05,1,16)}\} = \{F > 4.49\}$
(e) Conclusion: $F = 1.878 < 4.49$, so we fail to reject $H_0$. Model 2 is better: it is simpler, and there is no significant evidence that adding $x_3$ improves prediction.

(h). (5 points) Are Model 1 and Model 3 nested? Which of them is better? Explain.

Answer: No.
They are not nested because neither model contains all the terms of the other. Model 3 has $R_a^2 = 76.4\%$ while Model 1 has $R_a^2 = 73.6\%$. So Model 3 is better because it has a larger value of $R_a^2$.

(i). (5 points) Ignore the results in previous questions. Use Model 2 to predict the amount of body fat of a female with triceps skinfold thickness 15, thigh circumference 38 and midarm circumference 22. Is this prediction meaningful?

Answer: $x_1 = 15$, $x_2 = 38$, and $\hat{y} = -19.2 + 0.222 x_1 + 0.659 x_2$.
So $\hat{y} = -19.2 + 0.222(15) + 0.659(38) = 9.172$.
Since the observed range of $x_2$ is $[42.2, 58.6]$ and this subject's $x_2 = 38$ lies outside that range, the prediction is an extrapolation; it is NOT meaningful.

Question 4. (10 points)

A researcher is interested in whether a new drug affects activity levels of lab animals. A total of 15 animals were randomly divided into three groups: one group received a low dosage, another group received a medium dosage, and the last group received a high dosage. The following are the levels of activity after the treatments ($y$, the higher the more active).

        High   Medium   Low
        7      2        0
        6      3        2
        8      1        0
        4      2        3
        5      4        1
Mean    6      2.4      1.2

(a). (3 points) Propose a model for estimating the activity levels of lab animals for different dosages. Be sure to define any indicator variables and use Low as the baseline level.

Answer:
$x_1 = 1$ if High, $0$ otherwise
$x_2 = 1$ if Medium, $0$ otherwise
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

(b). (7 points) Calculate the least squares line by hand.

Answer:
$\hat{\beta}_0 = \bar{y}_{Low} = 1.2$
$\hat{\beta}_1 = \bar{y}_{High} - \bar{y}_{Low} = 6 - 1.2 = 4.8$
$\hat{\beta}_2 = \bar{y}_{Medium} - \bar{y}_{Low} = 2.4 - 1.2 = 1.2$
$\hat{y} = 1.2 + 4.8 x_1 + 1.2 x_2$
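The hand calculations in Question 4(b), Question 3(d), and Question 3(g) can be checked numerically. The following is a minimal Python sketch using only the standard library. The function names (`dummy_fit`, `normal_equation_residuals`, `adjusted_r2`, `nested_f`) are hypothetical helpers introduced for illustration; they are not part of the exam or of Minitab. The sketch assumes 0/1 indicator coding with a single baseline level, as in the answer to Question 4(a).

```python
from statistics import mean

# Activity levels from Question 4, keyed by dosage group.
groups = {
    "High": [7, 6, 8, 4, 5],
    "Medium": [2, 3, 1, 2, 4],
    "Low": [0, 2, 0, 3, 1],
}


def dummy_fit(groups, baseline):
    """Least squares estimates for a one-factor model with 0/1 dummies.

    The intercept equals the baseline group mean, and each slope equals
    (group mean - baseline mean) for a non-baseline group.
    """
    means = {level: mean(ys) for level, ys in groups.items()}
    b0 = means[baseline]
    slopes = {lvl: m - b0 for lvl, m in means.items() if lvl != baseline}
    return b0, slopes


def normal_equation_residuals(groups, baseline, b0, slopes):
    """Return X'(y - Xb), which is the zero vector at the least squares fit.

    Entry 0 corresponds to the intercept column; the remaining entries
    correspond to the dummy columns of the non-baseline levels.
    """
    levels = [lvl for lvl in groups if lvl != baseline]
    totals = [0.0] * (1 + len(levels))
    for lvl, ys in groups.items():
        fitted = b0 + slopes.get(lvl, 0.0)
        for y in ys:
            r = y - fitted
            totals[0] += r
            for j, name in enumerate(levels, start=1):
                if lvl == name:
                    totals[j] += r
    return totals


def adjusted_r2(sse, ss_yy, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - (k + 1))] * (SSE / SSyy)."""
    return 1 - (n - 1) / (n - (k + 1)) * sse / ss_yy


def nested_f(sse_reduced, sse_complete, n, k, g):
    """Nested-model F statistic; k and g are the numbers of predictors
    in the complete and reduced models."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


b0, slopes = dummy_fit(groups, "Low")
resid = normal_equation_residuals(groups, "Low", b0, slopes)
print("Question 4(b) estimates:", b0, slopes)
print("Normal-equation residuals:", resid)
print("Question 3(d) adjusted R^2:", round(adjusted_r2(109.95, 495.39, 20, 2), 3))
print("Question 3(g) nested F:", round(nested_f(109.95, 98.40, 20, 3, 2), 3))
```

The normal-equation check is there because it confirms that the group-mean shortcut really is the least squares solution, rather than just restating the arithmetic. The critical values (2.12 and 4.49) still come from the t and F tables in the formula packet.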
