CHAPTER 13a EXERCISES

Consider the SAT data we have seen in class and that you worked with (or will work with) in Homework 7. Below is an excerpt of the entire dataset, which is uploaded in Course Content (without the Verbal and Math variables):

State       PPS     T/S-R   Avg.$    % Takers   Verbal   Math   Total
Alabama     4.405   17.2    31.144   8          491      538    1029
Alaska      8.963   17.6    47.951   47         445      489    934
Arizona     4.778   19.3    32.175   27         448      496    944
Arkansas    4.459   17.1    28.934   6          482      523    1005
*  *  *
Wisconsin   6.93    15.9    37.746   9          501      572    1073
Wyoming     6.16    14.9    31.285   10         476      525    1001

We are interested in constructing a multiple regression model to predict Avg. Total SAT Score (ATS).

1. Consider the correlation matrix below showing the correlations between the potential predictors.

Correlation: PPS, T/S Ratio, Avg. Salary, %Takers

              PPS      T/S Ratio   Avg. Salary
T/S Ratio     -0.371
Avg. Salary   0.870    -0.001
%Takers       0.593    -0.213      0.617

Cell Contents: Pearson correlation

Based on this correlation matrix, what criticism can be made about the following multiple regression model?

Regression Analysis: Avg. Tot. Score versus Avg. Salary, %Takers, PPS

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
32.7980   81.96%   80.78%      78.76%

Coefficients
Term          Coef     SE Coef   T-Value   P-Value   VIF
Constant      998.0    31.5      31.69     0.000
Avg. Salary   -0.31    1.65      -0.19     0.853     4.39
%Takers       -2.840   0.225     -12.64    0.000     1.65
PPS           13.33    7.04      1.89      0.065     4.20

2. Write the regression equation for the model in Question 1.

3. Below is the correlation for ATS and Avg. Salary.

Correlation: Avg. Tot. Score, Avg. Salary
Pearson correlation of Avg. Tot. Score and Avg. Salary = -0.440

TRUE or FALSE: When used as a predictor in a multiple regression model, the slope of Avg. Salary will always be negative.

4. Consider the model output below. Which model do we consider better, the model from Question 1 or this model? Why?

Regression Analysis: Avg. Tot. Score versus PPS, %Takers

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
32.4595   81.95%   81.18%      79.59%

Coefficients
Term       Coef     SE Coef   T-Value   P-Value   VIF
Constant   993.8    21.8      45.52     0.000
PPS        12.29    4.22      2.91      0.006     1.54
%Takers    -2.851   0.215     -13.25    0.000     1.54

5. Which of the following gives a correct interpretation of the slope of PPS in the model from Question 4?
a. If PPS is increased by $1000, ATS will increase by 12.29 points.
b. If PPS is increased by $1000, ATS is expected to increase by 12.29 points.
c. Controlling for %Takers, if PPS is increased by $1000, ATS is expected to increase by 12.29 points.
d. Controlling for %Takers, if PPS is increased by $1000, ATS will increase by 12.29 points.

Consider the model output below. Use it to answer Questions 6 - 8.

Regression Analysis: Avg. Tot. Score versus PPS, %Takers, T/S Ratio

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
32.5133   ______   _______     78.63%

Coefficients
Term        Coef     SE Coef   T-Value   P-Value   VIF
Constant    1035.5   50.3      20.58     0.000
PPS         11.01    4.45      2.47      0.017     1.71
%Takers     -2.849   0.215     -13.22    0.000     1.54
T/S Ratio   -2.03    2.21      -0.92     0.363     1.16

6. TRUE or FALSE: This model must have a lower Multiple R-squared than the model from Question 4.

7. TRUE or FALSE: This model must have a higher Adjusted R-squared than the model from Question 4.

8. Suppose that for this model the variance of the residuals is 992.39. We also calculate that the variance of ATS is 5598.1. What is the Multiple R-squared of this model?
a. 17.73%
b. 21.55%
c. 78.45%
d. 82.27%

Consider the model output below. Use it to answer Questions 9 - 11.

Regression Analysis: Avg. Tot. Score versus %Takers, T/S Ratio

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
34.2385   79.91%   79.06%      76.83%

Coefficients
Term        Coef     SE Coef   T-Value   P-Value   VIF
Constant    1118.5   39.5      28.34     0.000
%Takers     -2.547   0.187     -13.62    0.000     1.05
T/S Ratio   -3.73    2.21      -1.69     0.098     1.05

9. What are the degrees of freedom associated with the t tests of the slopes?
a. 46
b. 47
c. 48
d. 49
e. 50

10. Under what conditions are the computed p-values for these tests valid?

11. Assuming those conditions are satisfied, which of the following gives a correct interpretation of the p-value of T/S Ratio?
a. At the 10% significance level, T/S Ratio is a significant linear predictor.
b. At the 10% significance level, T/S Ratio is a significant linear predictor in combination with %Takers.
c. At the 5% significance level, T/S Ratio is not a significant linear predictor in combination with %Takers.
d. Both b and c.

Consider the model output below. Use it to answer Questions 12 - 14.

Regression Analysis: Avg. Tot. Score versus Avg. Salary, %Takers

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
33.6877   80.56%   79.73%      78.08%

Coefficients
Term          Coef     SE Coef   T-Value   P-Value   VIF
Constant      987.9    31.9      30.99     0.000
Avg. Salary   2.18     1.03      2.12      0.039     1.61
%Takers       -2.779   0.228     -12.16    0.000     1.61

Prediction for Avg. Tot. Score

Variable      Setting
Avg. Salary   40
%Takers       50

Fit       SE Fit    95% CI               95% PI
936.181   6.34430   (923.418, 948.945)   (867.219, 1005.14)

12. Under what conditions are the confidence and prediction intervals computed above valid?

13. Assuming that these conditions are satisfied, which of the following statements is accurate?
a. We are 95% confident that a state with an average teacher salary of $40,000 and 50% of eligible students taking the SAT will have an average total SAT score between 923.418 and 948.945.
b. We are 95% confident that a state with an average teacher salary of $40,000 and 50% of eligible students taking the SAT will have an average total SAT score between 867.219 and 1005.14.
c. Neither of the above.

14. What is the expected average total SAT score for a state with an average teacher salary of $40,000 and 60% of eligible students taking the SAT?

15. Consider the model output below for a multiple regression to predict overall cruise ship rating from the listed predictors. Use it to answer Questions 15 & 16.
Regression Analysis: Overall versus Itineraries/Schedule, Shore Excursions, Food/Dining

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
1.38775   74.98%   70.29%      58.09%

Coefficients
Term                   Coef     SE Coef   T-Value   P-Value   VIF
Constant               35.6     13.2      2.69      0.016
Itineraries/Schedule   0.110    0.130     0.85      0.407     1.05
Shore Excursions       0.2445   0.0434    ____      _____     1.07
Food/Dining            0.2474   0.0621    3.98      0.000     1.01

What are the missing T-Value and P-Value for Shore Excursions?

16. Which of the following statements are true?
a. If Itineraries/Schedule is removed from the model, the multiple R-squared will increase.
b. If Itineraries/Schedule is removed from the model, the adjusted R-squared will decrease.
c. If Itineraries/Schedule is removed from the model, the multiple R-squared will decrease.
d. Cannot be determined based on the given information.

CHAPTER 12a: SIMPLE LINEAR REGRESSION

[Figure: scatterplot of per pupil spending (PPS) vs. average total SAT score (AVGTOT)]

Pictured above is a scatterplot of per pupil spending (PPS) vs. average total SAT score (AVGTOT) taken from recent SAT data for the 50 US states. Along with it, we have the following basic statistics:

          Min.    1st Qu.   Median   Mean    3rd Qu.   Max.     Std.Dev.
PPS       3.656   4.882     5.768    5.905   6.434     9.774    1.363
AVGTOT    844.0   897.2     945.5    965.9   1032.0    1107.0   74.82

Pearson's Correlation: -0.381

Use this information to answer questions 1 - 4.

1. The least squares regression line for this data is:
A. avgtot = (-20.89 x pps) + 1089.3
B. avgtot = (-0.007 x pps) + 966
C. avgtot = (-7.95 x pps) + 1012.9
D. avgtot = (7.95 x pps) + 919

2. If we decide to standardize the average total SAT scores, what will be the correlation between Per Pupil Spending and the standardized SAT scores?

3. The percentage of variation in avg. total SAT score explained by its linear relationship with per pupil spending is:
A. 38.1
B. 0.381
C. 0.145
D. 14.5
E. 61.9

4. The state with the lowest pps is Utah. In this model, its avg. total SAT score has a residual of approximately:
A. -200
B. -10
C. 0
D. 60
E. 300

Based on a series of repeated test drives of a new Equus at constant speeds between 30 and 80 miles per hour, the equation of a simple linear regression line modeling miles per gallon (mpg) vs. speed driven (mph) is given by:

mpg = 32 - (0.12 x mph)

Use this information to answer questions 5 - 8.

5. An Equus driven at a constant speed of 70 mph will:
A. get 1.2 mpg more than one driven at a constant speed of 80 mph.
B. get 1.2 mpg less than one driven at a constant speed of 80 mph.
C. be expected to get 1.2 mpg more than one driven at a constant speed of 80 mph.
D. be expected to get 1.2 mpg less than one driven at a constant speed of 80 mph.
E. be expected to get 8.4 mpg more than one driven at a constant speed of 80 mph.

6. The correlation between the mpg and mph variables for this experiment:
A. must be positive.
B. must be negative.
C. must be greater than 1.
D. must be less than -1.

7. We should be cautious about using this model to predict the miles per gallon when driving this car at a constant speed of 120 mph because:
A. 120 mph is far above the average speed at which the car was tested.
B. the car was never driven at 120 mph during the test.
C. 120 mph is well outside the range of speeds at which the car was tested.
D. cars such as the Equus are not designed to be driven at 120 mph.

8. Explain this statement: In this model, the intercept of the regression equation has no meaning as a prediction.

9. Morningstar analyzes balance sheets of publicly traded companies and, using a proprietary formula, computes a "Fair Value" for the stock price of the company. For a number of randomly selected companies, a researcher records the Morningstar Fair Value and the latest closing price of the stock. Based on this data, she builds a regression model as a first step in determining whether the Fair Value is a good predictor of the stock price. She then uses the model to make stock price predictions. The results are shown below.
Use them to answer the questions that follow.

Descriptive Statistics: Fair Value ($), Share Price ($)

Variable          Total Count   Mean    StDev   Minimum   Maximum
Fair Value ($)    28            54.46   23.62   15.00     98.00
Share Price ($)   28            46.64   24.54   11.02     103.05

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
12.0064   76.94%   76.06%      73.25%

Coefficients
Term             Coef     SE Coef   T-Value   P-Value   VIF
Constant         -2.99    5.79      -0.52     0.610
Fair Value ($)   0.9113   0.0978    9.31      0.000     1.00

Regression Equation
___________________________________________________________

Fits and Diagnostics for All Observations

Obs  Share Price ($)     Fit   Resid  Std Resid
  1            98.63   69.91   28.72       2.49  R
  2            11.02   12.50   -1.48      -0.13
  3            61.39   72.65  -11.26      -0.98
  4            41.56   28.91   12.65       1.09
  5            41.26   60.80  -19.54      -1.67
  6            40.37   58.98  -18.61      -1.59
  7            29.44   45.31  -15.87      -1.35
  8            69.76   65.36    4.40       0.38
  9            44.29   49.87   -5.58      -0.47
 10            27.71   44.40  -16.69      -1.42
 11            88.63   76.29   12.34       1.09
 12            36.36   35.29    1.07       0.09
 13            39.00   31.64    7.36       0.63
 14            46.30   40.75    5.55       0.47
 15            50.39   48.96    1.43       0.12
 16            37.02   27.08    9.94       0.86
 17            66.04   72.65   -6.61      -0.58
 18            66.70   69.91   -3.21      -0.28
 19           103.05   86.32   16.73       1.52
 20            18.33   17.97    0.36       0.03
 21            34.18   32.55    1.63       0.14
 22            24.18   23.44    0.74       0.06
 23            33.10   19.79   13.31       1.16
 24            13.02   10.68    2.34       0.21
 25            39.35   40.75   -1.40      -0.12
 26            84.20   76.29    7.91       0.70
 27            33.17   51.69  -18.52      -1.57
 28            27.60   35.29   -7.69      -0.66

R  Large residual

Prediction for Share Price ($)

Regression Equation
____________________________________________

Variable         Setting
Fair Value ($)   30

Fit       SE Fit    95% ___              95% ___
24.3509   3.29801   (17.5717, 31.1300)   (-1.24285, 49.9446)

Variable         Setting
Fair Value ($)   90

Fit       SE Fit    95% ___              95% ___
79.0275   4.15151   (70.4940, 87.5611)   (52.9143, 105.141)

a. What is the correlation between Fair Value and Share Price?
b. What percentage of variance in the observed Share Price data is explained by the linear relationship with Fair Value?
c. If we decide to use Share Price to predict Morningstar Fair Value, what percentage of variance in Fair Value would be explained by the linear relationship with Share Price?
d. In the Coefficients table, Fair Value has a p-value of 0.000. This p-value is in connection with what type of hypothesis test? How many degrees of freedom does this hypothesis test have? Interpret the meaning of this p-value. Compute a 95% confidence interval for the "population slope" relating Share Price to Fair Value. Under what conditions are the p-value and the confidence interval valid?
e. What is the regression equation for this model?
f. What expected stock price does this model give for a stock with a Fair Value of 60?
g. In the Normal Linear Model of this data, what is the estimate of the standard deviation of the error component?
h. Correctly label and interpret the 95% confidence intervals and 95% prediction intervals that are computed for this model by Minitab. Under what conditions are your interpretations of these intervals valid? How might these intervals be used in stock investing?
i. Consider more closely the prediction intervals. What do they suggest about the ability of this model to predict individual stock prices based on Morningstar Fair Value?

As part of the Minitab analysis of this regression model, an Anderson-Darling test of the residuals was done. It produced the graph below:

[Figure: Probability Plot of RESI1 (Normal), Percent vs. RESI1; Mean 4.060244E-15, StDev 11.78, N 28, AD 0.291, P-Value 0.585]

j. What does this graph tell us about this regression model?

The researcher also had Minitab produce the following graph:

[Figure: Versus Fits (response is Share Price ($)), Residual vs. Fitted Value]

k. What does this graph tell us about this regression model?