Question: Suppose that the sales manager of a large automotive parts distributor wants to estimate as early as April the total annual sales of a region.
On the basis of regional sales, the total sales for the company can also be estimated. If, based on past experience, it is found that the April estimates of annual sales are reasonably accurate, then in future years the April forecast could be used to revise production schedules and maintain the correct inventory at the retail outlets.

Several factors appear to be related to sales, including the number of retail outlets in the region stocking the company's parts, the number of automobiles in the region registered as of April 1, and the total personal income for the first quarter of the year. Five independent variables were finally selected as being the most important (according to the sales manager). Then the data were gathered for a recent year. The total annual sales for that year for each region were also recorded. Note in the following table that for region 1 there were 1,739 retail outlets stocking the company's automotive parts, there were 9,270,000 registered automobiles in the region as of April 1, and so on. The sales for that year were $37,702,000.

Annual Sales    Number of Retail   Automobiles Registered   Personal Income   Average Age of        Number of
($ millions),   Outlets,           (millions),              ($ billions),     Automobiles (years),  Supervisors,
Y               X1                 X2                       X3                X4                    X5
37.702          1,739               9.27                    85.4              3.5                    9.0
24.196          1,221               5.86                    60.7              5.0                    5.0
32.055          1,846               8.81                    68.1              4.4                    7.0
 3.611            120               3.81                    20.2              4.0                    5.0
17.625          1,096              10.31                    33.8              3.5                    7.0
45.919          2,290              11.62                    95.1              4.1                   13.0
29.600          1,687               8.96                    69.3              4.1                   15.0
 8.114            241               6.28                    16.3              5.9                   11.0
20.116            649               7.77                    34.9              5.5                   16.0
12.994          1,427              10.92                    15.1              4.1                   10.0

(a) Consider the following correlation matrix. Which single variable has the strongest correlation with the dependent variable? The correlations between the independent variables outlets and income and between cars and outlets are fairly strong.
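The correlation matrix used in part (a) can be recomputed from the ten regional observations listed above. The following is a minimal Python sketch, not the textbook's own procedure: the column keys (`sales`, `outlets`, `cars`, `income`, `age`, `bosses`) and the helper names `pearson_r` and `strongest_predictor` are hypothetical names chosen for illustration, and the numbers are transcribed from the data table.

```python
from math import sqrt

# Data transcribed from the question's table, one list per column.
# The dictionary keys are illustrative labels, not names from the source.
data = {
    "sales":   [37.702, 24.196, 32.055, 3.611, 17.625,
                45.919, 29.600, 8.114, 20.116, 12.994],
    "outlets": [1739, 1221, 1846, 120, 1096, 2290, 1687, 241, 649, 1427],
    "cars":    [9.27, 5.86, 8.81, 3.81, 10.31, 11.62, 8.96, 6.28, 7.77, 10.92],
    "income":  [85.4, 60.7, 68.1, 20.2, 33.8, 95.1, 69.3, 16.3, 34.9, 15.1],
    "age":     [3.5, 5.0, 4.4, 4.0, 3.5, 4.1, 4.1, 5.9, 5.5, 4.1],
    "bosses":  [9.0, 5.0, 7.0, 5.0, 7.0, 13.0, 15.0, 11.0, 16.0, 10.0],
}


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strongest_predictor(table, target="sales"):
    """Return (name, r) for the column with the largest |r| against target."""
    scores = {
        name: pearson_r(column, table[target])
        for name, column in table.items()
        if name != target
    }
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]


if __name__ == "__main__":
    # Print every predictor's correlation with sales, then the strongest one.
    for name in data:
        if name != "sales":
            print(f"{name:8s} {pearson_r(data[name], data['sales']):7.3f}")
    best_name, best_r = strongest_predictor(data)
    print(f"strongest single predictor: {best_name} (r = {best_r:.3f})")
```

Ranking by absolute value matters here because a strongly negative correlation would be just as informative as a strongly positive one. The same `pearson_r` helper can be applied to any pair of independent variables (for example outlets and income) to check for the strong predictor-to-predictor correlations the question asks about.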
Could this be a problem? What is this condition called? (Round your answers to 3 decimal places.)

           Sales    Outlets   Cars     Income   Age
Outlets    0.899
Cars       0.605    0.775
Income     0.964    0.825     0.409
Age       -0.323   -0.489    -0.447   -0.349
Bosses     0.286    0.183     0.395    0.155    0.291

The strongest relationship is between _____. It could be a problem if both "cars" and "outlets" are part of the final solution. Also, outlets and income are strongly correlated. This is called _____.

(b) The following regression equation was obtained using the five independent variables. What percent of the variation is explained by the regression equation? (Round your answer to 4 decimal places.)

The regression equation is
sales = -19.7 - 0.00063 outlets + 1.74 cars + 0.410 income + 2.04 age - 0.034 bosses

Predictor    Coef        SE Coef     t-ratio
Constant    -19.672      5.422       -3.63
outlets      -0.000629   0.002638    -0.24
cars          1.7399     0.5530       3.15
income        0.40994    0.04385      9.35
age           2.0357     0.8779       2.32
bosses       -0.0344     0.1880      -0.18

Analysis of Variance
SOURCE        DF    SS         MS
Regression     5    1593.81    318.76
Error          4       9.08      2.27
Total          9    1602.89

R2 = 0.9943

(c) Conduct a global test of hypothesis to determine whether any of the regression coefficients are not zero. Use the .05 significance level. (Round your answer to 2 decimal places.)

H0 is _____. The computed value of F is _____.

(d) Conduct a test of hypothesis on each of the independent variables. Would you consider eliminating "outlets" and "bosses"? Use the .05 significance level. (Negative amounts should be indicated by a minus sign. Round your answer to 3 decimal places.)

_____ "outlets" and "bosses". Critical values are _____ and _____.

(e) The regression has been rerun below with "outlets" and "bosses" eliminated. Compute the coefficient of determination. How much has R2 changed from the previous analysis? (Round your answer to 4 decimal places.)
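Before the rerun output, the arithmetic behind parts (b) through (d) can be checked from the five-variable output alone. The sketch below is a minimal Python example under stated assumptions: the critical values 6.26 (F with 5 and 4 degrees of freedom) and 2.776 (two-tailed t with 4 degrees of freedom) are taken from standard .05-level tables rather than from the question, the negative signs on the outlets and bosses coefficients are assumed from the regression equation, and all variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, ss_error, ss_total = 1593.81, 9.08, 1602.89
df_regression, df_error = 5, 4

# Assumed critical values at the .05 level, read from standard tables:
# F(5, 4) for the global test, two-tailed t(4) for the individual tests.
F_CRITICAL = 6.26
T_CRITICAL = 2.776

# (coefficient, standard error) pairs from the regression output.
# Signs on outlets and bosses are assumed negative, as in the equation.
coefficients = {
    "outlets": (-0.000629, 0.002638),
    "cars":    (1.7399, 0.5530),
    "income":  (0.40994, 0.04385),
    "age":     (2.0357, 0.8779),
    "bosses":  (-0.0344, 0.1880),
}

# (b) Coefficient of determination: share of total variation explained.
r_squared = ss_regression / ss_total

# (c) Global test: F = MSR / MSE; reject H0 (all slopes zero) if F is large.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error
reject_global_null = f_stat > F_CRITICAL

# (d) Individual tests: t = coefficient / standard error for each predictor.
t_ratios = {name: coef / se for name, (coef, se) in coefficients.items()}
not_significant = [
    name for name, t in t_ratios.items() if abs(t) <= T_CRITICAL
]

if __name__ == "__main__":
    print(f"R2 = {r_squared:.4f}")
    print(f"F  = {f_stat:.2f}, reject H0: {reject_global_null}")
    for name, t in t_ratios.items():
        print(f"{name:8s} t = {t:6.2f}")
    print("inside the critical values:", not_significant)
```

Under these assumptions the t-ratios for outlets and bosses fall well inside the critical values, which is the basis for dropping them. With the printed figures, age (t = 2.32) also falls inside plus or minus 2.776 in this five-variable run, so its status is worth rechecking once the model is refit with fewer predictors.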
The regression equation is
sales = -18.9 + 1.61 cars + 0.400 income + 1.96 age

Predictor    Coef       SE Coef    t-ratio
Constant    -18.924     3.636      -5.20
cars          1.6129    0.1979      8.15
income        0.40031   0.01569    25.52
age           1.9637    0.5846      3.36

Analysis of Variance
SOURCE        DF    SS         MS
Regression     3    1593.66    531.22
Error          6       9.23      1.54
Total          9    1602.89

R2 = 0.9942

There was little change in the coefficient of determination.

(f) Following is a histogram of the residuals. Does the normality assumption appear reasonable?

Histogram of residual   N = 10

Midpoint   Count
  -1.5       1   *
  -1.0       1   *
  -0.5       2   **
   0.0       2   **
   0.5       2   **
   1.0       1   *
   1.5       1   *

The normality assumption _____ reasonable.

(g) Following is a plot of the fitted values of Y (i.e., Ŷ) and the residuals. Do you see any violations of the assumptions