# Question

The file P10_66.xlsx contains monthly cost accounting data on overhead costs, machine hours, and direct material costs. This problem will help you explore the meaning of R² and the relationship between R² and correlations.

a. Create a table of correlations between the individual variables.

b. If you ignore the two explanatory variables Machine Hours and Direct Material Cost and predict each Overhead Cost as the mean of Overhead Cost, then a typical “error” is Overhead Cost minus the mean of Overhead Cost. Find the sum of squared errors using this form of prediction, where the sum is over all observations.

c. Now run three regressions:

(1) Overhead Cost (OHCost) versus Machine Hours,

(2) OHCost versus Direct Material Cost, and

(3) OHCost versus both Machine Hours and Direct Material Cost.

For each, find the sum of squared residuals, and divide this by the sum of squared errors from part b. What is the relationship between this ratio and the associated R² for that equation?

d. For the first two regressions in part c, what is the relationship between R² and the corresponding correlation between the dependent and explanatory variable? For the third regression, it turns out that R² can be expressed as a complicated function of all three correlations in part a. That is, the function involves not just the correlations between the dependent variable and each explanatory variable, but also the correlation between the two explanatory variables. Note that this R² is not just the sum of the R² values from the first two regressions in part c. Why do you think this is true, intuitively? However, R² for the multiple regression is still the square of a correlation: the correlation between the observed and predicted values of OHCost. Verify that this is the case for these data.
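The checks in parts a through d can be sketched in Python as follows. This is only an illustration under stated assumptions: the contents of P10_66.xlsx are not reproduced here, so the block generates synthetic monthly data, and the variable names (`oh_cost`, `machine_hours`, `material_cost`), the helper `fit_ssr`, and all numbers are hypothetical. The closed form used for the two-variable R² is the standard identity R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²), where r_y1 and r_y2 are the correlations of OHCost with each explanatory variable and r_12 is the correlation between the explanatory variables.

```python
import numpy as np

# Synthetic stand-in for P10_66.xlsx (assumption: the real data are not shown
# here, so names and numbers below are purely illustrative).
rng = np.random.default_rng(0)
n = 36
machine_hours = rng.normal(1500, 200, n)
material_cost = 20 * machine_hours + rng.normal(0, 5000, n)  # correlated with hours
oh_cost = 30000 + 20 * machine_hours + 0.5 * material_cost + rng.normal(0, 4000, n)


def fit_ssr(y, *xs):
    """Least-squares fit with an intercept; returns (sum of squared residuals, fitted values)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return np.sum((y - fitted) ** 2), fitted


# Part a: correlation table (rows/columns: OHCost, Machine Hours, Direct Material Cost).
corr = np.corrcoef([oh_cost, machine_hours, material_cost])
r_y1, r_y2, r_12 = corr[0, 1], corr[0, 2], corr[1, 2]

# Part b: sum of squared errors when every prediction is the mean of OHCost.
sst = np.sum((oh_cost - oh_cost.mean()) ** 2)

# Part c: the three regressions; each ratio SSR / SST equals 1 - R^2.
ssr1, _ = fit_ssr(oh_cost, machine_hours)
ssr2, _ = fit_ssr(oh_cost, material_cost)
ssr3, fitted3 = fit_ssr(oh_cost, machine_hours, material_cost)
r2_1, r2_2, r2_3 = (1 - s / sst for s in (ssr1, ssr2, ssr3))

# Part d: R^2 versus correlations.
r2_from_corrs = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
r_obs_fit = np.corrcoef(oh_cost, fitted3)[0, 1]

print("correlations:\n", corr.round(3))
print("SST:", round(sst, 1))
print("ratios SSR/SST:", [round(s / sst, 4) for s in (ssr1, ssr2, ssr3)])
print("R^2 values:", round(r2_1, 4), round(r2_2, 4), round(r2_3, 4))
print("squared simple correlations:", round(r_y1**2, 4), round(r_y2**2, 4))
print("R^2 from three correlations:", round(r2_from_corrs, 4))
print("squared corr(observed, fitted):", round(r_obs_fit**2, 4))
```

With real data, the three arrays would instead be read from the workbook. Because the explanatory variables are correlated, part of what each one explains about OHCost overlaps with the other, which is why the two simple R² values generally do not add up to the R² of the multiple regression.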
