# Question: This data table contains accounting and financial data that describe

This data table contains accounting and financial data that describe 324 companies operating in the information sector in 2010. The largest of these provide telephone services. The variables include the expenses on research and development (R&D), total assets of the company, and the cost of goods sold. All columns are reported in millions of dollars, so 1,000 = $1 billion. Use the natural logs of all variables and fit the regression of Log R&D Expenses on Log Assets and Log Cost Goods Sold.

(a) Thinking marginally for a moment, would you expect to find a correlation between the log of the total assets and the log of the cost of goods sold?

(b) Does the correlation between the explanatory variables change if you work with the data on the original scale rather than on a log scale? In which case is the correlation between the ex-planatory variables larger?

(c) In which case does correlation provide a more useful summary of the association between the two explanatory variables?

(d) What is the impact of the collinearity on the standard errors in the multiple regression using the variables on a log scale?

(e) We can see the effects of collinearity by constructing a plot that shows the slope of the multiple regression. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Log R&D Expenses on Log Cost Goods Sold and save the residuals. Second, regress Log Assets on Log Cost Goods Sold and save these residuals. Now, make a scatterplot of the residuals from the regression of Log R&D Expenses on Log Cost Goods Sold on the residuals from the regression of Log Assets on Log Cost Goods Sold. Fit the simple regression for this scatterplot, and compare the slope of this simple regression to the partial slope for Log Assets in the multiple regression. Are they different?

(f) Compare the scatterplot of Log R&D Expenses on Log Assets to the partial regression plot constructed in part (e). What has changed?

(a) Thinking marginally for a moment, would you expect to find a correlation between the log of the total assets and the log of the cost of goods sold?

(b) Does the correlation between the explanatory variables change if you work with the data on the original scale rather than on a log scale? In which case is the correlation between the ex-planatory variables larger?

(c) In which case does correlation provide a more useful summary of the association between the two explanatory variables?

(d) What is the impact of the collinearity on the standard errors in the multiple regression using the variables on a log scale?

(e) We can see the effects of collinearity by constructing a plot that shows the slope of the multiple regression. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Log R&D Expenses on Log Cost Goods Sold and save the residuals. Second, regress Log Assets on Log Cost Goods Sold and save these residuals. Now, make a scatterplot of the residuals from the regression of Log R&D Expenses on Log Cost Goods Sold on the residuals from the regression of Log Assets on Log Cost Goods Sold. Fit the simple regression for this scatterplot, and compare the slope of this simple regression to the partial slope for Log Assets in the multiple regression. Are they different?

(f) Compare the scatterplot of Log R&D Expenses on Log Assets to the partial regression plot constructed in part (e). What has changed?

## Relevant Questions

These data include the engine size or displacement (in liters) and horse-power (HP) of 318 vehicles sold in the United States in 2011. Fit a multiple regression with the log10 of the combined mileage rating as the response ...Collinearity among the predictors is common in many applications, particularly those that track the growth of a new business over time. The problem is worse when the business has steadily grown or fallen. Because the growth ...Multiple regression requires an assumption that the combination of the two simple regressions does not require. What is it, and what condition of the multiple regression does it affect? The following output summarizes the fit of an analysis of covariance to the data in Exercise 36. The variable D denotes a dummy variable, with D = + for values colored green and 0 otherwise. (a) What is the interpretation of ...Wine These data give ratings and prices of 257 red and white wines that appeared in Wine Spectator in 2009. For this analysis, we are interested in how the rating given to a wine is associated with its price, and if this ...Post your question