# Question: A firm operates a large direct to consumer sales force The firm

A firm operates a large, direct-to-consumer sales force. The firm would like to build a system to monitor the progress of new agents. The goal is to identify â€śsuperstar agentsâ€ť as rapidly as possible, offer them incentives, and keep them with the firm. A key task for agents is to open new accounts; an account is a new customer to the business. The response of interest is the profit to the firm (in dollars) of contracts sold by agents over their first year. These data summarize the early performance of 464 agents. Among the possible explanations of performance are the number of new accounts developed by the agent during the first 3 months of work and the commission earned on early sales activity. An analyst at the firm is using an equation of the form (with natural logs)

Log Profit = b0 + b1 Log Accounts

+ b2 Log Early Commission

For cases having value 0 for early commission, the analyst replaced zero with $1.

(a) The choice of the analyst to fill in the 0 values of early commission with 1 so as to be able to take the log is a common choice (you cannot take the log of 0). From the scatterplot of Log Profit on Log Early Commission, you can see the effect of what the analyst did. What is the impact of these filled-in values on the marginal association?

(b) Is there much collinearity between the explanatory variables? How does the presence of these filled-in values affect the collinearity?

(c) Using all of the cases, does collinearity exert a strong influence on the standard errors of the estimates in the analystâ€™s multiple regression?

(d) Because multiple regression estimates the partial effect of an explanatory variable rather than its marginal effect, we cannot judge the effect of outliers on the partial slope from their position in the scatterplot of y on x. We can, however, see their effect by constructing a plot that shows the partial slope. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Hereâ€™s how to make a so-called partial regression leverage plot for these data. First, regress Log Profit on Log Accounts and save the residuals. Second, regress Log Commission on Log Accounts and save these residuals. These regressions remove the effects of the number of accounts opened from the other two variables. Now, make a scatterplot of the residuals from the regression of Log Profit on Log Accounts on the residuals from the regression of Log Commission on Log Accounts. Fit the simple regression for this scatterplot, and compare the slope in this ft to the partial slope for Log Commission in the multiple regression. Are they different?

(e) Do the filled-in cases remain leveraged in the partial regression leverage plot constructed in part (d)? What does this view of the data suggest would happen to the estimate for this partial slope if these cases were excluded?

(f) What do you think about filling in these cases with 1 so that we can take the log? Should something else be done with them?

Log Profit = b0 + b1 Log Accounts

+ b2 Log Early Commission

For cases having value 0 for early commission, the analyst replaced zero with $1.

(a) The choice of the analyst to fill in the 0 values of early commission with 1 so as to be able to take the log is a common choice (you cannot take the log of 0). From the scatterplot of Log Profit on Log Early Commission, you can see the effect of what the analyst did. What is the impact of these filled-in values on the marginal association?

(b) Is there much collinearity between the explanatory variables? How does the presence of these filled-in values affect the collinearity?

(c) Using all of the cases, does collinearity exert a strong influence on the standard errors of the estimates in the analystâ€™s multiple regression?

(d) Because multiple regression estimates the partial effect of an explanatory variable rather than its marginal effect, we cannot judge the effect of outliers on the partial slope from their position in the scatterplot of y on x. We can, however, see their effect by constructing a plot that shows the partial slope. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Hereâ€™s how to make a so-called partial regression leverage plot for these data. First, regress Log Profit on Log Accounts and save the residuals. Second, regress Log Commission on Log Accounts and save these residuals. These regressions remove the effects of the number of accounts opened from the other two variables. Now, make a scatterplot of the residuals from the regression of Log Profit on Log Accounts on the residuals from the regression of Log Commission on Log Accounts. Fit the simple regression for this scatterplot, and compare the slope in this ft to the partial slope for Log Commission in the multiple regression. Are they different?

(e) Do the filled-in cases remain leveraged in the partial regression leverage plot constructed in part (d)? What does this view of the data suggest would happen to the estimate for this partial slope if these cases were excluded?

(f) What do you think about filling in these cases with 1 so that we can take the log? Should something else be done with them?

**View Solution:**## Answer to relevant Questions

These data describe promotional spending by a pharmaceutical company for a cholesterol-lowering drug. The data cover 39 consecutive weeks and isolate the area around Boston. The variables in this collection are shares. ...1. Interactions introduce collinearity into a multiple regression and should be removed from the model if not statistically significant. 2. If neither the interaction nor the dummy variable is statistically significant in an ...A two-sample t-test has a lot in common with simple regression. This output summarizes the results of fitting a simple regression with only a dummy variable as the explanatory variable. The data are the same salary data used ...Download (introduced in Chapter 19) Before purchasing videoconferencing equipment, a company ran tests of its current internal computer network. The goal of the tests was to measure how rapidly data moved through the network ...The music on an Apple iPod can be stored digitally in several formats. A popular format for Apple is known as AIFF, short for Audio Interchange File Format. Another format is known as AAC, short for Advanced Audio Coding. ...Post your question