A firm operates a large, direct-to-consumer sales force. The firm would like to build a system to monitor the progress of new agents. The goal is to identify “superstar agents” as rapidly as possible, offer them incentives, and keep them with the firm. A key task for agents is to open new accounts; an account is a new customer to the business. The response of interest is the profit to the firm (in dollars) of contracts sold by agents over their first year. These data summarize the early performance of 464 agents. Among the possible explanations of performance are the number of new accounts developed by the agent during the first 3 months of work and the commission earned on early sales activity. An analyst at the firm is using an equation of the form (with natural logs)
Log Profit = b0 + b1 Log Accounts + b2 Log Early Commission
For cases having value 0 for early commission, the analyst replaced zero with $1.
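As a rough sketch of the setup just described, the following Python shows the zero fill-in, the natural-log transformation, and a least-squares fit of the analyst's equation. The data are synthetic and invented purely for illustration (they are not the firm's 464 agents), and the helper names `log_with_fill` and `fit_ols` are hypothetical, not part of the original problem.

```python
import numpy as np

def log_with_fill(values, fill=1.0):
    """Natural log after replacing zeros with `fill` (the analyst used $1)."""
    values = np.asarray(values, dtype=float)
    return np.log(np.where(values == 0, fill, values))

def fit_ols(y, *columns):
    """Least-squares coefficients [b0, b1, ...] for y on an intercept plus columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic illustration data (assumption): NOT the data from the exercise.
rng = np.random.default_rng(0)
n = 200
accounts = rng.integers(1, 40, size=n).astype(float)
commission = np.round(accounts * rng.uniform(20, 60, size=n))
commission[rng.random(n) < 0.1] = 0.0  # some agents with no early commission

log_accounts = np.log(accounts)
log_commission = log_with_fill(commission)  # zeros become log(1) = 0
log_profit = (5 + 0.4 * log_accounts + 0.3 * log_commission
              + rng.normal(0, 0.5, size=n))

# Log Profit = b0 + b1 Log Accounts + b2 Log Early Commission
b0, b1, b2 = fit_ols(log_profit, log_accounts, log_commission)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

Note that every filled-in case lands at exactly 0 on the Log Early Commission axis, which is why these cases stand apart from the rest in a scatterplot.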
(a) The choice of the analyst to fill in the 0 values of early commission with 1 so as to be able to take the log is a common choice (you cannot take the log of 0). From the scatterplot of Log Profit on Log Early Commission, you can see the effect of what the analyst did. What is the impact of these filled-in values on the marginal association?
(b) Is there much collinearity between the explanatory variables? How does the presence of these filled-in values affect the collinearity?
(c) Using all of the cases, does collinearity exert a strong influence on the standard errors of the estimates in the analyst’s multiple regression?
(d) Because multiple regression estimates the partial effect of an explanatory variable rather than its marginal effect, we cannot judge the effect of outliers on the partial slope from their position in the scatterplot of y on x. We can, however, see their effect by constructing a plot that shows the partial slope. To do this, we have to remove the effect of one of the explanatory variables from the other variables. Here’s how to make a so-called partial regression leverage plot for these data. First, regress Log Profit on Log Accounts and save the residuals. Second, regress Log Early Commission on Log Accounts and save these residuals. These regressions remove the effects of the number of accounts opened from the other two variables. Now, make a scatterplot of the residuals from the regression of Log Profit on Log Accounts on the residuals from the regression of Log Early Commission on Log Accounts. Fit the simple regression for this scatterplot, and compare the slope in this fit to the partial slope for Log Early Commission in the multiple regression. Are they different?
(e) Do the filled-in cases remain leveraged in the partial regression leverage plot constructed in part (d)? What does this view of the data suggest would happen to the estimate for this partial slope if these cases were excluded?
(f) What do you think about filling in these cases with 1 so that we can take the log? Should something else be done with them?
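The two-step residual construction in part (d) and the collinearity check in parts (b) and (c) can be sketched in Python as follows. This is a minimal illustration on synthetic data (an assumption, not the exercise's data), and the helper names `ols`, `partial_regression_slope`, and `vif_two_predictors` are hypothetical. With only two explanatory variables, the variance inflation factor reduces to 1 / (1 - r^2), where r is the correlation between them.

```python
import numpy as np

def ols(y, X):
    """Return (coefficients, residuals) for y regressed on an intercept plus X."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def partial_regression_slope(y, x_keep, x_focus):
    """Slope of the partial regression plot for x_focus, adjusting for x_keep."""
    _, resid_y = ols(y, x_keep)        # step 1: remove x_keep from the response
    _, resid_x = ols(x_focus, x_keep)  # step 2: remove x_keep from x_focus
    coef, _ = ols(resid_y, resid_x)    # simple regression of residuals on residuals
    return coef[1], resid_x, resid_y

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = np.corrcoef(x1, x2)[0, 1]
    return 1.0 / (1.0 - r ** 2)

# Synthetic illustration data (assumption): NOT the data from the exercise.
rng = np.random.default_rng(1)
n = 200
log_accounts = rng.normal(2.5, 0.6, size=n)
log_commission = 4 + 0.8 * log_accounts + rng.normal(0, 0.5, size=n)
log_commission[:20] = 0.0  # stand-ins for cases filled in with $1 (log 1 = 0)
log_profit = (5 + 0.4 * log_accounts + 0.3 * log_commission
              + rng.normal(0, 0.5, size=n))

# Partial slope from the multiple regression (coefficient on Log Early Commission).
coef_full, _ = ols(log_profit, np.column_stack([log_accounts, log_commission]))

# Slope from the partial regression plot built from the two sets of residuals.
slope_plot, resid_x, resid_y = partial_regression_slope(
    log_profit, log_accounts, log_commission)

vif = vif_two_predictors(log_accounts, log_commission)
print(f"multiple-regression slope: {coef_full[2]:.4f}")
print(f"partial-plot slope:        {slope_plot:.4f}")
print(f"VIF: {vif:.3f}")
```

Plotting `resid_y` against `resid_x` (for example with a scatterplot routine of your choice) gives the partial regression leverage plot itself; cases that sit far from the rest along the `resid_x` axis are the ones that remain leveraged.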

  • Created July 14, 2015