Periodically, software engineers must provide estimates of their effort in developing new software. In the Journal of Empirical Software Engineering (Vol. 9, 2004), multiple regression was used to predict the accuracy of these effort estimates.
The dependent variable, defined as the relative error in estimating effort,
y = (Actual effort - Estimated effort)/(Actual effort),
was determined for each in a sample of n = 49 software development tasks. Eight independent variables were evaluated as potential predictors of relative error using stepwise regression. Each of these was formulated as a dummy variable, as shown in the table.
Company role of estimator: x1 = 1 if developer, 0 if project leader
Task complexity: x2 = 1 if low, 0 if medium/high
Contract type: x3 = 1 if fixed price, 0 if hourly rate
Customer importance: x4 = 1 if high, 0 if low/medium
Customer priority: x5 = 1 if time of delivery, 0 if cost or quality
Level of knowledge: x6 = 1 if high, 0 if low/medium
Participation: x7 = 1 if estimator participates in work, 0 if not
Previous accuracy: x8 = 1 if more than 20% accurate, 0 if less than 20% accurate
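As a rough illustration of how one task's record maps onto the variables above, here is a minimal Python sketch. The task values (100 actual hours, 80 estimated hours, and so on) are invented for illustration and are not taken from the study; the helper names relative_error and dummy are likewise hypothetical.

```python
def relative_error(actual_effort, estimated_effort):
    """Relative error y = (actual - estimated) / actual, as defined in the problem."""
    if actual_effort == 0:
        raise ValueError("actual effort must be nonzero")
    return (actual_effort - estimated_effort) / actual_effort


def dummy(condition):
    """Code a yes/no condition as a 0/1 dummy variable."""
    return 1 if condition else 0


# Hypothetical task record: every value here is made up for illustration.
task = {
    "actual_hours": 100.0,
    "estimated_hours": 80.0,
    "role": "developer",          # "developer" or "project leader"
    "contract": "hourly rate",    # "fixed price" or "hourly rate"
}

y = relative_error(task["actual_hours"], task["estimated_hours"])
x1 = dummy(task["role"] == "developer")        # 1 if developer, 0 if project leader
x3 = dummy(task["contract"] == "fixed price")  # 1 if fixed price, 0 if hourly rate

print(y, x1, x3)
```

A positive y means the actual effort exceeded the estimate (an underestimate); a negative y means the task was overestimated.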
a. In step 1 of the stepwise regression, how many different one-variable models are fitted to the data?
b. In step 1, the variable x1 is selected as the “best” one-variable predictor. How is this determined?
c. In step 2 of the stepwise regression, how many different two-variable models (where x1 is one of the variables) are fitted to the data?
d. The only two variables selected for entry into the stepwise regression model were x1 and x8. The stepwise regression yielded the following prediction equation:
ŷ = .12 - .28x1 + .27x8
Give a practical interpretation of the β estimates multiplied by x1 and x8.
e. Why should a researcher be wary of using the model, part d, as the final model for predicting effort (y)?
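
The counting in parts a and c and the prediction equation in part d can be checked with a short Python sketch. It assumes the usual forward-stepwise procedure, in which step 1 tries each candidate variable alone and step 2 pairs the selected variable with each remaining one; the coefficients are the ones given in part d, and the names used here (candidate_variables, predicted_relative_error) are illustrative only.

```python
from itertools import product

candidate_variables = [f"x{i}" for i in range(1, 9)]  # x1 ... x8

# Step 1: one model per candidate variable.
step1_models = [(v,) for v in candidate_variables]

# Step 2: x1 is already in the model, so pair it with each remaining variable.
step2_models = [("x1", v) for v in candidate_variables if v != "x1"]

print(len(step1_models), len(step2_models))

# Coefficients from the prediction equation in part d.
B0, B1, B8 = 0.12, -0.28, 0.27


def predicted_relative_error(x1, x8):
    """Evaluate the fitted equation for given dummy values of x1 and x8."""
    return B0 + B1 * x1 + B8 * x8


# Predicted relative error for each of the four (x1, x8) combinations.
predictions = {
    (x1, x8): round(predicted_relative_error(x1, x8), 2)
    for x1, x8 in product((0, 1), repeat=2)
}
for combo, value in predictions.items():
    print(combo, value)
```

Because both predictors are dummies, each coefficient is the difference in predicted relative error between the two levels of that variable with the other held fixed: switching x1 from 0 to 1 changes the prediction by -.28, and switching x8 from 0 to 1 changes it by +.27.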

  • Created May 20, 2015