
Question:

Periodically, software engineers must provide estimates of their effort in developing new software. In the Journal of Empirical Software Engineering (Vol. 9, 2004), multiple regression was used to predict the accuracy of these effort estimates.

The dependent variable is the relative error in estimating effort:

y = (Actual effort - Estimated effort) / (Actual effort)

It was determined for each task in a sample of n = 49 software development tasks. Eight independent variables were evaluated as potential predictors of relative error using stepwise regression. Each was formulated as a dummy variable, as shown in the table.
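The relative-error formula can be checked with a small helper. This is a minimal sketch. The function name and the example effort values (in hypothetical hours) are illustrative assumptions and do not come from the study. A positive y means the task took more effort than estimated, so the effort was underestimated. A negative y means it was overestimated.

```python
def relative_error(actual: float, estimated: float) -> float:
    """Relative error of an effort estimate: (actual - estimated) / actual."""
    if actual == 0:
        # The formula divides by actual effort, so it must be nonzero.
        raise ValueError("actual effort must be nonzero")
    return (actual - estimated) / actual


# Hypothetical tasks: an underestimate and an overestimate.
print(relative_error(100, 80))   # positive: estimate was too low
print(relative_error(100, 125))  # negative: estimate was too high
```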

Company role of estimator: x1 = 1 if developer, 0 if project leader

Task complexity: x2 = 1 if low, 0 if medium/high

Contract type: x3 = 1 if fixed price, 0 if hourly rate

Customer importance: x4 = 1 if high, 0 if low/medium

Customer priority: x5 = 1 if time of delivery, 0 if cost or quality

Level of knowledge: x6 = 1 if high, 0 if low/medium

Participation: x7 = 1 if estimator participates in work, 0 if not

Previous accuracy: x8 = 1 if more than 20% accurate, 0 if less than 20% accurate
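The dummy coding in the table can be expressed as a lookup from each x variable to the category that is coded 1. Every other category is coded 0. This is a sketch under stated assumptions. The record field names and category labels are hypothetical. Only the 0/1 rules follow the table.

```python
# Hypothetical field names and labels; only the 0/1 rules follow the table.
# Each entry maps x_i -> (record field, set of values coded as 1).
DUMMY_RULES = {
    "x1": ("role", {"developer"}),                       # 0 = project leader
    "x2": ("complexity", {"low"}),                       # 0 = medium/high
    "x3": ("contract", {"fixed price"}),                 # 0 = hourly rate
    "x4": ("customer_importance", {"high"}),             # 0 = low/medium
    "x5": ("customer_priority", {"time of delivery"}),   # 0 = cost or quality
    "x6": ("knowledge", {"high"}),                       # 0 = low/medium
    "x7": ("participates", {True}),                      # 0 = does not participate
    "x8": ("prev_accuracy_over_20pct", {True}),          # 0 = less than 20%
}


def encode_task(task: dict) -> dict:
    """Return {'x1': 0/1, ..., 'x8': 0/1} for one task record."""
    return {x: int(task[field] in ones) for x, (field, ones) in DUMMY_RULES.items()}


example = {
    "role": "developer", "complexity": "medium", "contract": "hourly rate",
    "customer_importance": "high", "customer_priority": "cost",
    "knowledge": "low", "participates": True, "prev_accuracy_over_20pct": False,
}
print(encode_task(example))
```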

a. In step 1 of the stepwise regression, how many different one-variable models are fitted to the data?

b. In step 1, the variable x1 is selected as the “best” one-variable predictor. How is this determined?

c. In step 2 of the stepwise regression, how many different two-variable models (where x1 is one of the variables) are fitted to the data?

d. The only two variables selected for entry into the stepwise regression model were x1 and x8. The stepwise regression yielded the following prediction equation:

ŷ = .12 - .28x1 + .27x8

Give a practical interpretation of the β estimates multiplied by x1 and x8.

e. Why should a researcher be wary of using the model in part d as the final model for predicting the relative error in effort estimates (y)?
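Parts a and c ask how many candidate models each step fits. The sketch below enumerates them. It assumes a forward step in which each candidate model is the current model plus one predictor not yet selected. The removal checks that a full stepwise procedure may also perform are not shown. The function name is hypothetical.

```python
def candidate_models(predictors, selected):
    """Models fitted at the next forward step: current model + one new predictor."""
    return [selected + [v] for v in predictors if v not in selected]


PREDICTORS = [f"x{i}" for i in range(1, 9)]  # x1 ... x8 from the table

step1 = candidate_models(PREDICTORS, [])      # one-variable models
step2 = candidate_models(PREDICTORS, ["x1"])  # two-variable models containing x1
print(len(step1), len(step2))  # prints: 8 7
```

At each step the number of candidates is the number of predictors not yet in the model, so it shrinks by one as each variable enters.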
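The prediction equation in part d can be evaluated at all four combinations of the two dummy variables. This makes the role of each β estimate visible. The sketch is illustrative, and the function name is hypothetical. Holding x8 fixed, switching x1 from 0 to 1 changes ŷ by -.28. Holding x1 fixed, switching x8 from 0 to 1 changes ŷ by +.27.

```python
def predicted_relative_error(x1: int, x8: int) -> float:
    """Prediction equation from part d: y_hat = .12 - .28*x1 + .27*x8."""
    return 0.12 - 0.28 * x1 + 0.27 * x8


for x1 in (0, 1):
    for x8 in (0, 1):
        y_hat = predicted_relative_error(x1, x8)
        print(f"x1={x1}, x8={x8}: y_hat = {y_hat:.2f}")
```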