A marine biologist was hired by the EPA to determine whether the hot-water runoff from a particular power plant located near a large gulf is having an adverse effect on the marine life in the area. The biologist’s goal is to acquire a prediction equation for the number of marine animals located at certain designated areas, or stations, in the gulf. On the basis of past experience, the EPA considered the following environmental factors as predictors for the number of animals at a particular station:
x1 = Temperature of water (TEMP)
x2 = Salinity of water (SAL)
x3 = Dissolved oxygen content of water (DO)
x4 = Turbidity index, a measure of the turbidity of the water (TI)
x5 = Depth of the water at the station (ST_DEPTH)
x6 = Total weight of sea grasses in sampled area (TGRSWT)
As a preliminary step in the construction of this model, the biologist used a stepwise regression procedure to identify the most important of these six variables. A total of 716 samples was taken at different stations in the gulf, producing the SPSS printout shown on page 687. (The response measured was y, the logarithm of the number of marine animals found in the sampled area.)
a. According to the printout, which of the independent variables should be used in the model?
b. Are we able to assume that the marine biologist has identified all the important independent variables for the prediction of y? Why?
c. Using the variables identified in part a, write the first-order model with interaction that may be used to predict y.
d. How would the marine biologist determine whether the model specified in part c is better than the first-order model?
e. Note the small value of R². What action might the biologist take to improve the model?
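For readers who want to see the mechanics of a variable-screening procedure like the one described above, the following is a minimal sketch of forward selection in plain Python. It is a simplification, not the SPSS stepwise routine: it only adds variables and never removes them, and the entry threshold `f_to_enter=4.0` is an assumed value. The data are synthetic and generated only for illustration; they are not the biologist's 716 samples, and the function names `ols_sse` and `forward_select` are hypothetical.

```python
import random

def ols_sse(columns, y):
    """Fit y on an intercept plus the given predictor columns; return the SSE."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def forward_select(predictors, y, f_to_enter=4.0):
    """Add, one at a time, the predictor with the largest partial F statistic.

    Stops when no remaining predictor reaches f_to_enter (an assumed cutoff).
    """
    n = len(y)
    chosen = []
    sse_current = ols_sse([], y)  # intercept-only model
    while len(chosen) < len(predictors):
        best = None
        for name in predictors:
            if name in chosen:
                continue
            cols = [predictors[c] for c in chosen + [name]]
            sse_new = ols_sse(cols, y)
            df_resid = n - len(cols) - 1
            # Partial F for adding this one variable to the current model.
            f_stat = (sse_current - sse_new) / (sse_new / df_resid)
            if best is None or f_stat > best[1]:
                best = (name, f_stat, sse_new)
        if best[1] < f_to_enter:
            break
        chosen.append(best[0])
        sse_current = best[2]
    return chosen

# Synthetic illustration: six candidate predictors, only x1 and x3 affect y.
random.seed(0)
n = 200
predictors = {f"x{j}": [random.gauss(0, 1) for _ in range(n)]
              for j in range(1, 7)}
y = [2.0 + 1.5 * predictors["x1"][i] - 1.0 * predictors["x3"][i]
     + random.gauss(0, 0.5) for i in range(n)]

selected = forward_select(predictors, y)
print("Entered in order:", selected)
```

Because the procedure runs many tests in sequence, a variable with no real effect can occasionally pass the threshold, which is one reason a screening result is treated as a preliminary step rather than a final model.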