# Question: A marine biologist was hired by the EPA to determine

A marine biologist was hired by the EPA to determine whether the hot-water runoff from a particular power plant located near a large gulf is having an adverse effect on the marine life in the area. The biologist’s goal is to acquire a prediction equation for the number of marine animals located at certain designated areas, or stations, in the gulf. On the basis of past experience, the EPA considered the following environmental factors as predictors for the number of animals at a particular station:

x1 = Temperature of water (TEMP)

x2 = Salinity of water (SAL)

x3 = Dissolved oxygen content of water (DO)

x4 = Turbidity index, a measure of the turbidity of the water (TI)

x5 = Depth of the water at the station (ST_DEPTH)

x6 = Total weight of sea grasses in sampled area (TGRSWT)

As a preliminary step in the construction of this model, the biologist used a stepwise regression procedure to identify the most important of these six variables. A total of 716 samples was taken at different stations in the gulf, producing the SPSS printout shown on page 687. (The response measured was y, the logarithm of the number of marine animals found in the sampled area.)
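To illustrate what a stepwise screen does, here is a minimal Python sketch of forward selection on simulated data. It is not the SPSS printout, which is not reproduced here, and it says nothing about which variables the real data favor. The function names (`sse`, `forward_select`, `make_demo_data`), the F-to-enter cutoff of 4.0, and the simulated coefficients are all assumptions made for the example. A full stepwise procedure also re-tests previously entered variables for removal, which this sketch omits.

```python
import numpy as np


def sse(design, y):
    """Residual sum of squares from a least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_select(X, y, names, f_to_enter=4.0):
    """Add one predictor at a time while the best partial F exceeds f_to_enter."""
    n = len(y)
    ones = np.ones((n, 1))
    selected, remaining = [], list(range(X.shape[1]))
    current_sse = sse(ones, y)  # start from the intercept-only model
    while remaining:
        best_j, best_f, best_sse = None, -np.inf, None
        for j in remaining:
            design = np.hstack([ones, X[:, selected + [j]]])
            new_sse = sse(design, y)
            df_resid = n - design.shape[1]
            # Partial F for adding predictor j to the current model
            f_stat = (current_sse - new_sse) / (new_sse / df_resid)
            if f_stat > best_f:
                best_j, best_f, best_sse = j, f_stat, new_sse
        if best_f < f_to_enter:
            break  # no remaining predictor clears the entry threshold
        selected.append(best_j)
        remaining.remove(best_j)
        current_sse = best_sse
    return [names[j] for j in selected]


def make_demo_data(seed=0, n=716):
    """Simulated stand-in data; the coefficients are arbitrary assumptions."""
    rng = np.random.default_rng(seed)
    names = ["x1", "x2", "x3", "x4", "x5", "x6"]
    X = rng.normal(size=(n, 6))
    # Arbitrary choice for the demo: only x2 and x5 drive y.
    y = 1.0 + 0.5 * X[:, 1] - 0.4 * X[:, 4] + rng.normal(size=n)
    return X, y, names


X, y, names = make_demo_data()
print(forward_select(X, y, names))
```

Because each candidate is judged only by its partial F at the moment of entry, a screen like this can pass over variables that matter through interactions or curvature, and it can occasionally admit a pure-noise variable.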

a. According to the printout, which of the independent variables should be used in the model?

b. Are we able to assume that the marine biologist has identified all the important independent variables for the prediction of y? Why?

c. Using the variables identified in part a, write the first-order model with interaction that may be used to predict y.

d. How would the marine biologist determine whether the model specified in part c is better than the first-order model?

e. Note the small value of R². What action might the biologist take to improve the model?
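For part d, one common approach is a nested-model (partial) F-test: fit the first-order model (reduced) and the model with interaction terms (complete), then test whether the extra terms are jointly zero. The sketch below shows the computation on simulated data with two hypothetical predictors, `xa` and `xb`. The predictors selected from the actual printout are not assumed here, and the function names and simulated coefficients are illustrative only.

```python
import numpy as np


def fit_sse(design, y):
    """Residual sum of squares from a least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def nested_f(reduced, complete, y):
    """Return (F, numerator df, denominator df) for H0: all extra terms are zero."""
    sse_r = fit_sse(reduced, y)
    sse_c = fit_sse(complete, y)
    df_num = complete.shape[1] - reduced.shape[1]  # number of terms being tested
    df_den = len(y) - complete.shape[1]            # n - (k + 1) for the complete model
    f_stat = ((sse_r - sse_c) / df_num) / (sse_c / df_den)
    return f_stat, df_num, df_den


def make_designs(seed=0, n=716):
    """Simulated stand-in data; the coefficients are arbitrary assumptions."""
    rng = np.random.default_rng(seed)
    xa, xb = rng.normal(size=n), rng.normal(size=n)
    y = 1.0 + 0.5 * xa - 0.4 * xb + 0.6 * xa * xb + rng.normal(size=n)
    ones = np.ones(n)
    reduced = np.column_stack([ones, xa, xb])             # first-order model
    complete = np.column_stack([ones, xa, xb, xa * xb])   # adds the interaction
    return reduced, complete, y


reduced, complete, y = make_designs()
f_stat, df_num, df_den = nested_f(reduced, complete, y)
print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) df")
```

The computed F is compared with the critical value of the F distribution with the reported degrees of freedom. A large F favors the interaction model. With a single interaction term, this is equivalent to the t-test on that term's coefficient.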
