Question:

Consider again the problem of estimating the costs of drilling oil wells, which was originally discussed in Problem 12.2.4. The data set given in DS 13.2.3 contains the variables geology, downtime, and rig-index in addition to the variables depth and cost considered before. The variable geology is a score that measures the geological properties of the materials that have to be drilled through. Harder materials have larger scores, and so larger values of the geology variable indicate that harder materials have to be drilled through to complete the oil well. The variable downtime measures the number of hours that the drilling rig is idle due to factors such as inclement weather and interruptions for borehole and geological tests. The variable rig-index compares the daily rental costs of the drilling rig to the cost in 1980. Thus, an index of 1 implies that the rental costs are identical to those in 1980, and an index of 2 implies that the rental costs are twice what they were in 1980.

(a) Fit the multiple linear regression model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
with the response variable $y$ as the cost and with $x_1$ as the depth, $x_2$ as geology score, $x_3$ as the downtime, and $x_4$ as the rig-index, and make plots of cost against each of the four input variables.
(b) Explain why the variable geology should be removed from the model. Does this surprise you? What is the sample correlation coefficient between cost and geology? What is the sample correlation coefficient between depth and geology? Why do you think that geology is not needed in the model?
(c) Should any other variables be removed from the model? What is the final model that you would recommend for use? (This problem is continued in Problem 13.4.2.)
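
The data set DS 13.2.3 is not reproduced here, so the following is only a minimal sketch of the calculations that parts (a) to (c) call for: a least-squares fit with t-statistics for each coefficient, and sample correlation coefficients between pairs of variables. The helper names (`solve_linear_system`, `fit_ols`, `sample_correlation`) are hypothetical, and the numbers in the demo are synthetic placeholders, deliberately generated so that geology tracks depth. They are not the textbook data and say nothing about the actual answer. Plotting is left out.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit of y on an intercept plus the given input columns.

    Returns (coefficients, t_statistics), intercept first.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve_linear_system(XtX, Xty)

    residuals = [y[i] - sum(X[i][j] * beta[j] for j in range(p))
                 for i in range(n)]
    sigma2 = sum(r * r for r in residuals) / (n - p)  # error variance estimate

    t_stats = []
    for j in range(p):
        unit = [1.0 if k == j else 0.0 for k in range(p)]
        inv_col = solve_linear_system(XtX, unit)  # column j of (X'X)^{-1}
        std_err = math.sqrt(sigma2 * inv_col[j])
        t_stats.append(beta[j] / std_err)
    return beta, t_stats


def sample_correlation(u, v):
    """Pearson sample correlation coefficient between two sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


if __name__ == "__main__":
    # Synthetic placeholder data (NOT DS 13.2.3); units are assumptions.
    random.seed(0)
    n = 40
    depth = [random.uniform(5.0, 12.0) for _ in range(n)]
    geology = [d / 2.0 + random.gauss(0.0, 0.3) for d in depth]
    downtime = [random.uniform(0.0, 100.0) for _ in range(n)]
    rig_index = [random.uniform(0.8, 2.0) for _ in range(n)]
    cost = [300.0 * d + 5.0 * dt + 800.0 * r + random.gauss(0.0, 200.0)
            for d, dt, r in zip(depth, downtime, rig_index)]

    names = ["intercept", "depth", "geology", "downtime", "rig_index"]
    beta, t_stats = fit_ols([depth, geology, downtime, rig_index], cost)
    for name, coef, t in zip(names, beta, t_stats):
        print(f"{name:>10}: coefficient = {coef:10.3f}, t = {t:7.2f}")

    print("corr(cost, geology)  =", round(sample_correlation(cost, geology), 3))
    print("corr(depth, geology) =", round(sample_correlation(depth, geology), 3))
```

With the real data loaded in place of the synthetic columns, a coefficient whose t-statistic is small in absolute value is a candidate for removal, and the model can be refitted without it by leaving that column out of the list passed to `fit_ols`. A high correlation between depth and geology would be one possible reason for geology to add little once depth is already in the model, which is the comparison part (b) asks for.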