Question:

A national real-estate developer builds luxury homes in three types of locations: urban cities (“city”), suburbs (“suburb”), and rural locations that were previously farmlands (“rural”). The response variable in this analysis is the change in the selling price per square foot from the time the home is listed to the time at which the home sells,
Change = (final selling price - initial listing price) / number of square feet
The initial listing price is a fixed markup of construction and financing costs. These homes typically sell for about $150 to $200 per square foot. The response is negative if the price falls; positive values indicate an increase such as occurs when more than one buyer wants the home. Other variables that appear in the analysis include the following:
Square Feet: Size of the property, in square feet
Bathrooms: Number of bathrooms in the home
Distance: Distance in miles from the nearest public school
The observations are 120 homes built by this developer that were sold during the last calendar year.
Motivation
(a) Explain how a regression model that estimates the change in the value of the home would be useful to the developer.
Method
(b) Consider a regression model of the form
Change = β0 + β1 (1/Square Feet) + β2 Baths + β3 Distance + ε
Why use the variable 1/Square Feet rather than Square Feet alone?
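One way to think about the reciprocal in part (b) is sketched below in Python. It assumes, purely for illustration, that negotiation moves the total price by the same dollar amount regardless of home size; the value `fixed_change` and the sizes are made up and do not come from the data. Under that assumption, the response per square foot is proportional to 1/Square Feet rather than linear in Square Feet.

```python
# Hypothetical illustration: a fixed total price change spread over homes
# of different sizes. None of these numbers come from the data table.
fixed_change = -20_000.0  # assumed total change in dollars, same for every home


def change_per_sqft(total_change, square_feet):
    """Response from the exercise: (final price - listing price) / square feet."""
    return total_change / square_feet


sizes = [2000, 4000, 8000]
changes = [change_per_sqft(fixed_change, s) for s in sizes]

for s, c in zip(sizes, changes):
    # c equals fixed_change * (1 / s): a straight line in 1/s, a curve in s.
    print(f"{s} sq ft: 1/sqft = {1 / s:.6f}, change = {c:.2f} $/sq ft")
```

In this sketch, doubling the size halves the per-square-foot change, which is the curvature a term in 1/Square Feet captures and a term in Square Feet does not.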
(c) The model described in part (b) does not distinguish one location from another. How can the regression model be modified to account for differences among the three types of locations?
(d) Why might interactions that account for differences in the three types of locations be useful in the model? Do you expect any of the possible interactions to be important?
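For parts (c) and (d), a common approach is to add two dummy variables for the three locations and, optionally, their products with the numerical variables. The sketch below shows one possible coding in Python. The function name `design_row`, the choice of "city" as the baseline, and the column order are all assumptions made for illustration.

```python
# Hypothetical coding scheme: "city" is the baseline category, so only
# "suburb" and "rural" get dummy variables.
LOCATIONS = ("city", "suburb", "rural")


def design_row(square_feet, baths, distance, location, interactions=True):
    """Return the explanatory values for one home (intercept not included)."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location!r}")
    base = [1.0 / square_feet, float(baths), float(distance)]
    dummies = [
        1.0 if location == "suburb" else 0.0,
        1.0 if location == "rural" else 0.0,
    ]
    row = base + dummies
    if interactions:
        # Each dummy times each numerical variable lets the slopes,
        # not just the intercept, differ by location.
        row += [d * x for d in dummies for x in base]
    return row


print(design_row(4000, 3, 2.0, "city"))
print(design_row(4000, 3, 2.0, "rural"))
```

With only the dummies, the three locations share slopes and differ in intercept; with the interaction columns, each location can have its own slope for each numerical variable.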
Mechanics
(e) Use a scatterplot matrix to explore the data. List any important features in the data that are relevant for regression modeling. (Use color-coding if your software allows.)
(f) Fit an initial model specified as in part (b) without accounting for location. Then compare the residuals grouped by location using side-by-side boxplots. Does this comparison suggest that location matters? In what way?
(g) Extend the model fit in part (b) to account for differences in location. Be sure to follow the appropriate procedure for dealing with interactions and verify whether your model meets the conditions of the multiple regression model (MRM).
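The residual comparison in part (f) can be sketched as follows, assuming NumPy is available. The data here are simulated, not the developer's 120 homes: the variable ranges, the coefficients, and the location shifts are invented so that the code runs on its own. The five-number summaries printed per location are the same quantities a side-by-side boxplot would display.

```python
import numpy as np

# Simulated stand-in for the data table; every number below is an assumption.
rng = np.random.default_rng(0)
n = 120
location = rng.choice(["city", "suburb", "rural"], size=n)
sqft = rng.uniform(2000, 6000, size=n)
baths = rng.integers(2, 6, size=n)
distance = rng.uniform(0.5, 10, size=n)
# Build in a location effect so the residual comparison has something to find.
shift = np.where(location == "city", 3.0,
                 np.where(location == "rural", -3.0, 0.0))
change = (2.0 - 8000.0 / sqft + 0.5 * baths - 0.3 * distance
          + shift + rng.normal(0, 1, size=n))

# Initial model from part (b): intercept, 1/Square Feet, Baths, Distance.
X = np.column_stack([np.ones(n), 1.0 / sqft, baths, distance])
coef, *_ = np.linalg.lstsq(X, change, rcond=None)
residuals = change - X @ coef

# Five-number summary of residuals by location (min, Q1, median, Q3, max).
summary = {}
for loc in ("city", "suburb", "rural"):
    summary[loc] = np.percentile(residuals[location == loc], [0, 25, 50, 75, 100])
    print(loc, np.round(summary[loc], 2))
```

If the boxes for the three locations sit at clearly different levels, the model without location is leaving a location effect in the residuals.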
Message
(h) Summarize the fit of your model for the developer, showing three equations for the three locations (with rounded estimates). Help the developer understand the importance of the terms in these equations.
(i) Point out any important limitations of your model. In particular, are there other variables that you would like to include in the model but that are not found in the data table?
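For part (h), the three location-specific equations come from adding each location's dummy and interaction coefficients to the baseline coefficients. The Python sketch below shows only that bookkeeping. The coefficient values are hypothetical placeholders, not estimates from the data, and the names (`location_equation`, the `"suburb:distance"` key style, "city" as baseline) are assumptions.

```python
# Hypothetical coefficients chosen only to illustrate the arithmetic;
# they are NOT estimates from the developer's data.
coef = {
    "intercept": 2.0, "inv_sqft": -8000.0, "baths": 0.5, "distance": -0.3,
    "suburb": -3.0, "rural": -6.0,
    "suburb:distance": 0.25, "rural:distance": -0.2,
}


def location_equation(coef, location):
    """Collapse baseline, dummy, and interaction terms into one equation."""
    eq = {k: coef[k] for k in ("intercept", "inv_sqft", "baths", "distance")}
    if location != "city":  # "city" is the assumed baseline category
        eq["intercept"] += coef.get(location, 0.0)
        for var in ("inv_sqft", "baths", "distance"):
            eq[var] += coef.get(f"{location}:{var}", 0.0)
    return eq


for loc in ("city", "suburb", "rural"):
    eq = location_equation(coef, loc)
    print(f"{loc}: Change = {eq['intercept']:.1f} {eq['inv_sqft']:+.0f}/SqFt "
          f"{eq['baths']:+.1f} Baths {eq['distance']:+.2f} Distance")
```

Terms without an interaction coefficient keep the baseline slope in every equation, so only the intercept and the interacted slopes differ across locations.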