Question: Five classification models were built for predicting whether a neighborhood will soon see a large rise in home prices, based on public elementary school ratings

Five classification models were built for predicting whether a neighborhood will soon see a large rise in home prices, based on public elementary school ratings and other factors. The training data set was missing the school rating variable for every new school (3% of the data points).

Because ratings are unavailable for newly-opened schools, it is believed that locations that have recently experienced high population growth are more likely to have missing school rating data.

Model 1 used imputation, filling in the missing data with the average school rating from the rest of the data.
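
As a minimal sketch of the mean imputation that Model 1 describes, assuming a small made-up list of ratings in which `None` marks a new school (the data and variable names are illustrative, not from the question):

```python
from statistics import mean

# Hypothetical school ratings; None marks a new school with no rating yet.
ratings = [7.0, 5.0, None, 9.0, None, 3.0]

# Average of the observed ratings only.
observed = [r for r in ratings if r is not None]
fill_value = mean(observed)

# Replace each missing rating with that average.
imputed = [fill_value if r is None else r for r in ratings]
print(imputed)
```

Every missing rating receives the same value, so this approach ignores any link between missingness and population growth.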

Model 2 used imputation, building a regression model to fill in the missing school rating data based on other variables.
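
The regression imputation in Model 2 could look roughly like the following sketch. It assumes a single hypothetical predictor and ordinary least squares on the rows with a known rating; the question does not say which variables or which regression method were used.

```python
# Hypothetical rows: (predictor, school_rating); None marks a missing rating.
rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, None), (5.0, 11.0)]

# Fit a one-variable least-squares line on rows with a known rating.
known = [(x, y) for x, y in rows if y is not None]
n = len(known)
x_bar = sum(x for x, _ in known) / n
y_bar = sum(y for _, y in known) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in known)
sxx = sum((x - x_bar) ** 2 for x, _ in known)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Fill each missing rating with the fitted line's prediction.
imputed = [(x, y if y is not None else intercept + slope * x) for x, y in rows]
print(imputed)
```

The quality of the filled-in values depends entirely on how well the other variables predict the rating.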

Model 3 used imputation, first building a classification model to estimate (based on other variables) whether a new school is likely to have been built as a result of recent population growth (or whether it has been built for another purpose, e.g., to replace a very old school), and then using that classification to select one of two regression models to fill in an estimate of the school rating; there are two different regression models (based on other variables), one for neighborhoods with new schools built due to population growth, and one for neighborhoods with new schools built for other reasons.
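
The two-stage logic of Model 3 can be sketched as below. The classifier and both regression models are hypothetical stand-ins with made-up rules and coefficients (in practice each would be fitted to data); only the control flow, classify first and then pick a regression, comes from the description.

```python
def built_for_growth(pop_growth_pct, threshold=5.0):
    """Hypothetical stand-in classifier: flags schools built due to growth."""
    return pop_growth_pct > threshold

def rating_if_growth(income):
    """Hypothetical regression for schools built due to population growth."""
    return 4.0 + 0.5 * income

def rating_if_other(income):
    """Hypothetical regression for schools built for other reasons."""
    return 6.0 + 0.25 * income

def impute_rating(row):
    """Return the observed rating, or an estimate from the selected model."""
    if row["rating"] is not None:
        return row["rating"]
    if built_for_growth(row["pop_growth_pct"]):
        return rating_if_growth(row["income"])
    return rating_if_other(row["income"])

rows = [
    {"rating": 8.0, "pop_growth_pct": 2.0, "income": 6.0},   # rating observed
    {"rating": None, "pop_growth_pct": 8.0, "income": 4.0},  # classified as growth
    {"rating": None, "pop_growth_pct": 1.0, "income": 4.0},  # classified as other
]
filled = [impute_rating(r) for r in rows]
print(filled)
```

Errors compound here: a misclassified school is passed to the wrong regression model.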

Model 4 used a binary variable to identify locations with missing information.
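
A minimal sketch of the binary-variable approach in Model 4, assuming made-up ratings. Filling the missing rating with 0 alongside the indicator is one common convention and an assumption here, not something stated in the question.

```python
# Hypothetical school ratings; None marks a new school with no rating yet.
ratings = [7.0, None, 9.0, None]

# 1 where the rating is missing, 0 where it is available.
rating_missing = [1 if r is None else 0 for r in ratings]

# Placeholder value for the missing entries (assumed convention: 0.0),
# so the indicator's coefficient can absorb the effect of missingness.
rating_filled = [0.0 if r is None else r for r in ratings]

print(list(zip(rating_filled, rating_missing)))
```

This approach needs neither a regression nor a classification model, and it lets the main model learn directly from the fact that a rating is missing.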

Model 5 used a categorical variable: first, a classification model was used to estimate whether a new school is likely to have been built as a result of recent population growth; and then each neighborhood was categorized as "data available", "missing, population growth", or "missing, other reason".
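
The three-level variable in Model 5 could be built as in this sketch. The `predicted_growth` flag is assumed to come from a separate classifier that is not shown, and the function name is hypothetical.

```python
def missingness_category(rating, predicted_growth):
    """Map one neighborhood to one of the three categories used by Model 5."""
    if rating is not None:
        return "data available"
    if predicted_growth:
        return "missing, population growth"
    return "missing, other reason"

# Hypothetical (rating, classifier output) pairs.
examples = [(7.0, False), (None, True), (None, False)]
categories = [missingness_category(r, g) for r, g in examples]
print(categories)
```

The split between the two "missing" categories is only as reliable as the classifier that produces `predicted_growth`.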

Question: If school ratings cannot be reasonably well predicted from the other factors, and new schools built due to recent population growth cannot be reasonably well classified using the other factors, which model would you recommend?
