Question: OMGT 6613 Management Science Exercise #4: Advanced ML, Ensemble Modeling, and Text Analytics

Download the data files for this assignment.
The file contains several tabs with the data required for the assignment. All graphs presented should be properly labeled, and data should be reported at a reasonable level of precision. For each problem, submit your R script; use a separate script for each problem, with a title that clearly lists the problem number.

Problem __ (__ points)

In this problem we will attempt to develop a predictive model to estimate home prices in a particular geographic market. Accurately estimating home values is a valuable modeling goal. While home valuations once generally required detailed inspections and expert knowledge, many websites now try to offer on-demand price estimates for any home. Our data set includes home characteristics and prices for homes in the Ames, Iowa area. The data include detailed location (latitude and longitude) as well as neighborhood, along with data on a large number of characteristics of each home. Given the large number of potential variables and the likely high degree of correlation among them, we must be careful not to overfit the model.

To complete the analysis, do the following:

- Load the data set ames from the modeldata package: ames <- modeldata::ames
- Use the dplyr function ntile to record the quartile of sale price (Sale_Price).
- Generate a scatter plot of the homes using latitude and longitude. Color the points by sale price quartile, and select a color scale that makes the graphic useful.
- Generate a histogram of sale prices and record it in the template.
- Generate a second histogram of the log of prices and record it in the template.
- Split the data into a training and testing set with an __/__ split, using a random seed of __ and the price quartile as the strata variable.
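The data-preparation steps above can be sketched in R roughly as follows. This is a minimal sketch, assuming the modeldata, dplyr, ggplot2, and caret packages are installed. It uses caret::createDataPartition for the stratified split (one reasonable choice, since caret is used for the models anyway), and the 80/20 split and seed of 123 are hypothetical placeholders because the required values are missing from the assignment text. The viridis palette is likewise only one possible color scale.

```r
# Packages assumed installed: modeldata, dplyr, ggplot2, caret
library(modeldata)
library(dplyr)
library(ggplot2)
library(caret)

ames <- modeldata::ames

# Sale price quartile: 1 = lowest-priced 25% of homes, 4 = highest-priced 25%
ames <- ames %>% mutate(price_quartile = factor(ntile(Sale_Price, 4)))

# Map of homes colored by price quartile. viridis is an ordered,
# colorblind-friendly palette, which suits an ordinal variable.
ggplot(ames, aes(x = Longitude, y = Latitude, color = price_quartile)) +
  geom_point(alpha = 0.6, size = 1) +
  scale_color_viridis_d(name = "Sale price quartile") +
  labs(title = "Ames, Iowa home sales by sale price quartile",
       x = "Longitude", y = "Latitude")

# Histogram of sale prices in dollars
ggplot(ames, aes(x = Sale_Price)) +
  geom_histogram(bins = 50) +
  scale_x_continuous(labels = scales::dollar) +
  labs(title = "Distribution of sale prices",
       x = "Sale price (USD)", y = "Number of homes")

# Histogram of log sale prices
ggplot(ames, aes(x = log(Sale_Price))) +
  geom_histogram(bins = 50) +
  labs(title = "Distribution of log sale prices",
       x = "log(Sale price)", y = "Number of homes")

# Stratified train/test split on the price quartile.
split_prop <- 0.8  # hypothetical: the required split is missing from the source
set.seed(123)      # hypothetical: the required seed is missing from the source
train_idx <- createDataPartition(ames$price_quartile, p = split_prop, list = FALSE)[, 1]
ames_train <- ames[train_idx, ]
ames_test  <- ames[-train_idx, ]
```

Stratifying on the quartile keeps roughly the same share of cheap and expensive homes in both sets, so the test metrics are not distorted by an unlucky draw of price ranges.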
We will generate some forecasts using the log price as the response variable, so create a copy of the training data that adds the log of price and removes the dollar-denominated price. Hint: make sure the dollar price is not a predictor in the log-price models, and vice versa.

We will evaluate four different models built from two model types, a regularized regression model (glmnet) and a random forest model (ranger), running each type to predict price and log price separately. Generate each model using the train function from the caret package, with the default values for the training grid. For each model, record the summary of the training process in the template.

Now generate price predictions for the test data from each model, making sure to convert the log models back to standard dollars (use the exp function to take the antilog). Calculate the performance metrics RMSE, RSQ, and MAE for each model and record those results in the template. Report the MAE from each model in the template.

For each model, generate a scatter plot that shows the actual price on the x-axis and the predicted price on the y-axis, and add a line showing where forecasts and actuals are equal.

Comment on which model you prefer and why. Comment on how useful the model would be for generating price estimates based on the data we have available.
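The modeling steps above might be sketched in R as follows, assuming the glmnet and ranger packages are installed alongside caret. The block repeats the data preparation compactly so it runs on its own, again with a hypothetical 80/20 split and seed; the names in the fits list are illustrative. It drops the price quartile before training, since that column is computed from Sale_Price and would leak the answer into the predictors. Using 5-fold cross-validation is an assumption made to shorten run time (caret's default resampling is 25 bootstrap samples); the tuning grids stay at caret's defaults. Metrics come from caret::postResample, which reports RMSE, Rsquared (squared correlation), and MAE.

```r
# Packages assumed installed: modeldata, dplyr, ggplot2, caret, glmnet, ranger
library(modeldata)
library(dplyr)
library(ggplot2)
library(caret)

# Compact re-run of the data preparation so this block stands alone.
ames <- modeldata::ames %>% mutate(price_quartile = factor(ntile(Sale_Price, 4)))
set.seed(123)  # hypothetical seed
idx <- createDataPartition(ames$price_quartile, p = 0.8, list = FALSE)[, 1]  # hypothetical 80/20 split
# price_quartile is derived from Sale_Price, so drop it to avoid leakage
ames_train <- ames[idx, ] %>% select(-price_quartile)
ames_test  <- ames[-idx, ] %>% select(-price_quartile)

# Log-price copy of the training data: add log_price, remove the dollar price
train_log <- ames_train %>%
  mutate(log_price = log(Sale_Price)) %>%
  select(-Sale_Price)

# Assumption: 5-fold CV for speed; omit trControl to get caret's default
# resampling. Tuning grids are left at caret's defaults.
ctrl <- trainControl(method = "cv", number = 5)

fits <- list(
  glmnet_price = train(Sale_Price ~ ., data = ames_train, method = "glmnet", trControl = ctrl),
  glmnet_log   = train(log_price ~ ., data = train_log,  method = "glmnet", trControl = ctrl),
  ranger_price = train(Sale_Price ~ ., data = ames_train, method = "ranger", trControl = ctrl),
  ranger_log   = train(log_price ~ ., data = train_log,  method = "ranger", trControl = ctrl)
)

# Training summaries (resampled performance across the default grid)
for (nm in names(fits)) {
  cat("\n====", nm, "====\n")
  print(fits[[nm]])
}

# Test-set predictions in dollars; log models are back-transformed with exp()
preds <- lapply(names(fits), function(nm) {
  p <- predict(fits[[nm]], newdata = ames_test)
  if (grepl("_log$", nm)) exp(p) else p
})
names(preds) <- names(fits)

# RMSE, R-squared and MAE for each model, one row per model
results <- t(sapply(preds, postResample, obs = ames_test$Sale_Price))
print(round(results, 3))

# Actual vs. predicted price, with the line y = x where forecast equals actual
for (nm in names(preds)) {
  p <- ggplot(data.frame(actual = ames_test$Sale_Price, predicted = preds[[nm]]),
              aes(x = actual, y = predicted)) +
    geom_point(alpha = 0.4) +
    geom_abline(slope = 1, intercept = 0, color = "red") +
    scale_x_continuous(labels = scales::dollar) +
    scale_y_continuous(labels = scales::dollar) +
    labs(title = paste("Actual vs. predicted sale price:", nm),
         x = "Actual sale price (USD)", y = "Predicted sale price (USD)")
  print(p)
}
```

Because every model is scored in dollars on the same test set, the rows of results are directly comparable. Note that exp() of a log-scale prediction estimates something closer to a median than a mean price, which is one reason the log and dollar models can differ on RMSE even when their MAE is similar. The ranger fits usually take noticeably longer to train than the glmnet fits.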
