Question: OMGT 6613 Management Science Exercise #4: Advanced ML, Ensemble Modeling, and Text Analytics. Download the data files for this assignment.

The file contains several tabs with the data required for the assignment. All graphs presented should be properly labeled. Data should be reported at a reasonable level of precision. For each problem, submit your R script; use a separate script for each problem, with a title that clearly lists the problem number.

Problem 1 (10 points)

In this problem we will attempt to develop a predictive model to estimate home prices in a particular geographic market. Accurately estimating home values is a valuable modeling goal. While in the past home valuations generally required detailed inspections and expert knowledge, many websites now try to offer price estimates on demand for any home. Our data set includes home characteristics and prices for 2,930 homes in the Ames, Iowa area. The data includes detailed location (latitude and longitude), as well as neighborhood, and data on a large number of characteristics of the home. Given the large number of potential variables and the likely high degree of correlation among them, we must be careful not to overfit the model. To complete the analysis, do the following.

Load the data set ames from the modeldata package. (ames <- modeldata::ames)

Use the dplyr function ntile to record the quartile of Sale Price.
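A minimal sketch of the loading and quartile steps, assuming the sale price column is named Sale_Price (its name in modeldata::ames). The column name price_quartile is a hypothetical choice, not something the assignment specifies.

```r
library(dplyr)

# Load the Ames housing data shipped with the modeldata package
ames <- modeldata::ames

# ntile() ranks Sale_Price and assigns each home to one of 4 near-equal buckets
# (1 = cheapest quartile, 4 = most expensive). price_quartile is an assumed name.
ames <- ames %>%
  mutate(price_quartile = ntile(Sale_Price, 4))

# Check how many homes fall in each quartile
count(ames, price_quartile)
```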

Generate a scatter plot of the homes using latitude and longitude. Color the points based on sale price quartile. Select a color scale that makes the graphic useful.
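One possible way to draw this map, sketched with ggplot2. The viridis discrete scale is an assumption about what makes the graphic "useful" (it is ordered and colorblind-friendly); other scales would also satisfy the step. The name price_quartile is hypothetical.

```r
library(dplyr)
library(ggplot2)

# Quartile stored as a factor so ggplot2 uses a discrete color scale
ames <- modeldata::ames %>%
  mutate(price_quartile = factor(ntile(Sale_Price, 4)))

map_plot <- ggplot(ames, aes(x = Longitude, y = Latitude, color = price_quartile)) +
  geom_point(alpha = 0.6, size = 1) +          # transparency reduces overplotting
  scale_color_viridis_d(name = "Sale price quartile") +
  labs(
    title = "Ames homes by location and sale price quartile",
    x = "Longitude",
    y = "Latitude"
  ) +
  theme_minimal()

print(map_plot)
```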

Generate a histogram of sale prices and record it in the template.

Generate a second histogram of the log of prices and record it in the template.
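The two histograms can be sketched as follows. The bin count of 50 and the fill color are arbitrary assumptions; the natural log is used because the later steps reverse it with exp.

```r
library(ggplot2)

ames <- modeldata::ames

# Histogram of sale prices in dollars
price_hist <- ggplot(ames, aes(x = Sale_Price)) +
  geom_histogram(bins = 50, fill = "steelblue", color = "white") +
  scale_x_continuous(labels = scales::label_dollar()) +
  labs(
    title = "Distribution of sale prices",
    x = "Sale price (USD)",
    y = "Number of homes"
  )

# Histogram of the natural log of sale prices
log_price_hist <- ggplot(ames, aes(x = log(Sale_Price))) +
  geom_histogram(bins = 50, fill = "steelblue", color = "white") +
  labs(
    title = "Distribution of log sale prices",
    x = "Natural log of sale price",
    y = "Number of homes"
  )

print(price_hist)
print(log_price_hist)
```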

Split the data into a training and testing set with an 80%-20% split, use a random seed of 123, and use the price quartile as a strata variable.
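A sketch of the stratified split, assuming the rsample package is acceptable for this step. The assignment does not name a splitting function, and caret::createDataPartition would be another option. The object names here are hypothetical.

```r
library(dplyr)
library(rsample)

ames <- modeldata::ames %>%
  mutate(price_quartile = factor(ntile(Sale_Price, 4)))

# Seed set immediately before the split so the partition is reproducible
set.seed(123)
ames_split <- initial_split(ames, prop = 0.80, strata = price_quartile)

ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

# Row counts, and the share of each quartile in the training set
c(train = nrow(ames_train), test = nrow(ames_test))
prop.table(table(ames_train$price_quartile))
```

Stratifying on the quartile keeps cheap and expensive homes represented in similar proportions in both sets.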

We will generate some forecasts using the log price as the response variable, so create a copy of the training data that adds the log of price and removes the dollar-denominated price. (Hint: make sure Price is not a predictive variable in the log price models and vice versa.)
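The copy described above might look like the sketch below. Dropping the quartile column from both training sets is an assumption that goes slightly beyond the hint: the quartile is derived from price, so leaving it in would leak the response into the predictors. All object names are hypothetical.

```r
library(dplyr)
library(rsample)

ames <- modeldata::ames %>%
  mutate(price_quartile = factor(ntile(Sale_Price, 4)))

set.seed(123)
ames_train <- training(initial_split(ames, prop = 0.80, strata = price_quartile))

# Dollar-price training set: the quartile helper is removed (assumption, see lead-in)
train_price <- ames_train %>%
  select(-price_quartile)

# Log-price training set: add log_price, then drop the dollar price and the helper
train_log <- ames_train %>%
  mutate(log_price = log(Sale_Price)) %>%
  select(-Sale_Price, -price_quartile)

# Guard against leakage in either direction
stopifnot(!"Sale_Price" %in% names(train_log))
stopifnot(!"log_price" %in% names(train_price))
```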

We will evaluate two different model types: a regularized regression model (glmnet) and a random forest model (ranger). We will run each model to predict price and log price separately, for a total of 4 models. We will generate each model using the train function from the caret package. Use the default values for the training grid.
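A rough sketch of the four fits, assuming the formula interface of caret::train and the leakage handling described in the hint. No tuneGrid or tuneLength is passed, so caret's default grid and default resampling apply. The helper fit_model and the object names are hypothetical, resetting the seed before each fit is an assumption, and the ranger fits may take several minutes.

```r
library(dplyr)
library(rsample)
library(caret)

ames <- modeldata::ames %>%
  mutate(price_quartile = factor(ntile(Sale_Price, 4)))

set.seed(123)
ames_train <- training(initial_split(ames, prop = 0.80, strata = price_quartile))

train_price <- ames_train %>% select(-price_quartile)
train_log <- ames_train %>%
  mutate(log_price = log(Sale_Price)) %>%
  select(-Sale_Price, -price_quartile)

# Hypothetical helper: fit one caret model with default tuning settings
fit_model <- function(formula, data, method) {
  set.seed(123)  # assumed: same seed so every model sees the same resamples
  train(formula, data = data, method = method)
}

models <- list(
  glmnet_price = fit_model(Sale_Price ~ ., train_price, "glmnet"),
  glmnet_log   = fit_model(log_price ~ ., train_log, "glmnet"),
  ranger_price = fit_model(Sale_Price ~ ., train_price, "ranger"),
  ranger_log   = fit_model(log_price ~ ., train_log, "ranger")
)

# Printing a train object shows the resampled RMSE, R-squared, and MAE
# for each tuning combination, plus the combination caret selected
print(models$glmnet_price)
print(models$ranger_log)
```

The printed output of each train object is the training summary that the next step asks to be recorded.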

For each model, record the summary of the training process in the template.

Now generate price predictions for the test data from each model, making sure to convert back to standard dollars for the log models. (Use the exp function to take the anti-log.)
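The back-transformation can be illustrated with a quick stand-in model. The linear fit below is NOT one of the four assignment models; it is an assumed placeholder (two predictors, no resampling) chosen only so the sketch runs fast. The same predict and exp calls would apply to the glmnet and ranger fits on log price.

```r
library(dplyr)
library(rsample)
library(caret)

ames <- modeldata::ames %>%
  mutate(price_quartile = factor(ntile(Sale_Price, 4)))

set.seed(123)
ames_split <- initial_split(ames, prop = 0.80, strata = price_quartile)
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

# Stand-in training data: log price plus two predictors (illustrative choice)
train_log <- ames_train %>%
  mutate(log_price = log(Sale_Price)) %>%
  select(log_price, Gr_Liv_Area, Year_Built)

# Stand-in model fitted on the log scale, without resampling to keep it quick
log_fit <- train(
  log_price ~ .,
  data = train_log,
  method = "lm",
  trControl = trainControl(method = "none")
)

# Predictions come back on the log scale ...
pred_log <- predict(log_fit, newdata = ames_test)

# ... so exp() converts them back to dollars
pred_price <- exp(pred_log)

head(data.frame(actual = ames_test$Sale_Price, predicted = pred_price))
```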

Calculate the performance metrics (RMSE, RSQ, and MAE) from each model and record those results in the template. Report the MAE in the template from each model.
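The three metrics can be computed directly from actual and predicted dollar prices. The sketch below uses a hypothetical helper, price_metrics, and four made-up prices purely for illustration. RSQ is taken here as the squared correlation between actual and predicted values, which is an assumption about the intended definition; caret::postResample reports the same three quantities.

```r
# Hypothetical helper: RMSE, R-squared, and MAE on the dollar scale
price_metrics <- function(actual, predicted) {
  errors <- actual - predicted
  data.frame(
    RMSE = sqrt(mean(errors^2)),        # root mean squared error
    RSQ  = cor(actual, predicted)^2,    # squared correlation (assumed definition)
    MAE  = mean(abs(errors))            # mean absolute error
  )
}

# Made-up prices, for illustration only (not from the Ames data)
actual    <- c(200000, 150000, 320000, 180000)
predicted <- c(210000, 140000, 300000, 185000)

metrics <- price_metrics(actual, predicted)
print(metrics)
# RMSE is 12500 and MAE is 11250 for these made-up values
```

For the log models, the predictions should be passed through exp before calling the helper so that all four models are compared in dollars.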

For each model, generate a scatter plot that shows the actual price on the x-axis and the predicted price on the y-axis. Add a line showing where forecast and actual prices are equal.
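A sketch of the actual-versus-predicted plot with a 45-degree reference line. The helper plot_actual_vs_predicted and the four made-up prices are hypothetical; in practice the inputs would be the test-set prices and each model's dollar-scale predictions.

```r
library(ggplot2)

# Hypothetical helper: one labeled scatter plot per model
plot_actual_vs_predicted <- function(actual, predicted, model_name) {
  plot_data <- data.frame(actual = actual, predicted = predicted)

  ggplot(plot_data, aes(x = actual, y = predicted)) +
    geom_point(alpha = 0.5) +
    # Points on this line are predicted exactly; points above it are overestimates
    geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
    scale_x_continuous(labels = scales::label_dollar()) +
    scale_y_continuous(labels = scales::label_dollar()) +
    coord_equal() +
    labs(
      title = paste("Actual vs. predicted sale price:", model_name),
      x = "Actual sale price (USD)",
      y = "Predicted sale price (USD)"
    )
}

# Made-up prices, for illustration only (not from the Ames data)
actual    <- c(200000, 150000, 320000, 180000)
predicted <- c(210000, 140000, 300000, 185000)

example_plot <- plot_actual_vs_predicted(actual, predicted, "example model")
print(example_plot)
```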

Comment on which model you prefer and why.

Comment on how useful the model would be for generating price estimates based on the data we have available.

