# Question

Refer to the scenario described in Problem 19 and the file HousingBubble.

a. For the following substeps, consider the Pre-Crisis worksheet data. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the sale price using a regression tree. Use Price as the output variable and all the other variables as input variables. In Step 2 of XLMiner's Regression Tree procedure, be sure to Normalize input data, to set the Minimum #records in a terminal node to 1, and to specify Using Best prune tree as the scoring option. In Step 3 of XLMiner's Regression Tree procedure, set the maximum number of levels to 7. Generate the Full tree and Best pruned tree. Check the box next to In worksheet in the Score new data area. In the Match variable in the new range dialog box, (1) specify the NewDataToPredict worksheet in the Worksheet: field, (2) enter the cell range A1:P2001 in the Data range: field, and (3) click Match variable(s) with same name(s). Completing the procedure produces an RT_NewScore1 worksheet that contains the predicted sale price for each home in NewDataToPredict.

i. In terms of number of decision nodes, compare the size of the full tree to the size of the best pruned tree.

ii. What is the root mean squared error (RMSE) of the best pruned tree on the validation data and on the test data?

iii. What is the average error on the validation data and test data? What does this suggest?

iv. By examining the best pruned tree, what are the critical variables in predicting the price of a home?

b. Repeat part a with the Post-Crisis worksheet data.
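The partition and the two error measures asked for in parts a and b can be sketched outside XLMiner. The following is a minimal illustration using only the Python standard library, not XLMiner's own procedure: the function names, the seed value, and the sign convention for the error (actual minus predicted) are assumptions made for this sketch, and the regression tree itself is not reproduced here.

```python
import math
import random


def partition(records, seed=12345, fractions=(0.5, 0.3, 0.2)):
    """Shuffle records and split them into training, validation, and test lists.

    The seed and the shuffling method are assumptions for this sketch;
    XLMiner's partitioning will generally select different rows.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(round(fractions[0] * n))
    n_valid = int(round(fractions[1] * n))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder, roughly 20 percent
    return train, valid, test


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def average_error(actual, predicted):
    """Mean residual, assuming residual = actual - predicted.

    A value far from zero would suggest systematic over- or under-prediction.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    return sum(residuals) / len(residuals)
```

Applying `rmse` and `average_error` to the validation and test predictions of a fitted tree gives the quantities requested in items ii and iii; comparing the two sets indicates how well the pruned tree generalizes.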

c. The RT_NewScore1 and RT_NewScore2 worksheets contain the sales price predictions for the 2000 homes in the NewDataToPredict using the precrisis and postcrisis data, respectively. For each of these 2000 homes, compare the two predictions by computing the percentage change in predicted price between the precrisis and postcrisis model. Let percentage change = (postcrisis predicted price − precrisis predicted price) / precrisis predicted price. Summarize these percentage changes with a histogram.

What is the average percentage change in predicted price between the precrisis and postcrisis models?
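The computation in part c can be sketched as follows. This is a minimal standard-library illustration that assumes the two prediction columns have already been read from RT_NewScore1 and RT_NewScore2 into lists in the same home order; the function names, the bin width, and the sample prices are hypothetical.

```python
import math


def percentage_changes(pre, post):
    """(postcrisis - precrisis) / precrisis for each home, as a fraction."""
    return [(b - a) / a for a, b in zip(pre, post)]


def histogram_counts(values, bin_width=0.05):
    """Count values per bin; key k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for v in values:
        k = math.floor(v / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical predicted prices for three homes (not from the data file).
pre_prices = [100.0, 100.0, 100.0]
post_prices = [112.0, 93.0, 88.0]

changes = percentage_changes(pre_prices, post_prices)
for k, count in histogram_counts(changes).items():
    # Print each bin's range as percentages with a simple text bar.
    print(f"{k * 5:>4d}% to {(k + 1) * 5:>4d}%: {'#' * count}")
print("average percentage change:", mean(changes))
```

With the real worksheets, the same three functions would be applied to the 2000 paired predictions; a negative average indicates that the postcrisis model predicts lower prices on average.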
