Refer to the scenario described in Problem 16 and the file CreditScore. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the individuals' credit scores using a regression tree, with CreditScore as the output variable and all other variables as input variables. In Step 2 of XLMiner's Regression Tree procedure, be sure to select Normalize input data, and specify Using Best pruned tree as the scoring option.
In Step 3 of XLMiner's Regression Tree procedure, set the maximum number of levels to 7.
Generate the Full tree, Best pruned tree, and Minimum error tree. Generate Detailed Scoring for all three sets of data.
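The steps above can be sketched outside XLMiner as well. The following is a minimal Python analogue using scikit-learn, assuming the actual CreditScore file is unavailable, so a synthetic stand-in dataset is generated; all variable names and the 50/30/20 split logic mirror the instructions, while cost-complexity pruning scored on the validation set stands in for XLMiner's best pruned tree.

```python
# Hedged sketch: XLMiner is an Excel add-in, so this mirrors its workflow in
# Python. The CreditScore file is not included here, so synthetic stand-in
# data is generated; column structure and coefficients are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))                                           # stand-in input variables
y = 650 + 40 * X[:, 0] - 25 * X[:, 1] + rng.normal(scale=20, size=n)  # stand-in CreditScore

# Partition: 50% training, then split the remainder 30/20 (i.e., 60/40 of the rest)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.5, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, train_size=0.6, random_state=1)

# Normalize input data (scaler fit on the training partition only)
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s, X_test_s = map(scaler.transform, (X_train, X_val, X_test))

def rmse(model, X, y):
    return float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))

# Grow the full tree (at most 7 levels), then choose the pruned subtree with
# the lowest validation RMSE -- the "best pruned tree" analogue.
full = DecisionTreeRegressor(max_depth=7, random_state=1).fit(X_train_s, y_train)
alphas = full.cost_complexity_pruning_path(X_train_s, y_train).ccp_alphas
best = min(
    (DecisionTreeRegressor(max_depth=7, ccp_alpha=a, random_state=1).fit(X_train_s, y_train)
     for a in alphas),
    key=lambda m: rmse(m, X_val_s, y_val),
)
print("validation RMSE:", round(rmse(best, X_val_s, y_val), 2))
print("test RMSE:", round(rmse(best, X_test_s, y_test), 2))
```

Scoring each candidate subtree on the validation partition, then reporting the test partition's RMSE separately, is what makes the test RMSE an honest estimate of out-of-sample error in part a.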
a. What is the root mean squared error (RMSE) of the best pruned tree on the validation data and on the test data? Explain why the two values differ, and state whether the magnitude of the difference is of concern in this case.
b. Interpret the set of rules implied by the best pruned tree, and explain how these rules are used to predict an individual's credit score.
c. Examine the best pruned tree in RT_PruneTree1 as well as the predicted credit scores and residuals for the test data in RT_TestScore1. Identify the weakness of this regression tree model. Explain what may be contributing to inaccurate predictions, and discuss possible ways to improve the regression tree approach.
d. Repeat the construction of a regression tree following the previous instructions, but in Step 2 of XLMiner's Regression Tree procedure, set the Minimum #records in a terminal node to 1. How does the RMSE of the best pruned tree on the test data compare to the analogous measure from part a? In terms of number of decision nodes, how does the size of the best pruned tree compare to the size of the best pruned tree from part a?
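As a hedged illustration of part d, XLMiner's "Minimum #records in a terminal node" corresponds to scikit-learn's `min_samples_leaf`. The sketch below, again on a synthetic stand-in for the unavailable CreditScore data, contrasts a leaf floor of 1 with an arbitrary larger floor of 25 (an assumption for contrast, not XLMiner's default), comparing decision-node counts and test RMSE of the resulting best pruned trees.

```python
# Hedged sketch: "Minimum #records in a terminal node" ~ min_samples_leaf.
# The CreditScore file is not available; synthetic stand-in data is used.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = 650 + 40 * X[:, 0] - 25 * X[:, 1] + rng.normal(scale=20, size=1000)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.5, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, train_size=0.6, random_state=1)

def rmse(m, X, y):
    return float(np.sqrt(np.mean((m.predict(X) - y) ** 2)))

def best_pruned(min_leaf):
    """Grow a depth-7 tree with the given leaf floor, then pick the pruned
    subtree whose validation RMSE is lowest (the best pruned tree analogue)."""
    def grow(a):
        return DecisionTreeRegressor(max_depth=7, min_samples_leaf=min_leaf,
                                     ccp_alpha=a, random_state=1).fit(X_train, y_train)
    alphas = grow(0.0).cost_complexity_pruning_path(X_train, y_train).ccp_alphas
    return min((grow(a) for a in alphas), key=lambda m: rmse(m, X_val, y_val))

for leaf in (1, 25):   # 25 is an arbitrary larger leaf floor chosen for contrast
    tree = best_pruned(leaf)
    decision_nodes = tree.tree_.node_count - tree.tree_.n_leaves
    print(f"min leaf {leaf}: {decision_nodes} decision nodes, "
          f"test RMSE {rmse(tree, X_test, y_test):.2f}")
```

A leaf floor of 1 lets the full tree grow much deeper before pruning, which typically yields a larger candidate tree; whether the best pruned tree ends up larger or more accurate on the test data is exactly the comparison part d asks for.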