Question:
Create a standard partition of the data with all the tracked variables and 50% of observations in the training set, 30% in the validation set, and 20% in the test set. Fit a single classification tree using Age, HomeOwner, Female, Married, HouseholdSize, Income, Education, and Church as input variables and Undecided as the output variable. In Step 2 of XLMiner’s Classification Tree procedure, be sure to select Normalize Input Data and to set Minimum # records in a terminal node to 100. Generate the Full tree and the Best pruned tree.
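For readers working outside XLMiner, the 50/30/20 partition described above can be sketched in plain Python. This is only an illustration of the idea: the function name `partition_indices`, the seed value, and the row count of 1000 are assumptions made for the example and are not part of the exercise or of XLMiner.

```python
import random


def partition_indices(n_rows, fractions=(0.5, 0.3, 0.2), seed=12345):
    """Shuffle row indices and split them into training, validation, and test sets.

    Hypothetical helper: `fractions` gives the training and validation shares;
    the test set receives whatever rows remain.
    """
    indices = list(range(n_rows))
    # A seeded generator makes the partition reproducible.
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_rows)
    n_valid = round(fractions[1] * n_rows)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


# Illustrative row count only; the real data set size is not stated here.
train, valid, test = partition_indices(1000)
print(len(train), len(valid), len(test))
```

Each row index lands in exactly one of the three sets, so the tree is grown on the training rows, pruned using the validation rows, and evaluated on test rows it has never seen.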
a. From the CT_Output worksheet, what is the overall error rate of the full tree on the training set? Explain why this is not necessarily an indication that the full tree should be used to classify future observations and the role of the best-pruned tree.
b. Consider a 50-year-old man who attends church, has 15 years of education, owns a home, is married, lives in a household of four people, and has an annual income of $150,000. Using the CT_PruneTree worksheet, does the best-pruned tree classify this observation as undecided?
c. For the default cutoff value of 0.5, what are the overall error rate, Class 1 error rate, and Class 0 error rate of the best-pruned tree on the test set?
d. Examine the decile-wise lift chart for the best-pruned tree on the test set. What is the first decile lift? Interpret this value.
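The quantities asked for in parts c and d can be computed by hand from actual classes and predicted Class 1 probabilities. The sketch below shows the arithmetic under two assumptions: a record is assigned to Class 1 when its probability is at least the cutoff, and the first decile is the top 10% of records ranked by predicted probability. The function names and the 20-record toy data are invented for illustration and are not output from the exercise.

```python
def error_rates(actual, prob_class1, cutoff=0.5):
    """Return (overall, Class 1, Class 0) error rates at the given cutoff."""
    # Assumption: probability >= cutoff means the record is classified as Class 1.
    predicted = [1 if p >= cutoff else 0 for p in prob_class1]
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    n_class1 = sum(1 for a in actual if a == 1)
    n_class0 = len(actual) - n_class1
    overall = (false_neg + false_pos) / len(actual)
    # Class 1 error: share of actual 1s misclassified as 0.
    # Class 0 error: share of actual 0s misclassified as 1.
    return overall, false_neg / n_class1, false_pos / n_class0


def first_decile_lift(actual, prob_class1):
    """Ratio of the Class 1 rate in the top 10% of ranked records to the overall rate."""
    ranked = sorted(zip(prob_class1, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate


# Toy data (hypothetical): 20 records, 5 of them actually Class 1.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [(39 - 2 * i) / 40 for i in range(20)]  # 0.975 down to 0.025

overall, class1_error, class0_error = error_rates(actual, probs)
lift = first_decile_lift(actual, probs)
print(overall, class1_error, class0_error, lift)
```

On this toy data the first decile lift is 4.0: the two highest-ranked records are both Class 1, while only 25% of all records are, so targeting the top decile finds Class 1 cases four times as often as selecting records at random. A first decile lift read from the XLMiner chart is interpreted the same way.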