Problem 1: (30 points) Consider the algorithm for building a CART model in the case of regression. Following and expanding on the notation from class, suppose that our current tree, denoted by $T_{\text{old}}$, has $|T_{\text{old}}| = M$ terminal nodes/buckets. For each bucket $m = 1, \ldots, M$, let:

1. $N_m$ denote the number of observations in bucket $m$,
2. $Q_m(T_{\text{old}})$ denote the value of the impurity function at bucket $m$, and
3. $R_m$ denote the region in the feature space corresponding to bucket $m$.

Also let $N$ be the overall total number of observations. Recall that, in the case of regression, we have

$$Q_m(T_{\text{old}}) = \frac{1}{N_m} \sum_{i:\, x_i \in R_m} (y_i - \bar{y}_m)^2,$$

where $\bar{y}_m = \frac{1}{N_m} \sum_{i:\, x_i \in R_m} y_i$ is the mean response in bucket $m$. Then the total impurity cost of the tree $T_{\text{old}}$ is defined as

$$C_{\text{imp}}(T_{\text{old}}) = \sum_{m=1}^{M} N_m Q_m(T_{\text{old}}).$$

Consider a potential split at the final bucket $M$ (we're using $M$ just for ease of notation), which results in a new tree $T_{\text{new}}$. This new tree has $|T_{\text{new}}| = M + 1$ terminal nodes/buckets, and for this new tree we let

1. $N_m$ denote the number of observations in bucket $m$,
2. $Q_m(T_{\text{new}})$ denote the value of the impurity function at bucket $m$, and
3. $R_m$ denote the region in the feature space corresponding to bucket $m$.
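To make the definitions concrete, here is a minimal Python sketch of the bucket impurity and the total impurity cost, evaluated before and after splitting the final bucket. The helper names (`bucket_impurity`, `impurity_cost`) and the response values are hypothetical and chosen only for illustration; each bucket is represented simply by the list of responses that fall in its region.

```python
from statistics import fmean


def bucket_impurity(ys):
    """Impurity of one bucket: mean squared deviation from the bucket mean."""
    y_bar = fmean(ys)
    return sum((y - y_bar) ** 2 for y in ys) / len(ys)


def impurity_cost(buckets):
    """Total impurity cost: sum over buckets of (bucket size) * (bucket impurity)."""
    return sum(len(ys) * bucket_impurity(ys) for ys in buckets)


# Hypothetical responses grouped by terminal node; the old tree has M = 2 buckets.
old_buckets = [[1.0, 2.0, 3.0], [10.0, 12.0, 20.0, 22.0]]

# Split the final bucket into two children, so the new tree has M + 1 = 3 buckets.
new_buckets = old_buckets[:-1] + [[10.0, 12.0], [20.0, 22.0]]

cost_old = impurity_cost(old_buckets)
cost_new = impurity_cost(new_buckets)
print(cost_old, cost_new)
```

In this toy example the split lowers the total cost. More generally, with squared-error impurity a split cannot increase it, because each child's own mean minimizes the sum of squared deviations within that child.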
