Question

In this problem, we consider splitting when building a regression tree in the CART algorithm. We assume that there is a feature vector $X \in \mathbb{R}^p$ and a dependent variable $Y \in \mathbb{R}$. We have collected a training dataset $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ for all $i = 1, \ldots, n$. We also assume, for simplicity, that we are considering the initial split at the top (root node) of the tree. An arbitrary split simply divides the training dataset into a partition of size two. By appropriately reshuffling the data, we can represent this partition (again for simplicity) via two sub-datasets $(x_1, y_1), \ldots, (x_N, y_N)$ and $(x_{N+1}, y_{N+1}), \ldots, (x_n, y_n)$, where $N$ is the index of the last observation included in the first set. Assume throughout that our impurity function is the RSS error, the standard choice for a regression tree.

e) (10 points) Consider a modification of the regression tree algorithm such that, in addition to considering splits of the form described in the paragraph preceding part (d), we also consider splits of the form $R_1(j, \ell, t) = \{X : X_j X_\ell < t\}$ and $R_2(j, \ell, t) = \{X : X_j X_\ell \geq t\}$, where $j$ and $\ell$ are the indices of two chosen features and $t$ is a cutoff value for $X_j X_\ell$. Is it possible for these new splits to improve the regression tree? Explain.
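For concreteness, here is a minimal sketch (not part of the original problem statement) of the standard RSS-minimizing split search the setup describes, assuming NumPy; the names rss and best_axis_split are illustrative:

import numpy as np

def rss(y):
    # RSS impurity: squared deviations of y around its mean; 0 for an empty node.
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def best_axis_split(X, y):
    # Exhaustive search over the standard CART splits
    # {X : X_j < t} and {X : X_j >= t}, minimizing total child RSS.
    best_j, best_t, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] < t
            total = rss(y[left]) + rss(y[~left])
            if total < best_rss:
                best_j, best_t, best_rss = j, t, total
    return best_j, best_t, best_rss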
Solution
In the context of regression trees, the CART (Classification and Regression Trees) algorithm's primary goal is to find the split that minimizes the impurity, here the total RSS, of the two resulting child nodes. Yes, the new splits can improve the regression tree. First, the original axis-aligned splits remain among the candidates, so enlarging the candidate set can never increase the minimum achievable RSS at any node. Second, the improvement can be strict: if the response depends on an interaction between two features, for example if $Y$ is determined by the sign of $X_j X_\ell$, then a single product split $\{X : X_j X_\ell < t\}$ separates the responses exactly, while every single axis-aligned split leaves both children mixed. Axis-aligned splits can only approximate such a boundary with many cuts deeper in the tree, so the product splits let a shallower tree achieve a lower RSS.
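To make the argument concrete, the following sketch (an illustration, not part of the graded answer) extends the search to the product splits and compares both on a toy interaction dataset; it reuses rss() and best_axis_split() from the sketch above, and the name best_product_split is an assumption:

import numpy as np

def best_product_split(X, y):
    # Search the new splits {X : X_j * X_l < t} over all feature pairs (j, l).
    best, best_rss = None, np.inf
    p = X.shape[1]
    for j in range(p):
        for l in range(j + 1, p):
            z = X[:, j] * X[:, l]
            for t in np.unique(z):
                left = z < t
                total = rss(y[left]) + rss(y[~left])
                if total < best_rss:
                    best, best_rss = (j, l, t), total
    return best, best_rss

# Toy regression target driven purely by an interaction: y is determined by
# the sign of X_1 * X_2, so no single axis-aligned cut separates the two
# response levels, but a single product cut at t = 0 does.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

print("best axis-aligned split RSS:", best_axis_split(X, y)[2])
print("best product split RSS:    ", best_product_split(X, y)[1])

On this toy data the best product split drives the RSS to essentially zero, while the best single axis-aligned split leaves both children mixed (total RSS on the order of $n/4$), illustrating why the enlarged split family can strictly improve the tree.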
