In this problem, we consider splitting when building a regression tree in the CART algorithm. We...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
In this problem, we consider splitting when building a regression tree in the CART algorithm. We assume that there is a feature vector X RP and dependent variable Y R. We have collected a training dataset (x, y),..., (In, Yn), where x R and y; E R for all i = 1,..., n. We also assume, for simplicity, that we are considering the initial split at the top (root node) of the tree. An arbitrary split simply divides the training dataset into a partition of size two. By appropriately reshuffling the data, we can represent this partition (again for simplicity) via two sub-datasets (x1, y),..., (TN, YN) and (TN+1, YN+1),..., (En, Yn) where N is the index of the last observation included in the first set. Assume throughout that our impurity function is the RSS error the standard choice for a regression tree. As mentioned, the above reasoning applies to an arbitrary split of the training dataset into a parti- tion of size 2. The CART algorithm only considers splits of a particular type - those corresponding to two regions R(j, s) = {X : Xj < s} and R(j, s) = {X : X; s} where j is the index of a chosen feature and s is a cutoff value. d) (10 points) Consider a modification of the regression tree algorithm such that, in addition to considering splits of the form described in the preceding paragraph, we also consider splits of the form R(j, t) = {X : eXi < t} and R(j,t) = {X : eX; t} where j is the index of a chosen feature and t is a cutoff value for the exponential function ei. Is it possible for these new splits to improve the regression tree? Explain. In this problem, we consider splitting when building a regression tree in the CART algorithm. We assume that there is a feature vector X RP and dependent variable Y R. We have collected a training dataset (x, y),..., (In, Yn), where x R and y; E R for all i = 1,..., n. We also assume, for simplicity, that we are considering the initial split at the top (root node) of the tree. An arbitrary split simply divides the training dataset into a partition of size two. By appropriately reshuffling the data, we can represent this partition (again for simplicity) via two sub-datasets (x1, y),..., (TN, YN) and (TN+1, YN+1),..., (En, Yn) where N is the index of the last observation included in the first set. Assume throughout that our impurity function is the RSS error the standard choice for a regression tree. As mentioned, the above reasoning applies to an arbitrary split of the training dataset into a parti- tion of size 2. The CART algorithm only considers splits of a particular type - those corresponding to two regions R(j, s) = {X : Xj < s} and R(j, s) = {X : X; s} where j is the index of a chosen feature and s is a cutoff value. d) (10 points) Consider a modification of the regression tree algorithm such that, in addition to considering splits of the form described in the preceding paragraph, we also consider splits of the form R(j, t) = {X : eXi < t} and R(j,t) = {X : eX; t} where j is the index of a chosen feature and t is a cutoff value for the exponential function ei. Is it possible for these new splits to improve the regression tree? Explain.
Expert Answer:
Answer rating: 100% (QA)
To build a regression tree using the CART Classification and Regression Trees algorithm we aim to mi... View the full answer
Related Book For
Understandable Statistics Concepts And Methods
ISBN: 9781337119917
12th Edition
Authors: Charles Henry Brase, Corrinne Pellillo Brase
Posted Date:
Students also viewed these programming questions
-
Part of the auditor's unmodified (also known as clean) opinion states that the "financial statements present fairly the financial position, results of operations, and cash flows of the company"....
-
3. Please explain in detail each step and coding characters of below Python code. (10 points) #Declare variables to store the budget amount, # amount spent, difference, and total. budget = 0.0...
-
Calculate the internal rate of return on this investment An investment project has the following cash flows: _________________________________________________________________________ year 0...
-
Medallion and RIEF (Renaissance Institutional Equity Fund) are both managed by Renaissance Technologies. How do they differ in terms of asset classes, dollar capacity, average holding period of each...
-
Air in a tank at 120 kPa and 300 K exhausts to the atmosphere through a 5-cm2-throat converging nozzle at a rate of 0.12 kg/s, what is the atmospheric pressure What is the maximum mass flow possible...
-
Prove that if two rows (columns) of the n ( n matrix A are proportional, then det(A) = 0.
-
Consider the effect on profitability of introducing a heat-mass exchanger into the ammonia synthesis loop operating at \(90 \mathrm{bar}\). What is the optimal operating temperature of the flash...
-
PAWV Power and Light has contracted with a waste disposal firm to have nuclear waste from its nuclear power plants in Pennsylvania disposed of at a government-operated nuclear waste disposal site in...
-
What is the present value of a perpetual stream of cash flows that pays $6,500 at the end of year one and the annual cash flows grow at a rate of 4% per year indefinitely, if the appropriate discount...
-
Shauna Coleman is single. She is employed as an architectural designer for Streamline Design (SD). Shauna wanted to determine her taxable income for this year. She correctly calculated her AGI....
-
Financial accounting focuses primarily on reporting: A. to parties outside of an organization. B. to parties within an organization. C. to an organization's board of directors. D. to financial...
-
On January 1, Year 1, Canglon, Inc., issues 10%, 5-year bonds with a face value of $150,000 when the effective rate is 12%. Interest is to be paid semiannually on June 30 and Decer Assume Canglon...
-
Assign the configuration (R or S) to each stereocentre indicated by an arrow in structure 24 (taxol) shown below. Clearly show how you have worked out your answer. (5 marks for the assignment, 5...
-
Winston Gladney invests a lump sum of $ 6 7 , 8 7 5 today ( February 1 , 2 0 2 1 ) . How much can Winston withdraw each year for four years beginning February 1 , 2 0 2 7 if he can earn 5 % until...
-
Blake and Matthew are partners who agree that Blake will receive a $109,100 salary allowance and that any remaining income or loss will be shared equally. If Matthew's capital account is credited for...
-
21. Protectionism is an economic policy of restraining trade between countries through methods such as tariffs on imported goods, restrictive quotas, and a variety of other government regulations. If...
-
The Fourier cosine series of a function is given by: f(x) = f,cos nx n=0 For f(x) = cos4x, the numerical value of (f + f) is (round off to three decimal places)
-
Construct a 4 x 25 design confounded in two blocks of 16 observations each. Outline the analysis of variance for this design.
-
In western Kansas, the summer density of hailstorms is estimated at about 2.1 storms per 5 square miles. In most cases, a hailstorm damages only a relatively small area in a square mile (Reference:...
-
Consider independent random samples from two populations that are normal or approximately normal, or the case in which both sample sizes are at least 30. Then, if 1 and 2 are unknown but we have...
-
The Denver Post reported that a recent audit of Los Angeles 911 calls showed that 85% were not emergencies. Suppose the 911operators in Los Angeles have just received four calls. (a) What is the...
-
A drainage ditch is to be built to carry runoff from a subdivision. The maximum design capacity is to be $1 \mathrm{million} \mathrm{gph}(\mathrm{gal} / \mathrm{h})$ and it is to be concrete lined....
-
An open drainage canal with a rectangular cross section is $3 \mathrm{~m}$ wide and $1.5 \mathrm{~m}$ deep. If the canal slopes $950 \mathrm{~mm}$ in $1 \mathrm{~km}$ of length, what is the maximum...
-
An open drainage canal is to be constructed to carry water at a maximum rate of $10^{6} \mathrm{gpm}$. The canal is concrete lined and has a rectangular cross section, with a width that is twice its...
Study smarter with the SolutionInn App