In this problem, we consider splitting when building a regression tree in the CART algorithm. We...
Question:
In this problem, we consider splitting when building a regression tree in the CART algorithm. We assume that there is a feature vector $X \in \mathbb{R}^p$ and dependent variable $Y \in \mathbb{R}$. We have collected a training dataset $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ for all $i = 1, \ldots, n$. We also assume, for simplicity, that we are considering the initial split at the top (root node) of the tree. An arbitrary split simply divides the training dataset into a partition of size two. By appropriately reshuffling the data, we can represent this partition (again for simplicity) via two sub-datasets $(x_1, y_1), \ldots, (x_N, y_N)$ and $(x_{N+1}, y_{N+1}), \ldots, (x_n, y_n)$, where $N$ is the index of the last observation included in the first set. Assume throughout that our impurity function is the RSS error, the standard choice for a regression tree.

e) (10 points) Consider a modification of the regression tree algorithm such that, in addition to considering splits of the form described in the paragraph preceding part (d), we also consider splits of the form $R_1(j, \ell, t) = \{X : X_j X_\ell < t\}$ and $R_2(j, \ell, t) = \{X : X_j X_\ell \ge t\}$, where $j$ and $\ell$ are the indices of two chosen features and $t$ is a cutoff value for $X_j X_\ell$. Is it possible for these new splits to improve the regression tree? Explain.
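To make the setup in part (e) concrete, here is a minimal sketch of a root-node split search under the RSS impurity, with and without the product splits $X_j X_\ell < t$. It is not taken from the posted answer. The helper names (`rss`, `best_threshold_split`, `best_split`) and the toy dataset are hypothetical, and the standard splits are assumed to be the usual single-feature splits $X_j < t$.

```python
from itertools import combinations

def rss(ys):
    """Residual sum of squares around the mean of ys (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_threshold_split(values, ys):
    """Best cutoff t for {v < t} vs {v >= t}; returns (total_rss, t).

    t is None when no cutoff lowers the RSS of the unsplit node.
    """
    pairs = sorted(zip(values, ys))
    best = (rss(ys), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutoff can separate equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = rss(left) + rss(right)
        if total < best[0]:
            best = (total, t)
    return best

def best_split(X, ys, include_products=False):
    """Search single-feature splits, optionally adding product splits."""
    p = len(X[0])
    candidates = {("single", j): [row[j] for row in X] for j in range(p)}
    if include_products:
        for j, l in combinations(range(p), 2):
            candidates[("product", j, l)] = [row[j] * row[l] for row in X]
    results = ((best_threshold_split(v, ys), name)
               for name, v in candidates.items())
    (total, t), name = min(results, key=lambda r: r[0][0])
    return name, t, total

# Toy data (an assumption for illustration): y = 1 exactly when x1 * x2 > 0.
X = [(-2, -1), (-1, 2), (1, -2), (2, 1), (-1, -2), (-2, 1), (2, -1), (1, 2)]
ys = [1, 0, 0, 1, 1, 0, 0, 1]

print(best_split(X, ys))                         # single-feature splits only
print(best_split(X, ys, include_products=True))  # product splits added
```

On this toy data no single-feature cutoff lowers the RSS below its unsplit value of 2.0, while the product split at $t = 0$ separates the responses exactly (RSS 0). Because the enlarged search still contains every single-feature split, the best training RSS at a node can never get worse. Whether this helps on unseen data is a separate question that the sketch does not address.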
Expert Answer:
In the context of regression trees and the CART (Classification and Regression Trees) algorithm, the primary goal is to find optimal splits that minimize ...