In this problem, we consider splitting when building a regression tree in the CART algorithm. We...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
In this problem, we consider splitting when building a regression tree in the CART algorithm. We assume that there is a feature vector X ERP and dependent variable Ye R. We have collected a training dataset (x, y),..., (En, Yn), where x R and y R for all i= 1,...,n. We also assume, for simplicity, that we are considering the initial split at the top (root node) of the tree. An arbitrary split simply divides the training dataset into a partition of size two. By appropriately reshuffling the data, we can represent this partition (again for simplicity) via two sub-datasets (x, y),..., (N, yN) and (TN+1, YN+1),..., (En, Yn) where N is the index of the last observation included in the first set. Assume throughout that our impurity function is the RSS error the standard choice for a regression tree. Please answer the following: a) (5 points) What is the total impurity value before the split? (This is the total impurity of the "null tree" or the "baseline model".) b) (5 points) What is the total impurity value after the split? (This is the total impurity of the tree with the split as defined above.) c) (10 points) Show that the total impurity value after the split is always less than or equal to the total impurity value before the split, i.e., splitting never increases the total impurity cost function. (Hint: you can use the fact that, given a sequence of real numbers 21, 22,..., Zn, the mean z = 1 is the minimizer of the function RSS(z) = 1(zi - z).) n In this problem, we consider splitting when building a regression tree in the CART algorithm. We assume that there is a feature vector X ERP and dependent variable Ye R. We have collected a training dataset (x, y),..., (En, Yn), where x R and y R for all i= 1,...,n. We also assume, for simplicity, that we are considering the initial split at the top (root node) of the tree. An arbitrary split simply divides the training dataset into a partition of size two. By appropriately reshuffling the data, we can represent this partition (again for simplicity) via two sub-datasets (x, y),..., (N, yN) and (TN+1, YN+1),..., (En, Yn) where N is the index of the last observation included in the first set. Assume throughout that our impurity function is the RSS error the standard choice for a regression tree. Please answer the following: a) (5 points) What is the total impurity value before the split? (This is the total impurity of the "null tree" or the "baseline model".) b) (5 points) What is the total impurity value after the split? (This is the total impurity of the tree with the split as defined above.) c) (10 points) Show that the total impurity value after the split is always less than or equal to the total impurity value before the split, i.e., splitting never increases the total impurity cost function. (Hint: you can use the fact that, given a sequence of real numbers 21, 22,..., Zn, the mean z = 1 is the minimizer of the function RSS(z) = 1(zi - z).) n
Expert Answer:
Related Book For
Business Intelligence And Analytics Systems For Decision Support
ISBN: 9781292009209
10th Global Edition
Authors: Efraim Turban, Ramesh Sharda, Dursun Delen, Pearson Education Limited, Dennis G. Zill
Posted Date:
Students also viewed these programming questions
-
3. Please explain in detail each step and coding characters of below Python code. (10 points) #Declare variables to store the budget amount, # amount spent, difference, and total. budget = 0.0...
-
Calculate the internal rate of return on this investment An investment project has the following cash flows: _________________________________________________________________________ year 0...
-
Part of the auditor's unmodified (also known as clean) opinion states that the "financial statements present fairly the financial position, results of operations, and cash flows of the company"....
-
Ratio Computation and Analysis; Liquidity) as loan analyst for Madison Bank, you have been presented the following information. Each of these companies has requested a loan of $50,000 for 6 months...
-
Helium, in a large tank at 100C and 400 kPa, discharges to a receiver through a converging-diverging nozzle designed to exit at Ma = 2.5 with exit area 1.2 cm2. Compute (a) The receiver pressure and...
-
On December 31,2022, Belt Enterprises must measure its impairment loss for plant and equipment. Belt has determined that the broadcast license is not impaired. The projected future undiscounted cash...
-
A transistor has a constant failure rate of 0.005 per ten thousand hours. (a) What is the probability that it will perform satisfactorily for at least 75,000 hours? (b) What is the 20,000-hour...
-
In the late 1990s, many grocery supermarkets shifted from regular storewide sales to issuing membership in discount and points programs, much like frequent flyer programs run by airlines. A...
-
After being business for 8 years, a customer gets scalded when a server accidentally spills hot espresso on them. The customer successfully sues your BizCafe for $200,000. How would this be handled...
-
From the adjustments columns in Exercise 5-9A, journalize the four adjusting entries, as of December 31, in proper general journal format. Exercise 5-9A Jim Jacobs Furniture Repair Work Sheet...
-
L is a list of integers representing a set of navigable (by canoe) lakes. P is an array of length |L|, where for each u L, P[u] is a linked list containing pairs (v, x) where v represents a lake in...
-
COMPARING PRICES You are the owner of a music store. You have decided to advertise on local radio and television. Your budget is limited , but you want favorable reach and frequency.Contact local...
-
XYZ Inc. produces a machine that washes and dries your laundry in a single, compact unit that doesn't require existing washer/dryer hookups or venting. The machine is popular with RV & boat owners...
-
This is a subjective question, hence you have to write your answer in the TextField given below 022804474 -9 Marks 3/10/08-20 86242-2023/1 "In the context of online shopping recommendations, if a...
-
A rod of proper length L points along the x axis but moves in a direction making an angle of 45 to this axis (see the figure). A platform, also parallel to the axis, lies in the rod's way, but a slit...
-
A couple has just purchased a home for $447,500.00. They will pay 20% down in cash, and finance the remaining balance. The mortgage broker has gotten them a mortgage rate of 5.52% APR with monthly...
-
Last month, Dow Chemical analyzed a project with an initial cash outflow of $1 million, and expected cash inflows of $440,000 per year in years 1, 2, and 3. However, before the decision to accept or...
-
Proposals have been made to ?sail? spacecraft to the outer solar system using the pressure of sunlight, or even to propel interstellar spacecraft with high-powered, Earth-based lasers. Sailing...
-
Investigate via a Web search how models and their solutions are used by the U.S. Department of Homeland Security in the war against terrorism. Also investigate how other governments or government...
-
Define collaborative hub.
-
Business analytics and computerized data processing support managers and decision making. Keeping current business environment challenges in mind, along with Mintzbergs 10 managerial roles (see Table...
-
What are the differences among an onsite team, a virtual team, a task force, and a committee? What are some of the potential differences in dynamics between people in these different groups?
-
Compare and contrast disciplinary, interdisciplinary, and crossfunctional teams.
-
What are some of the unique challenges associated with teamwork in health care? How do you see teamwork fitting in with the accountable care organization (ACO) mandates? Describe three benefits and...
Study smarter with the SolutionInn App