Question:
1. An excerpt of the insurance purchase dataset is given (see the last page and the accompanying spreadsheet). A few values in BANK_FUND were modified to facilitate calculation in the assignment. There are 40 records. All calculations should show the details.

Recall that the gain of a split is defined as follows:

$$\text{Gain} = \text{Impurity}(\text{parent}) - \sum_{i=1}^{k} \frac{n_i}{n}\,\text{Impurity}(i)$$

where k is the number of child nodes, n_i is the number of records in child node i, and n is the number of records in the parent. The impurity measure can be the Gini index, entropy, or classification error rate. In splitting, maximizing the gain is equivalent to minimizing the weighted average of the impurity measure of the child nodes.

A. Gini index
i. Calculate the Gini index for the dataset excerpt without any partitioning.
ii. Calculate the gain in Gini index for CUSTOMER_ID using a multi-way split.
iii. Calculate the gain in Gini index for STATE using a multi-way split.
iv. Calculate the gain in Gini index for REGION using a multi-way split.
v. Calculate the gain in Gini index for MARITAL_STATUS using a multi-way split.
vi. Calculate the gain in Gini index for LTV_BIN using a multi-way split.
vii. Calculate the gain in Gini index for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on the gain in Gini index?
ix. Compute a 2-level decision tree using the gain in Gini index.

B. Entropy
i. Calculate the entropy for the dataset excerpt without any partitioning.
ii. Calculate the information gain for CUSTOMER_ID using a multi-way split.
iii. Calculate the information gain for STATE using a multi-way split.
iv. Calculate the information gain for REGION using a multi-way split.
v. Calculate the information gain for MARITAL_STATUS using a multi-way split.
vi. Calculate the information gain for LTV_BIN using a multi-way split.
vii. Calculate the information gain for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on information gain?
ix. Compute a 2-level decision tree using information gain.

C. Classification error rate
i. Calculate the error rate without any partitioning.
ii. Calculate the gain in error rate for CUSTOMER_ID using a multi-way split.
iii. Calculate the gain in error rate for STATE using a multi-way split.
iv. Calculate the gain in error rate for REGION using a multi-way split.
v. Calculate the gain in error rate for MARITAL_STATUS using a multi-way split.
vi. Calculate the gain in error rate for LTV_BIN using a multi-way split.
vii. Calculate the gain in error rate for BANK_FUND for every possible split. Which split is best?
viii. Which attribute is the best for splitting based on the gain in error rate?
ix. Compute a 2-level decision tree using the gain in error rate.

CUSTOMER_ID  STATE  REGION     MARITAL_STATUS  LTV_BIN    BANK_FUNDS  BUY_INSURANCE
CU9823       CA     West       SINGLE          MEDIUM     0           No
CU14284      NY     NorthEast  SINGLE          HIGH       0           No
CU5938       CA     West       SINGLE          MEDIUM     0           No
CU1069       MN     West       SINGLE          MEDIUM     0           No
CU11717      NY     NorthEast  DIVORCED        HIGH       0           No
CU5928       NY     NorthEast  DIVORCED        HIGH       0           No
CU10012      NM     Southwest  MARRIED         HIGH       0           No
CU197        MI     Midwest    SINGLE          HIGH       0           No
CU476        CA     West       MARRIED         HIGH       0           No
CU9110       DC     NorthEast  DIVORCED        HIGH       0           No
CU14921      NY     NorthEast  SINGLE          HIGH       340         Yes
CU12175      MI     Midwest    MARRIED         MEDIUM     500         No
CU12658      CA     West       SINGLE          MEDIUM     500         Yes
CU14620      UT     Southwest  MARRIED         HIGH       500         No
CU15186      NY     NorthEast  MARRIED         HIGH       600         No
CU7924       NY     NorthEast  SINGLE          MEDIUM     650         No
CU7148       OK     Midwest    MARRIED         HIGH       750         No
CU14052      CA     West       MARRIED         HIGH       750         Yes
CU7502       MI     Midwest    SINGLE          LOW        750         Yes
CU14911      CA     West       MARRIED         HIGH       750         Yes
CU15786      WI     Midwest    SINGLE          HIGH       750         Yes
CU8318       NY     NorthEast  DIVORCED        HIGH       1500        Yes
CU12738      NY     NorthEast  DIVORCED        HIGH       1500        Yes
CU13543      WA     West       DIVORCED        VERY HIGH  1500        No
CU5165       MI     Midwest    MARRIED         MEDIUM     2400        No
CU5082       CA     West       MARRIED         HIGH       2400        Yes
CU4679       CA     West       DIVORCED        HIGH       3000        Yes
CU13803      NY     NorthEast  MARRIED         VERY HIGH  3000        Yes
CU8340       NY     NorthEast  DIVORCED        HIGH       4000        Yes
CU3214       NY     NorthEast  DIVORCED        HIGH       4500        Yes
CU2394       NY     NorthEast  DIVORCED        HIGH       4500        Yes
CU1691       NY     NorthEast  DIVORCED        MEDIUM     4500        Yes
CU7291       MI     Midwest    SINGLE          MEDIUM     5000        No
CU3296       CA     West       WIDOWED         MEDIUM     10000       No
CU3654       NY     NorthEast  WIDOWED         HIGH       10000       Yes
CU5675       NY     NorthEast  SINGLE          LOW        10000       Yes
CU4285       NY     NorthEast  MARRIED         MEDIUM     10000       Yes
CU8589       WI     Midwest    WIDOWED         HIGH       16000       No
CU9004       WI     Midwest    MARRIED         MEDIUM     16000       Yes
CU2399       MI     Midwest    MARRIED         HIGH       20000       Yes

The transcription also lists the following 20 rows without CUSTOMER_ID or BANK_FUNDS values. All are labelled Yes, and they match the 20 Yes records in the table above:

STATE  REGION     MARITAL_STATUS  LTV_BIN    BUY_INSURANCE
NY     NorthEast  DIVORCED        MEDIUM     Yes
NY     NorthEast  DIVORCED        HIGH       Yes
NY     NorthEast  DIVORCED        HIGH       Yes
NY     NorthEast  SINGLE          HIGH       Yes
NY     NorthEast  SINGLE          LOW        Yes
MI     Midwest    MARRIED         HIGH       Yes
CA     West       SINGLE          MEDIUM     Yes
WI     Midwest    MARRIED         MEDIUM     Yes
NY     NorthEast  MARRIED         VERY HIGH  Yes
NY     NorthEast  DIVORCED        HIGH       Yes
NY     NorthEast  DIVORCED        HIGH       Yes
NY     NorthEast  WIDOWED         HIGH       Yes
CA     West       MARRIED         HIGH       Yes
MI     Midwest    SINGLE          LOW        Yes
NY     NorthEast  MARRIED         MEDIUM     Yes
CA     West       DIVORCED        HIGH       Yes
CA     West       MARRIED         HIGH       Yes
CA     West       MARRIED         HIGH       Yes
NY     NorthEast  DIVORCED        HIGH       Yes
WI     Midwest    SINGLE          HIGH       Yes
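The following is a minimal Python sketch, assuming the 40-record table above is transcribed correctly. It computes the three impurity measures, the gain of each multi-way split, and the gain of every binary split of BANK_FUNDS. Candidate BANK_FUNDS thresholds are placed at midpoints between adjacent distinct values. That convention and all function names (`gini`, `entropy`, `error_rate`, `gain`, `multiway_split`, `bank_funds_splits`) are illustrative assumptions, not something the assignment prescribes.

```python
from collections import Counter, defaultdict
from math import log2

# The 40 records transcribed from the table above, one per line:
# CUSTOMER_ID,STATE,REGION,MARITAL_STATUS,LTV_BIN,BANK_FUNDS,BUY_INSURANCE
DATA = """
CU9823,CA,West,SINGLE,MEDIUM,0,No
CU14284,NY,NorthEast,SINGLE,HIGH,0,No
CU5938,CA,West,SINGLE,MEDIUM,0,No
CU1069,MN,West,SINGLE,MEDIUM,0,No
CU11717,NY,NorthEast,DIVORCED,HIGH,0,No
CU5928,NY,NorthEast,DIVORCED,HIGH,0,No
CU10012,NM,Southwest,MARRIED,HIGH,0,No
CU197,MI,Midwest,SINGLE,HIGH,0,No
CU476,CA,West,MARRIED,HIGH,0,No
CU9110,DC,NorthEast,DIVORCED,HIGH,0,No
CU14921,NY,NorthEast,SINGLE,HIGH,340,Yes
CU12175,MI,Midwest,MARRIED,MEDIUM,500,No
CU12658,CA,West,SINGLE,MEDIUM,500,Yes
CU14620,UT,Southwest,MARRIED,HIGH,500,No
CU15186,NY,NorthEast,MARRIED,HIGH,600,No
CU7924,NY,NorthEast,SINGLE,MEDIUM,650,No
CU7148,OK,Midwest,MARRIED,HIGH,750,No
CU14052,CA,West,MARRIED,HIGH,750,Yes
CU7502,MI,Midwest,SINGLE,LOW,750,Yes
CU14911,CA,West,MARRIED,HIGH,750,Yes
CU15786,WI,Midwest,SINGLE,HIGH,750,Yes
CU8318,NY,NorthEast,DIVORCED,HIGH,1500,Yes
CU12738,NY,NorthEast,DIVORCED,HIGH,1500,Yes
CU13543,WA,West,DIVORCED,VERY HIGH,1500,No
CU5165,MI,Midwest,MARRIED,MEDIUM,2400,No
CU5082,CA,West,MARRIED,HIGH,2400,Yes
CU4679,CA,West,DIVORCED,HIGH,3000,Yes
CU13803,NY,NorthEast,MARRIED,VERY HIGH,3000,Yes
CU8340,NY,NorthEast,DIVORCED,HIGH,4000,Yes
CU3214,NY,NorthEast,DIVORCED,HIGH,4500,Yes
CU2394,NY,NorthEast,DIVORCED,HIGH,4500,Yes
CU1691,NY,NorthEast,DIVORCED,MEDIUM,4500,Yes
CU7291,MI,Midwest,SINGLE,MEDIUM,5000,No
CU3296,CA,West,WIDOWED,MEDIUM,10000,No
CU3654,NY,NorthEast,WIDOWED,HIGH,10000,Yes
CU5675,NY,NorthEast,SINGLE,LOW,10000,Yes
CU4285,NY,NorthEast,MARRIED,MEDIUM,10000,Yes
CU8589,WI,Midwest,WIDOWED,HIGH,16000,No
CU9004,WI,Midwest,MARRIED,MEDIUM,16000,Yes
CU2399,MI,Midwest,MARRIED,HIGH,20000,Yes
"""
COLS = ["CUSTOMER_ID", "STATE", "REGION", "MARITAL_STATUS",
        "LTV_BIN", "BANK_FUNDS", "BUY_INSURANCE"]
ROWS = [dict(zip(COLS, line.split(","))) for line in DATA.strip().splitlines()]
LABELS = [r["BUY_INSURANCE"] for r in ROWS]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def error_rate(labels):
    return 1.0 - max(Counter(labels).values()) / len(labels)


def gain(parent, children, impurity):
    # Impurity(parent) minus the size-weighted average impurity of the children.
    n = len(parent)
    return impurity(parent) - sum(len(ch) / n * impurity(ch) for ch in children)


def multiway_split(rows, attr):
    # One child node per distinct value of `attr`; each child holds class labels.
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r["BUY_INSURANCE"])
    return list(groups.values())


def bank_funds_splits(rows):
    # Binary splits "BANK_FUNDS <= t", t = midpoint of adjacent distinct values.
    values = sorted({int(r["BANK_FUNDS"]) for r in rows})
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r["BUY_INSURANCE"] for r in rows if int(r["BANK_FUNDS"]) <= t]
        right = [r["BUY_INSURANCE"] for r in rows if int(r["BANK_FUNDS"]) > t]
        yield t, [left, right]


for name, imp in [("Gini", gini), ("Entropy", entropy), ("Error", error_rate)]:
    print(f"{name}: parent impurity = {imp(LABELS):.4f}")
    for attr in COLS[:5]:
        print(f"  {attr:<15} gain = {gain(LABELS, multiway_split(ROWS, attr), imp):.4f}")
    t, g = max(((t, gain(LABELS, ch, imp)) for t, ch in bank_funds_splits(ROWS)),
               key=lambda tg: tg[1])
    print(f"  BANK_FUNDS <= {t:g}: best binary gain = {g:.4f}")
```

Every CUSTOMER_ID is unique, so its multi-way split yields 40 pure single-record children. Its gain therefore equals the parent impurity under all three measures, which is why ID-like attributes look best by raw gain while being useless for prediction. For the 2-level tree, the same `gain` and `multiway_split` helpers can be reapplied to the records inside each child of the chosen root.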
Expert Answer:
Gini Index
Let's proceed with calculating the Gini index and the gain in Gini index for each of the specified attributes. We'll start with part i.
i. Calculate the Gini index for the dataset excerpt without an...
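The following is a worked check of part i under each of the three measures. It assumes the class counts read from the 40-record table, which contains 20 Yes and 20 No records:

```latex
\begin{aligned}
\text{Gini} &= 1 - \left(\tfrac{20}{40}\right)^2 - \left(\tfrac{20}{40}\right)^2 = 1 - 0.25 - 0.25 = 0.5 \\
\text{Entropy} &= -\tfrac{20}{40}\log_2\tfrac{20}{40} - \tfrac{20}{40}\log_2\tfrac{20}{40} = 0.5 + 0.5 = 1 \\
\text{Error} &= 1 - \max\!\left(\tfrac{20}{40}, \tfrac{20}{40}\right) = 0.5
\end{aligned}
```

An evenly split two-class node is the maximum-impurity case for all three measures.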