Question: Information Gain Exercise

The information gain is based on the decrease in entropy after a dataset is split on an attribute (i.e., variable). Constructing a decision tree is all about finding an attribute that returns the highest information gain (i.e., the most homogeneous branches). To determine which attribute should be used to split the data, we need to calculate the information gain. Answer the following questions below:

Entire population / all customers (140 cases)

                 Make a Purchase
                 No     Yes    Total
All customers    90     50     140

1. What is the entropy of the parent node (i.e., entire population)?

Attribute 1: Customers' marital status

                 Make a Purchase
                 No     Yes    Total
Married          60     20     80
Single           30     30     60
Total                          140

Attribute 2: Customers' income level

Total Make a Purchase Yes 50 High Medium Low 40 40 | 20 50 140

2. What are the entropies of the child nodes?
3. What are the information gains?
4. In conclusion, which variable (marital status or income level) is more informative (and should be used to split the data)?
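
The calculation the exercise asks for can be sketched in a few lines of Python. This is only an illustration, not the site's solution: the helper names `entropy` and `information_gain` are made up here, and the marital-status cell counts (Married: 60 No, 20 Yes; Single: 30 No, 30 Yes) are an assumption read off the flattened table so that the columns add up to the parent totals of 90 No and 50 Yes. The income-level table is too incomplete in the source to fill in, so it is left out.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a node, given its class counts."""
    total = sum(counts)
    # p * log2(1/p) equals -p * log2(p); empty classes contribute 0 by convention.
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


# Counts are (No, Yes). The parent counts are stated in the exercise.
parent = (90, 50)

# ASSUMPTION: cell counts reconstructed from the flattened marital-status table.
marital_status = {"Married": (60, 20), "Single": (30, 30)}

parent_entropy = entropy(parent)
child_entropies = {name: entropy(c) for name, c in marital_status.items()}
gain_marital = information_gain(parent, list(marital_status.values()))

print(f"Parent entropy: {parent_entropy:.3f}")
for name, h in child_entropies.items():
    print(f"{name} entropy: {h:.3f}")
print(f"Information gain (marital status): {gain_marital:.3f}")
```

Under these assumed counts the parent entropy comes out near 0.940 bits and the marital-status gain near 0.048 bits. The income-level gain could be computed the same way by passing its three (No, Yes) pairs to `information_gain`, and the attribute with the larger gain would be the one to split on.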
