Question:
import numpy as np
from collections import Counter
from sklearn import datasets, model_selection
# No other libraries will be imported
# load the Iris Dataset, which contains 150 samples.
# each sample has 4 features.
# the dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant.
iris = datasets.load_iris()
X = np.array(iris.data)    # features, 4 numeric attributes: Sepal length, Sepal width, Petal length, Petal width
Y = np.array(iris.target)  # labels: class 0, class 1, class 2
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.2, random_state=0)  # test_size / random_state values assumed
print("Train Shape:", X_train.shape)
print("Test Shape:", X_test.shape)
Calculate the Information Gain for each (numeric) attribute and show the feature that should be used first when building a decision tree.
Step 1: find the best cut-point for each attribute, i.e., the value used to split the data.
Step 2: calculate the information gain for each attribute, to decide the order of attributes when building the DT.
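For reference, the information gain of a binary split at cut-point v on feature X is the resulting drop in label entropy,

IG(Y; X, v) = H(Y) - [ |G1|/N * H(Y in G1) + |G2|/N * H(Y in G2) ],

where G1 and G2 are the two subgroups produced by the split and N = |G1| + |G2|. The best cut-point for a feature is the v that maximizes this quantity, and the bracketed term is exactly what the helper partition_entropy below computes.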
# Some helper functions
# calculate Entropy for a given distribution H(X)
def entropy(probabilities: list) -> float:
    return sum(-p * np.log2(p) for p in probabilities if p > 0)  # base-2 log assumed (entropy in bits)
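# For intuition: with base-2 logs, entropy([0.5, 0.5]) evaluates to 1.0,
# the maximum uncertainty for a two-class distribution.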
# given a list of labels, return the probability for each class P(Y)
def class_probabilities(labels: list) -> list:
    total_count = len(labels)
    return [label_count / total_count for label_count in Counter(labels).values()]
# calculate the Entropy H(Y) for a given list of labels.
def data_entropy(labels: list) -> float:
    return entropy(class_probabilities(labels))
# split data into two subgroups (group1, group2) based on attribute feature_idx and value feature_val
# if sample[feature_idx] <= feature_val:
#     group1 <- sample
# else:
#     group2 <- sample
def split_data(data: np.ndarray, feature_idx: int, feature_val: float) -> tuple:
    mask_below_threshold = data[:, feature_idx] <= feature_val
    group1 = data[mask_below_threshold]
    group2 = data[~mask_below_threshold]
    return group1, group2
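# Note: in the examples and implementation below, the label is carried as the last
# column of data, so each subgroup's class labels are available as group[:, -1].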
# calculate the entropy for the current partition: H(Y | X <= feature_val)
def partition_entropy(g1_labels: list, g2_labels: list) -> float:
    total_count = len(g1_labels) + len(g2_labels)
    # weighted combination of the conditional entropy in group1 and group2
    return data_entropy(g1_labels) * len(g1_labels) / total_count + data_entropy(g2_labels) * len(g2_labels) / total_count
#
# Examples to use the Helper functions
# calculate the H(Y) for the train and test data:
print(data_entropy(Y_train))
print(data_entropy(Y_test))
## to split the data based on feature_idx and feature_val:
train_data = np.concatenate((X_train, np.reshape(Y_train, (-1, 1))), axis=1)  # concatenate X_train, Y_train
print(train_data.shape)
# split the data into two subgroups
g1, g2 = split_data(train_data, feature_idx=0, feature_val=5.0)  # illustrative values (the original numbers were garbled)
print(g1.shape)
print(g2.shape)
# calculate the weighted entropy for the current split.
print(partition_entropy(g1[:, -1], g2[:, -1]))
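# the information gain of this particular split is then the entropy drop:
print(data_entropy(train_data[:, -1]) - partition_entropy(g1[:, -1], g2[:, -1]))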
#
# Your implementation
# Initialize variables to store the best cut-point and information gain for each attribute
#
# Printing
# print the calculated cut-point (feature_val) and information gain for each attribute.
# print the feature that should be used first when building the decision tree.
Please help me complete this code. The dataset is part of the question; the Iris dataset is already embedded in Python's scikit-learn.
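One way to complete the "Your implementation" and "Printing" sections, sketched using only the imports and helpers above. It assumes the candidate cut-points for each feature are the midpoints between consecutive distinct training values (a common convention; the assignment may intend a different candidate set), and it picks the feature whose best split yields the highest information gain. The variable names introduced here (base_entropy, best_cutpoints, best_gains, candidates) are illustrative.

feature_names = iris.feature_names
labels = train_data[:, -1]
base_entropy = data_entropy(labels)  # H(Y) on the training set

best_cutpoints = []  # best cut-point found for each feature
best_gains = []      # information gain achieved at that cut-point

for feature_idx in range(X_train.shape[1]):
    values = np.unique(train_data[:, feature_idx])  # sorted distinct values
    candidates = (values[:-1] + values[1:]) / 2     # midpoints between neighbors (assumed convention)
    best_gain, best_val = -1.0, None
    for feature_val in candidates:
        g1, g2 = split_data(train_data, feature_idx, feature_val)
        gain = base_entropy - partition_entropy(g1[:, -1], g2[:, -1])
        if gain > best_gain:
            best_gain, best_val = gain, feature_val
    best_cutpoints.append(best_val)
    best_gains.append(best_gain)

# Printing
for i, name in enumerate(feature_names):
    print(name, "best cut-point:", best_cutpoints[i], "information gain:", best_gains[i])
print("Feature to use first:", feature_names[int(np.argmax(best_gains))])

Because every midpoint has at least one training value on each side, neither subgroup is ever empty, so partition_entropy is always well defined here.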