Question:

import numpy as np
from collections import Counter
from sklearn import datasets, model_selection
# No other libraries will be imported
# load the Iris Dataset, which contains 150 samples.
# each sample has 4 features.
# the dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant.
iris = datasets.load_iris()
X = np.array(iris.data) # features, numeric attributes. [Sepal length, Sepal Width, Petal length, Petal width]
Y = np.array(iris.target) # labels: class-0, class-1, class-2
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.25, random_state=0)
print("Train Shape:", X_train.shape)
print("Train Shape:", X_test.shape)
3. Calculate the Information Gain for each (numeric) attribute, and show the feature that should be used first when building a decision tree.
step-1: find the best cutpoint for each attribute (the value at which to split the data).
step-2: calculate the information gain for each attribute (to decide the order of attributes when building the DT).
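Recall: the information gain of splitting attribute X at cutpoint v is IG = H(Y) - H(Y | split), where H(Y | split) is the weighted average entropy of the two resulting groups (computed by partition_entropy below). The best cutpoint for an attribute is the one that minimizes H(Y | split), i.e., maximizes IG.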
#-------------------- Some helper functions ------------------------------
# calculate Entropy for a given distribution H(X).
def entropy(probabilities: list) -> float:
    return sum([-p * np.log2(p) for p in probabilities if p > 0])
# given a list of labels, return the probability for each class P(Y).
def class_probabilities(labels: list) -> list:
    total_count = len(labels)
    return [label_count / total_count for label_count in Counter(labels).values()]
# calculate the Entropy H(Y) for a given list of labels.
def data_entropy(labels: list) -> float:
    return entropy(class_probabilities(labels))
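# quick sanity check (illustrative): a uniform two-class distribution carries
# 1 bit of entropy, while a pure group carries none.
print(entropy([0.5, 0.5]))         # -> 1.0
print(data_entropy([0, 0, 0, 0]))  # -> 0.0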
# split data into two sub-groups [group1, group2] based on attribute [feature_idx] and value [feature_val]
# if sample[feature_idx] < feature_val:
#     group1 <- sample
# else:
#     group2 <- sample
def split_data(data: np.array, feature_idx: int, feature_val: float) -> tuple:
    mask_below_threshold = data[:, feature_idx] < feature_val
    group1 = data[mask_below_threshold]
    group2 = data[~mask_below_threshold]
    return group1, group2
# calculate the entropy for the current partition, H(Y|X=feature_val).
def partition_entropy(g1_labels: list, g2_labels: list) -> float:
    total_count = len(g1_labels) + len(g2_labels)
    # weighted combination of the conditional entropy of group1 and group2.
    return data_entropy(g1_labels) * (len(g1_labels) / total_count) + data_entropy(g2_labels) * (len(g2_labels) / total_count)
#-----------------------------------------------------------------------------------------
#---------------------------- Examples to use the Helper functions -----------------------
# calculate the H(Y) for the train and test data:
print(data_entropy(Y_train))
print(data_entropy(Y_test))
## to split the data based on feature_idx and feature_val:
train_data = np.concatenate((X_train, np.reshape(Y_train, (-1,1))), axis=1) # concatenate [X_train, Y_train]
print(train_data.shape)
# split the data into two subgroups
g1, g2 = split_data(train_data, feature_idx=1, feature_val=3)
print(g1.shape)
print(g2.shape)
# calculate the weighted entropy for the current split.
print(partition_entropy(g1[:,-1], g2[:,-1]))
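# the information gain of this example split is the drop from H(Y) to the
# weighted split entropy: IG = H(Y) - H(Y | split)
print(data_entropy(train_data[:, -1]) - partition_entropy(g1[:, -1], g2[:, -1]))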
#-----------------------------------------------------------------------------------------
#-------------------------------- Your implementation ------------------------------------
# Initialize variables to store the best cutpoint and information gain for each attribute
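# One possible completion (a sketch, not the only approach): candidate cutpoints
# are taken as midpoints between consecutive sorted unique feature values
# (an assumption; the question does not fix how candidates are chosen).
base_entropy = data_entropy(train_data[:, -1])  # H(Y) on the training labels
n_features = X_train.shape[1]
best_cutpoints = np.zeros(n_features)  # best cutpoint found for each attribute
best_gains = np.zeros(n_features)      # information gain at that cutpoint
for feature_idx in range(n_features):
    # candidate cutpoints: midpoints between consecutive sorted unique values (assumed)
    values = np.unique(train_data[:, feature_idx])
    candidates = (values[:-1] + values[1:]) / 2
    for feature_val in candidates:
        g1, g2 = split_data(train_data, feature_idx, feature_val)
        # IG = H(Y) - H(Y | split); keep the cutpoint with the largest gain
        gain = base_entropy - partition_entropy(g1[:, -1], g2[:, -1])
        if gain > best_gains[feature_idx]:
            best_gains[feature_idx] = gain
            best_cutpoints[feature_idx] = feature_val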
#-----------------------------------------------------------------------------------------
#----------------------------------- Printing --------------------------------------------
# print the calculated cutpoint [feature_val] and information gain for each attribute.
# print the feature that should be used first when building the decision tree.
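# a matching printing block (feature names taken from the comment on iris.data above):
feature_names = ["Sepal length", "Sepal width", "Petal length", "Petal width"]
for idx in range(n_features):
    print(f"{feature_names[idx]}: best cutpoint = {best_cutpoints[idx]:.2f}, "
          f"information gain = {best_gains[idx]:.4f}")
# the attribute with the highest information gain should be split on first
print("Feature to use first:", feature_names[int(np.argmax(best_gains))])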
Please help me complete this code. The dataset is part of the question; the Iris dataset is already embedded in Python via scikit-learn.
