Question: Use JupyterLab. The provided code is below, and the questions are at the end.

import numpy as np
import pandas as pd
import seaborn as sns
import math
from sklearn import preprocessing
from sklearn import datasets
from sklearn.tree import plot_tree
from sklearn.tree import export_text
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics  #Import scikit-learn metrics module for accuracy calculation
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
import sklearn
from scipy import stats
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')

np.random.seed(1)

- - - - - - - - - -

Loading the digits dataset (classification).

The digits dataset has 1797 records. Each record is an 8x8 image (64 dimensions), and there are 10 class labels for this dataset. Each image (record) is labeled by the number it represents. The intensities of the original pixels are binned to values ranging from 0 to 16.

- - - - - - - - - - -

from sklearn.datasets import load_digits
digits = load_digits()
X = digits.data
y = digits.target

- - - -

#Learning about how the data is stored
print("X_shape", X.shape, "y_shape", y.shape)
print(X[0:2,])  #Check the values for the first two images
print(y[0:2])  #Print the class labels for the first two images

- - - - -

#Show the first image
plt.gray()
plt.matshow(X[0, :].reshape(8, 8))  #show the first image, first reshape the 64 values vector into an 8x8 matrix
plt.show()

- - - - - -

#plotting the first 8 images
fig, axes = plt.subplots(nrows=2, ncols=4, figsize=(6, 3))
for id, ax in enumerate(axes.flatten()):
    image = X[id, :].reshape(8, 8)
    ax.set_axis_off()
    #ax.imshow(image, cmap=plt.cm.gray_r)  #You can try this and comment the line below
    ax.imshow(image, cmap='gray')
    ax.set_title("Label: %i" % y[id], fontsize=9)
plt.tight_layout()
plt.show()

- - - - - - - -

# Split data into 70% train and 30% test subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=42)
# Note: this unshuffled split overwrites the shuffled one above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)
print("Training Data", X_train.shape)
print("Testing Data", X_test.shape)
counts, bins = np.histogram(y_test)
print("Number of records in each class", counts)
plt.stairs(counts, bins)

- - - - - - -

1-A Train a decision tree on the training data and report the training and testing accuracy of the decision tree.
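One way to approach 1-A, as a minimal sketch: fit a `DecisionTreeClassifier` on the training split and score both splits with `metrics.accuracy_score`. The unshuffled 70/30 split mirrors the last `train_test_split` call in the provided code. The name `clf` and `random_state=0` are assumptions added here for repeatability, not part of the assignment.

```python
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Recreate the data and the unshuffled 70/30 split from the provided code.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False
)

# random_state=0 is an assumption; it only makes the fitted tree repeatable.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

train_acc = metrics.accuracy_score(y_train, clf.predict(X_train))
test_acc = metrics.accuracy_score(y_test, clf.predict(X_test))
print("Training accuracy: %0.3f" % train_acc)
print("Testing accuracy: %0.3f" % test_acc)
```

An unpruned tree usually fits the training split almost perfectly, so a large gap between the two numbers points to overfitting rather than a bug.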

1-B Plot the first 8 images in the testing dataset. The title of each subfigure should be "True: label Predicted: label".

1-C Plot the first 8 images in the testing dataset that were misclassified. The title of each subfigure should be "True: label Predicted: label".
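Both plotting tasks (1-B and 1-C) differ only in which test indices are drawn, so a single helper can serve both. The sketch below assumes the unshuffled split from the provided code and a tree fitted with `random_state=0`. `plot_test_images` is a hypothetical helper name, not something the assignment defines.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False
)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)


def plot_test_images(indices):
    """Draw up to 8 test images on a 2x4 grid (hypothetical helper)."""
    fig, axes = plt.subplots(nrows=2, ncols=4, figsize=(8, 4))
    for ax in axes.flatten():
        ax.set_axis_off()  # hide axes even if fewer than 8 images are drawn
    for ax, idx in zip(axes.flatten(), indices):
        ax.imshow(X_test[idx].reshape(8, 8), cmap="gray")
        ax.set_title(
            "True: %d Predicted: %d" % (y_test[idx], y_pred[idx]), fontsize=9
        )
    fig.tight_layout()
    return fig


# 1-B: the first 8 test images.
fig_first = plot_test_images(np.arange(8))

# 1-C: the first 8 test images whose prediction differs from the true label.
misclassified = np.flatnonzero(y_pred != y_test)[:8]
fig_wrong = plot_test_images(misclassified)
```

In a notebook with `%matplotlib inline` both figures render at the end of the cell; elsewhere, call `plt.show()`.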

1-D Print the classification report using classification_report from metrics in sklearn.
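A minimal sketch for 1-D, assuming the same unshuffled split and a tree fitted with `random_state=0` (both refitted here only so the snippet runs on its own):

```python
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False
)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class precision, recall, F1 and support for the test split.
report = metrics.classification_report(y_test, y_pred)
print(report)
```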

1-E Plot the confusion matrix using ConfusionMatrixDisplay.
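For 1-E, one possible route is to compute the matrix with `confusion_matrix` and pass it to `ConfusionMatrixDisplay`, both of which the provided code already imports. The `cmap="Blues"` choice and the refitted tree are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False
)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
disp.plot(cmap="Blues")  # colour map is an arbitrary choice
```

Off-diagonal cells show which digits the tree confuses with each other, which complements the per-class numbers in the classification report.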

1-F (5 points) Plot the decision tree using plot_tree.
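A sketch for the `plot_tree` task. A fully grown tree on 64 features is too large to read, so this example draws only the top levels with `max_depth=2`. That limit, the figure size, and `filled=True` are assumptions; drop `max_depth` to draw the whole tree.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False
)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

fig, ax = plt.subplots(figsize=(16, 8))
# max_depth here limits only what is drawn, not the fitted tree itself.
annotations = plot_tree(
    clf,
    max_depth=2,
    class_names=[str(c) for c in clf.classes_],
    filled=True,
    fontsize=8,
    ax=ax,
)
```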

1-G Cross Validation

Report the accuracies for the 5-fold cross validation (use cv=5). The cross validation method takes the decision tree model, the entire dataset, and the class labels.

For this line:

print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()))

this is a sample output:

[0.80833333 0.71944444 0.79665738 0.82729805 0.79108635]
0.79 accuracy with a standard deviation of 0.04
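A minimal sketch of the cross-validation step, assuming `cross_val_score` from `sklearn.model_selection` is the method the question has in mind (the question does not name it). The exact scores depend on the tree's `random_state`, which is set to 0 here as an assumption, so they need not match the sample output.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# The model, the entire dataset, and the class labels, with 5 folds.
scores = cross_val_score(clf, X, y, cv=5)
print(scores)
print("%0.2f accuracy with a standard deviation of %0.2f"
      % (scores.mean(), scores.std()))
```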

1-H Random Forest Classifier

Train a random forest on X_train and report the accuracy on X_test. Use 100 trees in the random forest classifier. Recall that the number of records in X_train is 1257. Fine-tune max_samples (try different numbers) for RandomForestClassifier to achieve an accuracy higher than 91% (a big improvement from the 78%).
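The tuning loop could look like the sketch below. The candidate `max_samples` values are arbitrary illustrative choices, and `random_state=0` is an assumption. Each candidate must not exceed the 1257 training records.

```python
from sklearn import metrics
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False
)

# max_samples = number of records drawn (with replacement) for each tree.
results = {}
for max_samples in (100, 300, 600, 900, 1257):  # illustrative values only
    forest = RandomForestClassifier(
        n_estimators=100, max_samples=max_samples, random_state=0
    )
    forest.fit(X_train, y_train)
    results[max_samples] = metrics.accuracy_score(y_test, forest.predict(X_test))
    print("max_samples=%4d  test accuracy=%0.3f"
          % (max_samples, results[max_samples]))

best = max(results, key=results.get)
print("Best max_samples:", best)
```

Choosing `max_samples` by test accuracy follows the wording of the question; a separate validation split would give a less optimistic estimate.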

- - - - - - - - - - - - - - - - - - - - -

2 Finding the best split using gini index

data = np.array([[1, 2, 3, 1], [2, 3, 3, 0], [3, 2, 2, 1], [2, 2, 6, 1], [1, 2, 5, 1], [1, 3, 2, 0], [2, 3, 6, 0], [3, 3, 4, 1]])
print("Values", data[:, :-1])
print("Class Label", data[:, -1])

n = data.shape[0]  #number of records
d = data.shape[1] - 1  #number of columns, ignore the last column (class label)

- - - - - - - - -

2-A (10 points) Write a function that computes the gini_index of a dataset D.

Use math.pow(n_positive/n, 2) to calculate (n_positive/n)^2. If the data has zero records, the gini_index is zero. The last column of the dataset is the class label.

#Write a function that computes the gini_index for a dataset
#gini_index = 1 - ((n_positive/n)^2 + (n_negative/n)^2)
#use math.pow(n_positive/n, 2) to calculate (n_positive/n)^2
#If the data has zero records, the gini_index is zero
#The last column of the dataset is the class label
def get_gini_index(D):
    n = D.shape[0]
    gini_index = "calculate it"  #Write your code here
    return(gini_index)
print(get_gini_index(data))  #You should get 0.46875
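A possible completion of `get_gini_index`, sketched under the assumption that the class label is binary (0 or 1) and sits in the last column, as in `data`. The names `n_positive` and `n_negative` are taken from the formula in the comments. The function signature with a single parameter `D` is an assumption.

```python
import math

import numpy as np

data = np.array([[1, 2, 3, 1], [2, 3, 3, 0], [3, 2, 2, 1], [2, 2, 6, 1],
                 [1, 2, 5, 1], [1, 3, 2, 0], [2, 3, 6, 0], [3, 3, 4, 1]])


def get_gini_index(D):
    """Gini index of D, whose last column is a 0/1 class label."""
    n = D.shape[0]
    if n == 0:
        return 0.0  # the exercise defines the index of an empty dataset as zero
    n_positive = int(np.sum(D[:, -1] == 1))
    n_negative = n - n_positive
    return 1 - (math.pow(n_positive / n, 2) + math.pow(n_negative / n, 2))


print(get_gini_index(data))  # 0.46875
```

With 5 positive and 3 negative records out of 8, the index is 1 - (25/64 + 9/64) = 30/64 = 0.46875, which matches the expected value in the question.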

- - - - -
