Question:

# Imports required by the code below.
# X (features) and y (target) are assumed to be defined earlier in the notebook.
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

## Preprocessing Pipeline

# Define preprocessing for numerical and categorical features
numeric_features = ['age', 'trestbps', 'chol', 'thalch', 'oldpeak']
categorical_features = ['sex', 'cp', 'fbs', 'restecg', 'slope', 'ca', 'thal']

# Numerical pipeline
numeric_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
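
To make the numerical pipeline concrete, here is a minimal standard-library sketch of the two steps it chains: median imputation followed by standardization. It is not scikit-learn's implementation. The helper names `impute_median` and `standardize` and the sample `chol` readings are hypothetical, chosen only for illustration.

```python
from statistics import fmean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical 'chol' readings with one missing entry (illustrative data only).
chol = [200.0, None, 240.0, 180.0, 220.0]
imputed = impute_median(chol)   # the missing value becomes the median, 210.0
scaled = standardize(imputed)   # mean 210, standard deviation 20

print(imputed)
print([round(v, 3) for v in scaled])
```

Inside a fitted `Pipeline`, the median, mean, and standard deviation are learned from the training data during `fit` and then reused unchanged when transforming the test data.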

# Categorical pipeline
categorical_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(drop='first'))
])
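
The categorical pipeline can be illustrated the same way. The sketch below fills missing values with the most frequent category and then one-hot encodes with the first (sorted) category dropped, which is the idea behind `drop='first'`. The helper names and the sample `cp` values are assumptions made for this example, not part of the original code or dataset.

```python
from collections import Counter


def impute_most_frequent(values):
    """Replace None entries with the most frequent observed category."""
    counts = Counter(v for v in values if v is not None)
    top = max(counts.values())
    # On ties, take the smallest category so the result is deterministic.
    fill = min(v for v, c in counts.items() if c == top)
    return [fill if v is None else v for v in values]


def one_hot_drop_first(values):
    """One-hot encode, dropping the first category in sorted order."""
    categories = sorted(set(values))
    kept = categories[1:]  # the dropped category is encoded as all zeros
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Hypothetical chest-pain ('cp') values with one missing entry.
cp = ['typical angina', None, 'non-anginal', 'asymptomatic', 'non-anginal']
cp_imputed = impute_most_frequent(cp)
kept_categories, encoded = one_hot_drop_first(cp_imputed)

print(kept_categories)
for row in encoded:
    print(row)
```

Dropping one column per feature removes the redundancy among the indicator columns (they would otherwise always sum to 1), which matters most for linear models such as logistic regression.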

# Combine pipelines into a full preprocessing pipeline
preprocessor = ColumnTransformer([
    ('num', numeric_pipeline, numeric_features),
    ('cat', categorical_pipeline, categorical_features)
])

## Split the Data

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model
log_reg = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(max_iter=1000))
])
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)

# Evaluation
print("Logistic Regression")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:", classification_report(y_test, y_pred))
print("Confusion Matrix:", confusion_matrix(y_test, y_pred))
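
The evaluation block repeats for every model, so it helps to see what two of its numbers mean. The following is a rough pure-Python sketch of accuracy and a confusion matrix (rows are true labels, columns are predicted labels, labels in sorted order). The function names and the demo labels are hypothetical and stand in for `y_test` and `y_pred`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion(y_true, y_pred):
    """Count (true, predicted) pairs: rows are true labels, columns predicted."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix


# Hypothetical labels, for illustration only.
y_true_demo = [0, 1, 1, 0, 1, 0]
y_pred_demo = [0, 1, 0, 0, 1, 1]

acc = accuracy(y_true_demo, y_pred_demo)
labels, matrix = confusion(y_true_demo, y_pred_demo)

print(round(acc, 3))
print(labels)
print(matrix)
```

The diagonal of the matrix holds the correct predictions, so accuracy equals the diagonal sum divided by the total number of samples.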

# Create and train model
decision_tree = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', DecisionTreeClassifier())
])
decision_tree.fit(X_train, y_train)
y_pred = decision_tree.predict(X_test)

# Evaluation
print("Decision Tree")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:", classification_report(y_test, y_pred))
print("Confusion Matrix:", confusion_matrix(y_test, y_pred))

## Random Forest

# In [42]
# Create and train model
random_forest = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier())
])
random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)

# Evaluation
print("Random Forest")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:", classification_report(y_test, y_pred))
print("Confusion Matrix:", confusion_matrix(y_test, y_pred))

## Support Vector Machine (SVM)

# In [43]
# Create and train model
svm = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', SVC(probability=True))
])
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

# Evaluation
print("Support Vector Machine")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:", classification_report(y_test, y_pred))
print("Confusion Matrix:", confusion_matrix(y_test, y_pred))

## K-Nearest Neighbors (KNN)

# In [44]
# Create and train model
knn = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', KNeighborsClassifier())
])
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Evaluation
print("K-Nearest Neighbors")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:", classification_report(y_test, y_pred))
print("Confusion Matrix:", confusion_matrix(y_test, y_pred))

## Hyperparameter Tuning for Each Model

## Logistic Regression

# In [47]
# Define hyperparameters for Logistic Regression
param_grid_log_reg = {
    'classifier__C': [0.1, 1, 10],
    'classifier__solver': ['liblinear', 'saga']
}

# Set up GridSearchCV
grid_log_reg = GridSearchCV(
    Pipeline([
        ('preprocessor', preprocessor),
        ('classifier', LogisticRegression(max_iter=1000))
    ]),
    param_grid_log_reg, cv=5, scoring='accuracy'
)

# Fit GridSearchCV
grid_log_reg.fit(X_train, y_train)

# Best parameters and score
print("Best Parameters for Logistic Regression:", grid_log_reg.best_params_)
print("Best Score for Logistic Regression:", grid_log_reg.best_score_)

# Evaluate on the test set
y_pred = grid_log_reg.predict(X_test)
print("Logistic Regression Test Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:", classification_report(y_test, y_pred))
print("Confusion Matrix:", confusion_matrix(y_test, y_pred))
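
As a rough illustration of what the grid search above iterates over, this standard-library sketch expands the same parameter grid into its candidate combinations and counts the cross-validation fits. It also shows how a key such as `classifier__C` splits into a pipeline step name and a parameter name. The helpers `expand_grid` and `split_param` are hypothetical names for this sketch, not scikit-learn functions.

```python
from itertools import product

# Same grid as in the question: 3 values of C times 2 solvers.
param_grid = {
    'classifier__C': [0.1, 1, 10],
    'classifier__solver': ['liblinear', 'saga'],
}


def expand_grid(grid):
    """Return every combination of parameter values as a list of dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def split_param(name):
    """Split 'step__param' into the pipeline step name and the parameter name."""
    step, _, param = name.partition('__')
    return step, param


candidates = expand_grid(param_grid)
cv_folds = 5
total_fits = len(candidates) * cv_folds  # each candidate is fitted once per fold

print(len(candidates), total_fits)
print(split_param('classifier__C'))
for candidate in candidates:
    print(candidate)
```

With the default `refit=True`, `GridSearchCV` additionally refits the best candidate on the whole training set, which is why `grid_log_reg.predict(X_test)` can be called directly after `fit`.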

## Decision Tree

# In [48]
# Define hyperparameters for Decision Tree
param_grid_dec_tree = {
    'classifier__max_depth': [None, 10, 20, 30],
    'classifier__min_samples_split': [2, 5, 10],
    'classifier__criterion': ['gini', 'entropy']
}

# Set up GridSearchCV
grid_dec_tree = GridSearchCV(
    Pipeline([
        ('preprocessor', preprocessor),
        ('classifier', DecisionTreeClassifier())
    ]),
    param_grid_dec_tree, cv=5, scoring='accuracy'
)

# Fit GridSearchCV
grid_dec_tree.fit(X_train, y_train)

# Best parameters and score
print("Best Parameters for Decision Tree:", grid_dec_tree.best_params_)
print("Best Score for Decision Tree:", grid_dec_tree.best_score_)

Explain this code?
