Question: Problem 2 : Preprocessing We will now create a pipeline to process our dataset. This processing will involve indexing and encoding the categorical features and

Problem

2

: Preprocessing

We will now create a pipeline to process our dataset. This processing will involve indexing and encoding the

categorical features and then combining all of the features into vectors.

Create lists named num

_

features and cat

_

features to store the names of columns representing

numerical and categorical features. The numerical features are age, avg

_

glucose

_

level, and bmi. All

other features are categorical.

Create lists named ix

_

features and vec

_

features to store the names of the integer

-

encoded categorical

columns and the one

-

hot encoded categorical columns

(

respectively

) .

Create a StringIndexer object that uses the columns named in the list cat

_

features to create the

columns named in the list ix

_

features.

Create a OneHotEncoder object that uses the integer

-

encoded features named in ix

_

features to create

the one

-

hot encoded categorical features named in vec

_

features. Do not drop the last columns.

Create a VectorAssembler object that combines the numerical features and the one

-

hot encoded vectors

for the categorical features. The combined column should be named features.

We will now create a pipeline from the stages above and will apply this to our data.

Create a pipeline consisting of the StringIndexer, OneHotEncoder, and VectorAssembler objects. Fit

this pipeline to the stroke

_

df DataFrame, and then apply the fitted pipeline to stroke

_

.

Store the

processed DataFrame in a variable named train.

Persist the train DataFrame. Then display the first

10

rows of the features and stroke columns of

train, setting truncate

=

False.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!

The total number of points for this assignment is 120 points. Please submit your assignment in a Word file. Use this assignment file as a template to enter and copy-paste your answers for your...

1 . Introduction The primary objective of this report is to analyze a given dataset and construct predictive models to classify the data accurately. The dataset comprises various features, each...

Objective: The objective of this project is to explore, analyze, and compare the performance of at least three different machine learning classifiers or regressors on a medium - sized dataset....

Please write the codes Objective: The objective Of this project is to explore, analyze, and compare the performance Of at least three different machine learning classifiers or regressors on a...

MACHINE LEARNING Objective: The objective of this project is to explore, analyse, and compare the performance of at least three different machine learning classifiers ar regressons on a merdium -...

Please provide the codes as well Objective: The objective of this project is to explore, analyze, and compare the performance of at least three different machine learning classifiers or regressors on...

Objective: The objective of this project is to explore, analyze, and compare the performance of at least three different machine learning classifiers or regressors on a medium - sized dataset....

Under a divorce decree, Blackston was ordered to pay $600 a month in child support to his former wife. The decree provided that the payments were to be paid to the clerk of the court's office. The...

What is the tax effect of pass-through loss and income to Brian?

under U . S tax law what is the basis for the overall foreign tax credit

In the Solow growth model, why is the maximum level of capital per worker not sustainable