Problem 2: Preprocessing
We will now create a pipeline to process our dataset. This processing will involve indexing and encoding the
categorical features and then combining all of the features into vectors.
Create lists named num_features and cat_features to store the names of columns representing
numerical and categorical features. The numerical features are age, avg_glucose_level, and bmi. All
other features are categorical.
Create lists named ix_features and vec_features to store the names of the integer-encoded categorical
columns and the one-hot encoded categorical columns (respectively).
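A minimal sketch of the four lists. It assumes stroke_df has the columns of the public Kaggle stroke dataset (gender, age, hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi, smoking_status), that stroke is the label rather than a feature, and that any id column was already dropped. In the assignment you would take the names from stroke_df.columns instead of the hard-coded list; the `_ix` and `_vec` suffixes are an arbitrary, hypothetical naming choice.

```python
# Hypothetical column list, assumed to match stroke_df after earlier cleaning.
# In practice: all_columns = stroke_df.columns
all_columns = ["gender", "age", "hypertension", "heart_disease", "ever_married",
               "work_type", "Residence_type", "avg_glucose_level", "bmi",
               "smoking_status", "stroke"]
label = "stroke"  # target column, excluded from the features

num_features = ["age", "avg_glucose_level", "bmi"]

# Every remaining non-label column is treated as categorical.
cat_features = [c for c in all_columns if c not in num_features and c != label]

# Output column names for the indexer and the encoder (suffixes are arbitrary).
ix_features = [c + "_ix" for c in cat_features]
vec_features = [c + "_vec" for c in cat_features]

print(cat_features)
print(ix_features)
print(vec_features)
```

Deriving ix_features and vec_features from cat_features keeps the three lists aligned position by position, which is what the multi-column StringIndexer and OneHotEncoder expect.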
Create a StringIndexer object that uses the columns named in the list cat_features to create the
columns named in the list ix_features.
Create a OneHotEncoder object that uses the integer-encoded features named in ix_features to create
the one-hot encoded categorical features named in vec_features. Do not drop the last category (that is,
set dropLast=False so every category gets its own slot in the vector).
Create a VectorAssembler object that combines the numerical features and the one-hot encoded vectors
for the categorical features. The combined column should be named features.
We will now create a pipeline from the stages above and will apply this to our data.
Create a pipeline consisting of the StringIndexer, OneHotEncoder, and VectorAssembler objects. Fit
this pipeline to the stroke_df DataFrame, and then apply the fitted pipeline to stroke_df. Store the
processed DataFrame in a variable named train.
Persist the train DataFrame. Then display the first 10 rows of the features and stroke columns of
train, setting truncate=False.
