Question: In this assignment, we will practice loading datasets with Pandas, visualization, and writing processing pipelines. We will work with the auto-mpg data, in which each row carries information for a car model. The data has the following columns:
mpg: miles-per-gallon rating of the car
cylinders: number of cylinders the car has
displacement: displacement of the car
horsepower: horsepower of the car
weight: weight of the car
acceleration: acceleration of the car
model year: the year the car was introduced
origin: the development location of the car (number codes represent Asia, Europe, and North America)
car name: model name of the car (unique for each car model)
Please use the auto-mpg.csv data file to do the following:
Load the data into a Python session as a Pandas DataFrame. Check whether all columns have the correct type, and fix any incorrect ones if necessary.
Split the data into training and testing sets.
Visualize necessary columns in the data. After this point, you should have three lists:
Columns that are numeric and have symmetric distributions
Columns that are numeric and have skewed distributions
Columns that are categorical
Build a pipeline as follows:
Numeric and symmetric columns: Imputation, then standardization
Numeric and skewed columns: Imputation, then log transformation, then standardization
Categorical columns: One-hot encoding
Train the pipeline on the training data. Then perform transformation on the training data and testing data.
Print the shape of the processed training data and processed testing data.
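The steps above can be sketched with pandas and scikit-learn as follows. This is a minimal sketch under stated assumptions, not a reference solution. The three column lists are guesses that should be confirmed from the histograms, mpg is left out on the assumption that it is the prediction target, car name is dropped because it is unique per row, and the possibility that horsepower loads as text (for example with "?" marking missing values) is also an assumption. The helper names fix_types, plot_histograms, build_preprocessor, and preprocess are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Assumed groupings; verify them with plot_histograms() before relying on them.
SYMMETRIC_COLS = ["acceleration", "model year"]
SKEWED_COLS = ["displacement", "horsepower", "weight"]
CATEGORICAL_COLS = ["cylinders", "origin"]


def fix_types(df):
    """Return a copy with column types corrected."""
    df = df.copy()
    # Assumption: horsepower may load as text if missing values are coded
    # with a placeholder; non-numeric entries become NaN for the imputer.
    df["horsepower"] = pd.to_numeric(df["horsepower"], errors="coerce")
    # cylinders and origin are codes rather than quantities.
    for col in CATEGORICAL_COLS:
        df[col] = df[col].astype("category")
    return df


def plot_histograms(df):
    """Draw one histogram per numeric column (requires matplotlib)."""
    return df[SYMMETRIC_COLS + SKEWED_COLS].hist(bins=20, figsize=(10, 6))


def build_preprocessor():
    """Assemble the three column-wise pipelines from the assignment."""
    symmetric = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ])
    skewed = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("log", FunctionTransformer(np.log1p)),  # log(1 + x), safe at zero
        ("scale", StandardScaler()),
    ])
    categorical = OneHotEncoder(handle_unknown="ignore")
    # Columns not listed (mpg, car name) are dropped by default.
    return ColumnTransformer([
        ("symmetric", symmetric, SYMMETRIC_COLS),
        ("skewed", skewed, SKEWED_COLS),
        ("categorical", categorical, CATEGORICAL_COLS),
    ])


def preprocess(df, test_size=0.2, random_state=42):
    """Split, fit the pipeline on the training part only, transform both."""
    train, test = train_test_split(
        df, test_size=test_size, random_state=random_state
    )
    preprocessor = build_preprocessor()
    X_train = preprocessor.fit_transform(train)
    X_test = preprocessor.transform(test)
    return X_train, X_test


# Example usage (the file name is an assumption):
# df = fix_types(pd.read_csv("auto-mpg.csv"))
# plot_histograms(df)
# X_train, X_test = preprocess(df)
# print(X_train.shape, X_test.shape)
```

Fitting on the training split only and then reusing the fitted transformer on the testing split keeps the imputation values, scaling statistics, and one-hot categories from leaking information out of the test data.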
