Question:

In this assignment, we will practice loading datasets with Pandas, visualizing them, and writing processing pipelines. We will work with the auto-mpg data, in which each row carries information for a car model. The data has the following columns:
1. mpg: miles-per-gallon rating of the car
2. cylinders: number of cylinders the car has
3. displacement: displacement of the car
4. horsepower: horsepower of the car
5. weight: weight of the car
6. acceleration: acceleration of the car
7. model year: the year the car was introduced
8. origin: the development location of the car (number codes represent Asia, Europe, and North America)
9. car name: model name of the car (unique for each car model)
Please use the auto-mpg.csv data file to do the following:
1. Load the data into a Python session as a Pandas DataFrame. Check that every column has the correct type, and fix any that do not.
2. Split the data into 75% training and 25% testing.
3. Visualize necessary columns in the data. After this point, you should have three lists:
   - Columns that are numeric and have symmetric distributions
   - Columns that are numeric and have skewed distributions
   - Columns that are categorical
4. Build a pipeline as follows:
   - Numeric and symmetric columns: Imputation -> Standardization
   - Numeric and skewed columns: Imputation -> Log transformation -> Standardization
   - Categorical columns: One-hot encoder
5. Train (fit) the pipeline on the training data only. Then transform both the training data and the testing data.
6. Print the shapes of the processed training data and the processed testing data.
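
Steps 1 and 2 could be approached along the following lines. This is only a sketch: it assumes the CSV header uses exactly the column names listed above, and that a numeric column may be read as text because of a placeholder such as "?" (worth confirming with df.dtypes on the real file). The helper name load_auto_mpg and the in-memory sample rows are made up for illustration; with the real data, pass the path "auto-mpg.csv" instead.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Column names taken from the assignment text; adjust if the file differs.
NUMERIC_COLS = ["mpg", "cylinders", "displacement", "horsepower",
                "weight", "acceleration", "model year"]


def load_auto_mpg(source):
    """Read the CSV and coerce columns to sensible dtypes (hypothetical helper)."""
    df = pd.read_csv(source)
    for col in NUMERIC_COLS:
        # Non-numeric placeholders (assumed, e.g. "?") become NaN,
        # so the column ends up numeric instead of object.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # origin is a numeric code for a region, so treat it as categorical.
    df["origin"] = df["origin"].astype("category")
    return df


# Made-up stand-in for auto-mpg.csv so the sketch runs on its own.
sample_csv = io.StringIO(
    "mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name\n"
    "18.0,8,300.0,130,3500,12.0,70,1,model a\n"
    "25.0,4,110.0,?,2300,15.5,71,2,model b\n"
    "31.0,4,90.0,65,1900,17.0,72,3,model c\n"
    "15.0,8,350.0,160,3700,11.5,73,1,model d\n"
    "22.0,6,200.0,95,2900,16.0,74,1,model e\n"
    "28.0,4,100.0,75,2100,15.0,75,2,model f\n"
    "33.0,4,85.0,60,1850,18.0,76,3,model g\n"
    "20.0,6,230.0,105,3100,14.5,77,1,model h\n"
)

df = load_auto_mpg(sample_csv)
print(df.dtypes)

# 75% training, 25% testing; random_state fixed only for reproducibility.
train, test = train_test_split(df, test_size=0.25, random_state=42)
print(len(train), len(test))
```

Splitting before any visualization or fitting keeps the testing rows out of every later decision, which is why step 2 comes ahead of steps 3 to 5.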

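Steps 4 to 6 might be sketched with scikit-learn's ColumnTransformer as shown below. The three column lists here are placeholders: the real ones should come from the plots in step 3, not from this example. Median imputation, np.log as the log transformation (which assumes strictly positive values), and the helper name build_preprocessor are all illustrative choices, and the demo DataFrame is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def build_preprocessor(symmetric_cols, skewed_cols, categorical_cols):
    """Assemble one sub-pipeline per column group (hypothetical helper)."""
    symmetric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    skewed = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        # np.log assumes strictly positive values in these columns.
        ("log", FunctionTransformer(np.log)),
        ("scale", StandardScaler()),
    ])
    categorical = OneHotEncoder(handle_unknown="ignore")
    return ColumnTransformer([
        ("sym", symmetric, symmetric_cols),
        ("skew", skewed, skewed_cols),
        ("cat", categorical, categorical_cols),
    ])


# Synthetic demo data; which real columns are symmetric or skewed is
# something to decide from the step 3 plots.
rng = np.random.default_rng(0)
n = 40
demo = pd.DataFrame({
    "acceleration": rng.normal(15.5, 2.5, n),   # placeholder symmetric column
    "weight": rng.lognormal(8.0, 0.3, n),       # placeholder skewed column
    "origin": np.arange(n) % 3 + 1,             # codes 1, 2, 3
})
demo.loc[0, "weight"] = np.nan                  # one missing value to impute
train, test = demo.iloc[:30], demo.iloc[30:]

preprocessor = build_preprocessor(["acceleration"], ["weight"], ["origin"])
X_train = preprocessor.fit_transform(train)     # fit on training data only
X_test = preprocessor.transform(test)           # reuse the fitted statistics
print(X_train.shape, X_test.shape)              # (30, 5) (10, 5)
```

The output has five columns because each numeric column stays a single column while the three origin codes expand into three one-hot columns. Calling fit_transform on the training data and only transform on the testing data keeps the imputation values and scaling statistics from being influenced by the testing rows.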