Question: Could you please explain the code for each step in detail? Thank you.
You will start by examining the data in the dataset.
To get the most out of this lab, read the instructions and code before you run the cells. Take time to experiment!
Start by importing the pandas package and setting some default display options.
import pandas as pd
# Show up to 500 rows and 500 columns, and widen the output, so large DataFrames aren't truncated
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
Next, load the dataset into a pandas DataFrame.
The data doesn't contain a header row, so you define the column names in a variable named col_names, using the attributes listed in the dataset description.
url = "imports-85.csv"
# Column names, in file order, from the dataset's attribute description
col_names = ['symboling', 'normalized-losses', 'fuel-type', 'aspiration', 'num-of-doors',
             'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length',
             'width', 'height', 'curb-weight', 'engine-type', 'num-of-cylinders',
             'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-ratio',
             'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']
# header=None: the file has no header row; na_values="?": read every "?" as a missing value (NaN)
df_car = pd.read_csv(url, sep=',', names=col_names, na_values="?", header=None)
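To see what the names, header=None, and na_values="?" arguments do without the real file, the following sketch parses a tiny in-memory CSV. The three rows and the shortened column list are made up for illustration, and io.StringIO stands in for imports-85.csv.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same style as imports-85.csv:
# no header row, and "?" marks a missing value
raw = io.StringIO(
    "3,?,gas,std,two\n"
    "1,158,gas,turbo,four\n"
    "2,164,diesel,std,?\n"
)

# Shortened column list that matches this sample only
names = ["symboling", "normalized-losses", "fuel-type", "aspiration", "num-of-doors"]

sample = pd.read_csv(raw, sep=",", names=names, na_values="?", header=None)

print(sample.shape)                       # (3, 5)
print(sample.isna().sum().sum())          # 2 -> both "?" entries became NaN
print(sample["normalized-losses"].dtype)  # float64, because the column now contains NaN
```

Because NaN is a float, a numeric column that contains any "?" is read as float64 rather than int64.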
To see the number of rows (instances) and columns (features), use the shape attribute.
df_car.shape  # a (rows, columns) tuple
Next, examine the data by using the head method.
df_car.head(5)
There are 25 columns. Some of the columns have numerical values, but many of them contain text.
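One way to list which columns are numeric and which hold text is DataFrame.select_dtypes. The small DataFrame below is a made-up stand-in for a few df_car columns; on the real data you would call the same methods on df_car.

```python
import pandas as pd

# Hypothetical stand-in for a few df_car columns
cars = pd.DataFrame({
    "symboling": [3, 1, 2],
    "fuel-type": ["gas", "gas", "diesel"],
    "horsepower": [111.0, 154.0, None],
    "aspiration": ["std", "turbo", "std"],
})

# Columns with any numeric dtype (int or float)
numeric_cols = cars.select_dtypes(include="number").columns.tolist()
# Everything that is not numeric; in this sample, those are the text columns
text_cols = cars.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['symboling', 'horsepower']
print(text_cols)     # ['fuel-type', 'aspiration']
```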
To display information about the columns, use the info method.
df_car.info()
To make it easier to view the dataset when you start encoding, drop the columns that you won't use.
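df_car.columns  # list every column name
# Keep only the four text columns to encode; .copy() makes an independent DataFrame
df_car = df_car[['aspiration', 'num-of-doors', 'drive-wheels', 'num-of-cylinders']].copy()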
You now have four columns. These columns all contain text values.
df_car.head()
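The .copy() call matters because later steps add new columns (doors, cylinders) to the reduced DataFrame. Without it, older pandas versions may raise a SettingWithCopyWarning when you assign new columns to a column selection. This sketch, on made-up data, shows that a copied subset can be changed without affecting the original.

```python
import pandas as pd

# Hypothetical stand-in for df_car before the unused columns are dropped
full = pd.DataFrame({
    "aspiration": ["std", "turbo"],
    "num-of-doors": ["two", "four"],
    "price": [13495.0, 16500.0],
})

# Select the columns to keep and make an independent copy
subset = full[["aspiration", "num-of-doors"]].copy()

# Adding a column to the copy does not change the original DataFrame
subset["doors"] = [2, 4]

print(subset.columns.tolist())  # ['aspiration', 'num-of-doors', 'doors']
print("doors" in full.columns)  # False
```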
Most machine learning algorithms require inputs that are numerical values.
The num-of-cylinders and num-of-doors features are ordinal: their values have a natural order. You can convert these values into their numerical counterparts.
However, aspiration and drive-wheels are not ordinal, so these features must be converted differently.
You will explore the ordinal features first.
In this step, you will encode the ordinal features.
Start by checking the column types of the reduced DataFrame:
df_car.info()
First, determine what values the ordinal columns contain.
Starting with the num-of-doors feature, you can use value_counts to discover the values.
df_car['num-of-doors'].value_counts()
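By default, value_counts leaves out missing values. Because the file was read with na_values="?", any "?" in num-of-doors is now NaN, so passing dropna=False is a quick way to check for gaps before you map the column. The Series below is made-up data.

```python
import pandas as pd

# Hypothetical sample of the num-of-doors column after "?" was read as NaN
doors = pd.Series(["four", "two", "four", None, "four"], name="num-of-doors")

print(doors.value_counts())              # counts only "four" and "two"
print(doors.value_counts(dropna=False))  # also reports the missing entry
print(doors.isna().sum())                # 1
```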
This feature has only two values: four and two. You can create a simple mapper as a dictionary:
# Map each text value to its number
door_mapper = {"two": 2,
               "four": 4}
You can then use the replace method from pandas to generate a new numerical column based on the num-of-doors column.
# Values found in door_mapper are replaced; anything else (including NaN) is left as is
df_car['doors'] = df_car['num-of-doors'].replace(door_mapper)
When you display the DataFrame, you should see the new column on the right. It contains a numerical representation of the number of doors.
df_car.head()
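replace only changes the values it finds in the mapper and leaves everything else untouched, so a value missing from the dictionary (for example, a typo) silently stays as text. Series.map, a different pandas method from the one this lab uses, turns any unmatched value into NaN instead, which makes gaps in the mapper easy to spot. This sketch compares both on made-up data that includes a value the mapper does not cover.

```python
import pandas as pd

door_mapper = {"two": 2, "four": 4}

# Hypothetical column with one value ("three") that the mapper does not cover
doors = pd.Series(["two", "four", "three"], name="num-of-doors")

# replace: unmatched values are left unchanged
replaced = doors.replace(door_mapper)
print(replaced.tolist())  # [2, 4, 'three']

# map: unmatched values become NaN
mapped = doors.map(door_mapper)
print(mapped.isna().tolist())  # [False, False, True]
```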
Repeat the process with the num-of-cylinders column.
First, get the values.
df_car['num-of-cylinders'].value_counts()
Next, create the mapper.
cylinder_mapper = {"two": 2,
                   "three": 3,
                   "four": 4,
                   "five": 5,
                   "six": 6,
                   "eight": 8,
                   "twelve": 12}
Apply the mapper by using the replace method.
df_car['cylinders'] = df_car['num-of-cylinders'].replace(cylinder_mapper)
df_car.head()
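To confirm that cylinder_mapper covered every value, you can look for values in num-of-cylinders that have no entry in the dictionary; any such value would still be text after replace. Also, in recent pandas versions replace may keep the object dtype even when every value is now a number, so this sketch calls infer_objects() to let pandas pick a numeric dtype. The Series is made-up sample data.

```python
import pandas as pd

cylinder_mapper = {"two": 2, "three": 3, "four": 4, "five": 5,
                   "six": 6, "eight": 8, "twelve": 12}

# Hypothetical sample of the num-of-cylinders column
cylinders = pd.Series(["four", "six", "four", "eight", "twelve"], name="num-of-cylinders")

# Values that appear in the column but have no entry in the mapper
unmapped = set(cylinders.dropna().unique()) - set(cylinder_mapper)
print(unmapped)  # set() -> every value is covered

# infer_objects() converts an object column of Python ints to an integer dtype
numeric = cylinders.replace(cylinder_mapper).infer_objects()
print(pd.api.types.is_integer_dtype(numeric))  # True
```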
For more information about the replace method, see pandas.DataFrame.replace in the pandas documentation.
In this step, you will encode non-ordinal data by using the get_dummies method from pandas.
The two remaining features are not ordinal.
According to the attribute description, the following values are possible:
aspiration: std, turbo.
drive-wheels: 4wd, fwd, rwd.
You might think that the correct strategy is to convert these values into numbers. For example, for the drive-wheels feature, you could use 4wd = 1, fwd = 2, and rwd = 3. However, fwd isn't "less than" rwd. These values have no natural order, but assigning these numbers introduces one.
The correct strategy is to create one binary feature for each value of the original feature. This process is often called one-hot encoding in machine learning, or dummy coding in statistics.
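A small worked comparison makes the problem concrete. With the integer codes from the drive-wheels example, 4wd looks twice as far from rwd as from fwd, a relationship that means nothing for drive types. With one binary position per category, every pair of categories is the same distance apart. The distance measures (absolute difference for the codes, Euclidean distance for the binary vectors) are chosen only for illustration.

```python
import math

categories = ["4wd", "fwd", "rwd"]

# Integer codes from the drive-wheels example (these impose an order)
codes = {"4wd": 1, "fwd": 2, "rwd": 3}

# One-hot vectors: a 1 in the category's own position, 0 elsewhere
one_hot = {c: [1 if c == other else 0 for other in categories] for c in categories}

def code_distance(a, b):
    # Absolute difference between integer codes
    return abs(codes[a] - codes[b])

def one_hot_distance(a, b):
    # Euclidean distance between one-hot vectors
    return math.dist(one_hot[a], one_hot[b])

print(code_distance("4wd", "fwd"), code_distance("4wd", "rwd"))          # 1 2
print(one_hot_distance("4wd", "fwd") == one_hot_distance("4wd", "rwd"))  # True
print(one_hot["fwd"])  # [0, 1, 0]
```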
pandas provides a get_dummies method, which converts the data into binary features. For more information, see pandas.get_dummies in the pandas documentation.
According to the attribute description, drive-wheels has three possible values.
df_car['drive-wheels'].value_counts()
Use the get_dummies method to add new binary features to the DataFrame.
# Replace drive-wheels with one binary column per value
df_car = pd.get_dummies(df_car, columns=['drive-wheels'])
df_car.head()
When you examine the dataset, you should see three new columns on the right:
drive-wheels_4wd
drive-wheels_fwd
drive-wheels_rwd
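The same method can encode aspiration, the other non-ordinal column, in the same call. This sketch runs on a made-up three-row DataFrame. It passes dtype=int because, in pandas 2.0 and later, get_dummies returns True/False columns by default; dtype=int gives 0/1 values instead. Whether you need that depends on the code that consumes the result.

```python
import pandas as pd

# Hypothetical three-row sample of the two non-ordinal columns
sample = pd.DataFrame({
    "aspiration": ["std", "turbo", "std"],
    "drive-wheels": ["rwd", "fwd", "4wd"],
})

# One binary column per value, named <column>_<value>; dtype=int gives 0/1
encoded = pd.get_dummies(sample, columns=["aspiration", "drive-wheels"], dtype=int)

print(encoded.columns.tolist())
# ['aspiration_std', 'aspiration_turbo',
#  'drive-wheels_4wd', 'drive-wheels_fwd', 'drive-wheels_rwd']
print(encoded.iloc[0].tolist())  # [1, 0, 0, 0, 1] -> std, rwd
```

get_dummies also accepts drop_first=True, which drops the first binary column of each feature because its value can be inferred from the others; some linear models benefit from that.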