Question: Can you please be detailed on the code of each step? Thank you.
You will start by examining the data in the dataset.
To get the most out of this lab, read the instructions and code before you run the cells. Take time to experiment!
Start by importing the pandas package and setting some default display options.
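import pandas as pd

# Show more rows and columns than the defaults (example limits; adjust as needed)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)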
Next, load the dataset into a pandas DataFrame.
The data doesn't contain a header, so you will define the column names in a variable named col_names, using the attributes listed in the dataset description.
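# File name assumed from the dataset (imports-85); change the path if your copy differs
url = "imports-85.csv"
# The 26 attributes from the dataset description, in file order
col_names = ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-of-doors',
             'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height',
             'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke',
             'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']
# The file has no header row, and '?' marks a missing value
df_car = pd.read_csv(url, sep=',', names=col_names, na_values='?', header=None)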
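To see what names, header=None, and na_values='?' do without the real file, the following sketch reads two made-up rows from a string (hypothetical data, not taken from the dataset; sample_cols is a shortened, illustrative column list). The first row stays data instead of becoming a header, the columns get the supplied names, and every '?' becomes NaN.

```python
import io

import pandas as pd

# Hypothetical sample rows (not from the real file): no header line,
# and '?' marks a missing value, as in the dataset description.
raw = io.StringIO(
    "3,?,std,two\n"
    "1,158,turbo,four\n"
)

# Shortened, made-up column list for this sketch only.
sample_cols = ['symboling', 'normalized-losses', 'aspiration', 'num-of-doors']

df_sample = pd.read_csv(raw, sep=',', names=sample_cols, na_values='?', header=None)

print(df_sample.shape)                               # (2, 4): no row was used as a header
print(df_sample['normalized-losses'].isna().sum())   # 1: the '?' became NaN
print(df_sample['normalized-losses'].dtype)          # float64, because the column holds NaN
```

Without na_values='?', the normalized-losses column would be read as text, because '?' is not a number.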
First, to see the number of rows (instances) and columns (features), use the shape attribute.
df_car.shape
Next, examine the data by using the head method.
df_car.head()
There are 26 columns. Some of the columns have numerical values, but many of them contain text.
To display information about the columns, use the info method.
df_car.info()
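To list which columns hold numbers and which hold text, you can filter on the same dtypes that info reports. A minimal sketch, using a small hypothetical DataFrame (df_demo) in place of df_car:

```python
import pandas as pd

# Hypothetical stand-in for df_car with one numeric and two text columns.
df_demo = pd.DataFrame({
    'horsepower': [111.0, 154.0, 102.0],
    'aspiration': ['std', 'std', 'turbo'],
    'drive-wheels': ['rwd', 'rwd', 'fwd'],
})

# Anything that is not numeric is treated as text here.
text_cols = df_demo.select_dtypes(exclude='number').columns.tolist()
numeric_cols = df_demo.select_dtypes(include='number').columns.tolist()

print(text_cols)     # ['aspiration', 'drive-wheels']
print(numeric_cols)  # ['horsepower']
```

The text columns are the ones that need encoding before most machine learning algorithms can use them.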
To make it easier to view the dataset when you start encoding, drop the columns that you won't use.
df_car.columns
df_car = df_car[['aspiration', 'num-of-doors', 'drive-wheels', 'num-of-cylinders']].copy()
You now have four columns. These columns all contain text values.
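The double brackets pass a list of column names, which returns a DataFrame. Calling copy() makes that result independent of the original, so adding encoded columns to it later should not modify the source data or trigger pandas' SettingWithCopyWarning. A small sketch with hypothetical values (df_full and df_small are illustrative names):

```python
import pandas as pd

# Hypothetical stand-in for the full df_car, with one extra column (price).
df_full = pd.DataFrame({
    'aspiration': ['std', 'turbo'],
    'num-of-doors': ['two', 'four'],
    'drive-wheels': ['rwd', 'fwd'],
    'num-of-cylinders': ['four', 'six'],
    'price': [13495.0, 16500.0],
})

# Keep only the four text columns, as an independent copy.
df_small = df_full[['aspiration', 'num-of-doors', 'drive-wheels', 'num-of-cylinders']].copy()

# Adding a column to the copy leaves the original untouched.
df_small['doors'] = [2, 4]
print(df_small.shape)              # (2, 5)
print('doors' in df_full.columns)  # False
```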
df_car.head()
Most machine learning algorithms require inputs that are numerical values.
The num-of-cylinders and num-of-doors features have ordinal values. You could convert the values of these features into their numerical counterparts.
However, aspiration and drive-wheels don't have ordinal values. These features must be converted differently.
You will explore the ordinal features first.
In this step, you will encode the ordinal features by mapping their text values to numbers.
Start by getting the new column types from the DataFrame:
df_car.info()
First, determine what values the ordinal columns contain.
Starting with the num-of-doors feature, you can use value_counts to discover the values.
df_car['num-of-doors'].value_counts()
This feature has only two values: four and two. You can create a simple mapper that contains a dictionary:
door_mapper = {"two": 2,
               "four": 4}
You can then use the replace method from pandas to generate a new numerical column based on the num-of-doors column.
df_car['doors'] = df_car["num-of-doors"].replace(door_mapper)
When you display the DataFrame, you should see the new column on the right. It contains a numerical representation of the number of doors.
df_car.head()
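Because the file is read with na_values='?', a missing door count is stored as NaN. value_counts skips NaN unless you pass dropna=False, and replace leaves NaN as it is because NaN is not a key in the mapper. A sketch on hypothetical values that checks both:

```python
import numpy as np
import pandas as pd

# Hypothetical door values, including one missing entry (NaN).
doors = pd.Series(['two', 'four', 'four', np.nan], name='num-of-doors')

print(doors.value_counts())              # counts only 'four' and 'two'
print(doors.value_counts(dropna=False))  # also counts the missing value

door_mapper = {"two": 2, "four": 4}
doors_num = doors.replace(door_mapper)

# The NaN is not a key in door_mapper, so it is still missing after replace.
print(doors_num.isna().sum())  # 1
```

Missing values still need a separate decision (for example, dropping or imputing those rows) before training a model.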
Repeat the process with the numofcylinders column.
First, get the values.
df_car['num-of-cylinders'].value_counts()
Next, create the mapper.
cylinder_mapper = {"two": 2,
                   "three": 3,
                   "four": 4,
                   "five": 5,
                   "six": 6,
                   "eight": 8,
                   "twelve": 12}
Apply the mapper by using the replace method.
df_car['cylinders'] = df_car['num-of-cylinders'].replace(cylinder_mapper)
df_car.head()
For more information about the replace method, see pandas.DataFrame.replace in the pandas documentation.
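Series.map is a common alternative to replace for this kind of lookup. The difference is that map turns any value that is not a key in the dictionary into NaN, while replace leaves it as the original text. A sketch with hypothetical values, where 'three' is deliberately left out of a shortened mapper (partial_mapper is an illustrative name):

```python
import pandas as pd

# Hypothetical cylinder values; 'three' is deliberately missing from the mapper.
cyl = pd.Series(['four', 'six', 'three'], name='num-of-cylinders')
partial_mapper = {"four": 4, "six": 6}

via_replace = cyl.replace(partial_mapper)  # 'three' is left as text
via_map = cyl.map(partial_mapper)          # 'three' becomes NaN

print(via_replace.iloc[2])   # three
print(via_map.isna().sum())  # 1

# Listing the categories the mapper does not cover guards against silent gaps.
unmapped = sorted(set(cyl.dropna()) - set(partial_mapper))
print(unmapped)  # ['three']
```

With the complete cylinder_mapper from the lab, both methods give the same numbers; the unmapped check is simply a cheap safety net.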
In this step, you will encode non-ordinal data by using the get_dummies method from pandas.
The two remaining features are not ordinal.
According to the attribute description, the following values are possible:
aspiration: std, turbo
drive-wheels: 4wd, fwd, rwd
You might think that the correct strategy is to convert these values into numerical values. For example, consider the drive-wheels feature. You could use 4wd = 1, fwd = 2, and rwd = 3. However, fwd isn't less than rwd. These values don't have an order, but you just introduced an order to them by assigning these numerical values.
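A quick calculation makes the problem concrete. With the illustrative codes 4wd = 1, fwd = 2, rwd = 3, a distance-based model sees 4wd as twice as far from rwd as from fwd, which the data does not justify. With one binary column per value, every pair of different values is the same distance apart. The helper functions below are hypothetical, for illustration only:

```python
import math

# Illustrative integer codes for drive-wheels (an arbitrary order).
label_codes = {'4wd': 1, 'fwd': 2, 'rwd': 3}

# One-hot vectors: one position per value.
one_hot = {'4wd': (1, 0, 0), 'fwd': (0, 1, 0), 'rwd': (0, 0, 1)}

def label_distance(a, b):
    """Absolute difference between the integer codes."""
    return abs(label_codes[a] - label_codes[b])

def one_hot_distance(a, b):
    """Euclidean distance between the one-hot vectors."""
    return math.dist(one_hot[a], one_hot[b])

print(label_distance('4wd', 'fwd'), label_distance('4wd', 'rwd'))  # 1 2
print(one_hot_distance('4wd', 'fwd'), one_hot_distance('4wd', 'rwd'))  # both about 1.414
```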
The correct strategy is to convert these values into binary features, one for each value in the original feature. This process is often called one-hot encoding in machine learning, or dummying in statistics.
pandas provides a get_dummies method, which converts the data into binary features. For more information, see pandas.get_dummies in the pandas documentation.
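A minimal sketch of what get_dummies produces, on a hypothetical three-row DataFrame (df_demo). The listed column is replaced by one column per distinct value, named with the original column name, an underscore, and the value. Passing dtype=int is optional; it asks for 0/1 integers instead of the True/False columns that recent pandas versions return by default.

```python
import pandas as pd

# Hypothetical rows; only the drive-wheels column is encoded.
df_demo = pd.DataFrame({'drive-wheels': ['rwd', 'fwd', '4wd'], 'doors': [2, 4, 4]})

encoded = pd.get_dummies(df_demo, columns=['drive-wheels'], dtype=int)

print(encoded.columns.tolist())
# ['doors', 'drive-wheels_4wd', 'drive-wheels_fwd', 'drive-wheels_rwd']
print(encoded)
```

Each row has exactly one 1 across the three new columns, because each car has exactly one drive-wheels value.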
According to the attribute description, drivewheels has three possible values.
df_car['drive-wheels'].value_counts()
Use the get_dummies method to add new binary features to the DataFrame.
df_car = pd.get_dummies(df_car, columns=['drive-wheels'])
df_car.head()
When you examine the dataset, you should see three new columns on the right:
drive-wheels_4wd
drive-wheels_fwd
drive-wheels_rwd
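The same call can encode the other non-ordinal feature, aspiration (in the lab, df_car = pd.get_dummies(df_car, columns=['aspiration'])). Because aspiration has only two values (std and turbo), one of its two dummy columns is redundant, and passing drop_first=True keeps just one. That option is an optional choice, not a step from the original lab. A sketch on hypothetical rows (df_demo, both, and single are illustrative names):

```python
import pandas as pd

# Hypothetical rows standing in for df_car after the earlier steps.
df_demo = pd.DataFrame({'aspiration': ['std', 'turbo', 'std'], 'doors': [2, 4, 4]})

# Default: one 0/1 column per value.
both = pd.get_dummies(df_demo, columns=['aspiration'], dtype=int)
print(both.columns.tolist())  # ['doors', 'aspiration_std', 'aspiration_turbo']

# drop_first=True drops the first category ('std'), leaving a single 0/1 column.
single = pd.get_dummies(df_demo, columns=['aspiration'], drop_first=True, dtype=int)
print(single.columns.tolist())              # ['doors', 'aspiration_turbo']
print(single['aspiration_turbo'].tolist())  # [0, 1, 0]
```

Dropping one column avoids perfectly collinear features, which matters mainly for linear models; tree-based models are usually fine with both columns.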