Question:
1. You are working on a customer dataset that has a county feature with 3006 distinct values. You decide to use one-hot encoding for this feature. How many variables (i.e. features) do you need to represent the county feature of the original dataset?
2. In the dataframe df, the first column is supposed to have distinct values. To verify, you use the following command:
df.iloc[:,0].value_counts().value_counts()
Explain why value_counts() is used twice and what the output tells you.
3. Write a function named st that takes a pandas Series object x as input and returns a tuple with three elements:
- mean of x
- standard deviation of x
- a Series object containing standardized values of x
You may use the mean() and std() methods of pandas Series. Recall that the standardization formula is (x - mu) / sigma, where mu is the mean of the feature and sigma is its standard deviation.
You only need to supply the function. Assume the pandas package has been imported.
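The three questions above can be explored with a small sketch. This is one possible illustration under stated assumptions, not an official solution: the county names and the toy dataframe are made up, and pd.get_dummies is just one common way to one-hot encode. A full one-hot encoding creates one indicator column per distinct value, so 3006 distinct counties would give 3006 columns (one fewer if a reference level is dropped, e.g. with drop_first=True). Note also that Series.std() defaults to the sample standard deviation (ddof=1).

```python
import pandas as pd

# Question 1: one-hot encoding yields one indicator column per distinct value.
# Toy data with 4 distinct (hypothetical) counties.
counties = pd.Series(["Adams", "Boone", "Clark", "Dane", "Adams"], name="county")
one_hot = pd.get_dummies(counties)
n_columns = one_hot.shape[1]  # same as counties.nunique()

# Question 2: the inner value_counts() counts how often each value occurs;
# the outer value_counts() counts how many values share each occurrence count.
# Toy dataframe (hypothetical) in which the id 103 appears twice.
df = pd.DataFrame({"id": [101, 102, 103, 103], "score": [1.0, 2.0, 3.0, 4.0]})
counts_of_counts = df.iloc[:, 0].value_counts().value_counts()
# Index = number of occurrences, value = how many ids occur that often.
# If every id were distinct, the only index in the result would be 1.

# Question 3: mean, standard deviation, and standardized values.
def st(x):
    """Return (mean, standard deviation, standardized Series) for a pandas Series x."""
    mu = x.mean()
    sigma = x.std()  # pandas default: sample standard deviation (ddof=1)
    return mu, sigma, (x - mu) / sigma

mu, sigma, z = st(pd.Series([2.0, 4.0, 6.0]))
```

In the toy dataframe, counts_of_counts has two index entries: 1 (two ids occur once) and 2 (one id occurs twice), so the presence of any index other than 1 signals duplicates in the first column.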
