
Python: import the following packages first:

import numpy as np
import pandas as pd
import seaborn as sns
import math
from sklearn import preprocessing
from sklearn import datasets
import sklearn
from scipy import stats
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')
np.random.seed(1)


Q2 (20 points) Working with Data

In [ ]:X = datasets.load_wine(as_frame=True)

data = pd.DataFrame(X.data, columns=X.feature_names)
data['class'] = pd.Series(X.target)
data = data.drop(list(data.columns[5:-1]),axis=1) #Keep only the first five columns and the class label
print(" classes ",data['class'].unique()) #The distinct class labels in the data; there are three: 0, 1, 2
print(" class distribution ",data['class'].value_counts()) #Shows the number of rows for each class
data.info()
data.head()

Q2-A Construct a scatter plot between the 'ash' and 'malic_acid' columns

Use plt.scatter to plot these two variables

Use data['class'] to color the points

In [ ]:#Type your answer
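One possible answer for Q2-A, sketched under a few assumptions: 'ash' goes on the x-axis and 'malic_acid' on the y-axis (the question does not fix the order), the `viridis` colormap is an arbitrary choice, and the reduced dataframe is rebuilt so the block runs on its own.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

# Rebuild the reduced wine dataframe: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = wine.data.iloc[:, :5].copy()
data["class"] = wine.target

fig, ax = plt.subplots(figsize=(6, 4))
# Assumed orientation: ash on x, malic_acid on y; color encodes the integer class (0, 1, 2)
points = ax.scatter(data["ash"], data["malic_acid"], c=data["class"], cmap="viridis")
ax.set_xlabel("ash")
ax.set_ylabel("malic_acid")
ax.set_title("ash vs. malic_acid by class")
# One legend entry per class, taken from the scatter's color mapping
ax.legend(*points.legend_elements(), title="class")
```

With `%matplotlib inline` active the figure renders when the cell finishes; outside a notebook, call `plt.show()`.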

Q2-B (10 points) Construct a scatter matrix of all the attributes except the last one, 'class'

Use the sns.pairplot function to plot the scatter matrix, with the 'class' attribute as the hue

In [ ]:#Type your answer

Q-A Normalize the data so that each attribute has a minimum of 0 and a maximum of 1

Don't change the content of the original dataframe. The final result will be stored in data_scaled

In [ ]:# Normalize the attribute columns (selected by name); the 'class' label is left unscaled
from sklearn.preprocessing import MinMaxScaler

data_scaled = data.copy()                 # work on a copy so the original dataframe is unchanged
feature_cols = data.columns[:-1]          # every column except the 'class' label
scaler = MinMaxScaler()                   # maps each column onto the range [0, 1]
data_scaled[feature_cols] = scaler.fit_transform(data[feature_cols])
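To see what `MinMaxScaler` computes, the sketch below applies the min-max formula x' = (x - min) / (max - min) column by column with plain pandas and compares the result with scikit-learn's output. The dataframe is rebuilt so the block is self-contained, and only the five attributes are scaled; 'class' is left untouched.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler

wine = datasets.load_wine(as_frame=True)
data = wine.data.iloc[:, :5].copy()
data["class"] = wine.target
features = data.columns[:-1]

# Min-max formula, applied per column: x' = (x - min) / (max - min)
col_min = data[features].min()
col_max = data[features].max()
manual = (data[features] - col_min) / (col_max - col_min)

# The same transformation through scikit-learn
sk = MinMaxScaler().fit_transform(data[features])

print(manual.min().tolist())   # each attribute's new minimum
print(manual.max().tolist())   # each attribute's new maximum
print(np.allclose(manual.to_numpy(), sk))
```

Both routes should agree; the scaler object is the more convenient choice when the same minimum and maximum must later be applied to new data via `scaler.transform`.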

Q-B Standardize the data so that each attribute has a mean of 0 and a standard deviation of 1 (unit variance)

Hint: use preprocessing.StandardScaler

Don't change the content of the original dataframe. The final result will be stored in data_scaled

In [ ]:# Standardize the attribute columns (selected by name); the 'class' label is left unscaled

data_scaled = data.copy()                 # work on a copy so the original dataframe is unchanged
feature_cols = data.columns[:-1]          # every column except the 'class' label
scaler = preprocessing.StandardScaler()   # per column: subtract the mean, divide by the standard deviation
data_scaled[feature_cols] = scaler.fit_transform(data[feature_cols])
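A comparable check for standardization computes z = (x - mean) / std per column. One detail worth knowing: `StandardScaler` divides by the population standard deviation (ddof=0), while pandas' `DataFrame.std()` defaults to the sample version (ddof=1), so the manual version below passes `ddof=0` to match. The dataframe is rebuilt so the block runs on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine(as_frame=True)
data = wine.data.iloc[:, :5].copy()
data["class"] = wine.target
features = data.columns[:-1]

# z-score per column with the population standard deviation (ddof=0)
manual = (data[features] - data[features].mean()) / data[features].std(ddof=0)
sk = StandardScaler().fit_transform(data[features])

print(np.allclose(manual.to_numpy(), sk))     # agreement with scikit-learn
print(manual.mean().abs().max())              # largest |mean|, floating-point noise only
print(manual.std(ddof=0).round(6).tolist())   # population std of each standardized column

# With pandas' default ddof=1 the result differs slightly from StandardScaler
sample_version = (data[features] - data[features].mean()) / data[features].std()
print(np.allclose(sample_version.to_numpy(), sk))
```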

Discretization

Q-C Equal-Width Binning

Convert the values in each attribute to discrete values and use 5 bins.

Use the pandas cut method, pd.cut

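In [ ]:data_discrete = data.copy()  # copy() must be called; without the parentheses this is the method itself, not a dataframe

# Equal-width binning of every attribute except the 'class' label; labels=False returns bin indices 0-4
for column in data_discrete.columns[:-1]:
    data_discrete[column] = pd.cut(data_discrete[column], bins=5, labels=False)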
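To see what "equal width" means, the sketch below bins a single attribute ('alcohol', chosen arbitrarily) and asks `pd.cut` for its bin edges with `retbins=True`. The edges are evenly spaced across the attribute's range (pandas widens the outermost edge by 0.1% of the range so the minimum is included), and the number of rows per bin is generally uneven. The dataframe is rebuilt so the block is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

wine = datasets.load_wine(as_frame=True)
data = wine.data.iloc[:, :5].copy()
data["class"] = wine.target

# Five equal-width bins over the range of one attribute; retbins=True also returns the 6 edges
codes, edges = pd.cut(data["alcohol"], bins=5, labels=False, retbins=True)

print(np.round(edges, 3))
print(np.round(np.diff(edges), 3))          # bin widths
print(codes.value_counts().sort_index())    # rows per bin (bin index -> count)
```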
 
Q-D Equal Frequency Binning 

Convert the values in each attribute to discrete values and use 5 bins.

Use the pandas qcut method, pd.qcut

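In [ ]:data_freq = data.copy()  # copy so the original dataframe is unchanged

# Equal-frequency (quantile) binning of every attribute except the 'class' label; labels=False returns bin indices 0-4
for column in data_freq.columns[:-1]:
    data_freq[column] = pd.qcut(data_freq[column], q=5, labels=False)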
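By contrast, `pd.qcut` places the edges at quantiles, so the bins have unequal widths but hold roughly the same number of rows. The sketch below (again on 'alcohol', an arbitrary choice, with the dataframe rebuilt for self-containment) compares the edges with the 0th, 20th, 40th, 60th, 80th, and 100th percentiles from `Series.quantile`. Tied values can make the counts deviate a little from an even split.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

wine = datasets.load_wine(as_frame=True)
data = wine.data.iloc[:, :5].copy()
data["class"] = wine.target

# Five quantile-based bins; retbins=True returns the edges pandas computed
codes, edges = pd.qcut(data["alcohol"], q=5, labels=False, retbins=True)
percentiles = data["alcohol"].quantile([0.0, 0.2, 0.4, 0.6, 0.8, 1.0]).to_numpy()

print(np.round(edges, 3))
print(np.allclose(edges, percentiles))     # do the edges sit at the 20% quantile steps?
print(codes.value_counts().sort_index())   # rows per bin, roughly 178 / 5 each
```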
