Question:

Please answer the following questions in a Python Jupyter notebook. Solve 2-G and 2-H; I think you need to solve the previous ones first in order to solve them. Import these packages before starting:

import numpy as np
import pandas as pd
import seaborn as sns
import math
from sklearn import preprocessing
from sklearn import datasets
import sklearn
from scipy import stats
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')
np.random.seed(1)

Q2 (70 points) Working with Data

In [2]:X = datasets.load_wine(as_frame=True)

data = pd.DataFrame(X.data, columns=X.feature_names)
data['class'] = pd.Series(X.target)
data = data.drop(list(data.columns[5:-1]),axis=1) #Keep only the first five columns and the class label
print(" classes ",data['class'].unique()) #The different class labels in the data .. We have three class labels, 0, 1, 2
print(" class distribution ",data['class'].value_counts()) #Shows the number of rows for each class
data.info()
data.head()

Q2-A (10 points) Construct a scatter plot between the 'ash' and 'malic_acid' columns

Use the plt.scatter to plot these two variables

Use the data['class'] to color the points
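
One possible way to approach Q2-A is sketched below. It rebuilds the `data` frame exactly as in the setup cell so that it runs on its own; the `viridis` colormap, the axis labels, and the legend are illustrative choices that the question does not specify.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets

# Rebuild the frame from the setup cell: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = pd.DataFrame(wine.data, columns=wine.feature_names)
data['class'] = pd.Series(wine.target)
data = data.drop(list(data.columns[5:-1]), axis=1)

# Scatter plot of 'ash' against 'malic_acid', colored by class label
fig, ax = plt.subplots()
sc = ax.scatter(data['ash'], data['malic_acid'], c=data['class'], cmap='viridis')
ax.set_xlabel('ash')
ax.set_ylabel('malic_acid')
# legend_elements() builds one legend entry per distinct color value (assumed styling)
ax.legend(*sc.legend_elements(), title='class')
```

In a notebook with `%matplotlib inline`, the figure renders when the cell finishes; in a plain script you would add `plt.show()`.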

Q2-B (10 points) Construct a scatter matrix between all the attributes, except the last attribute, Class

Use the sns.pairplot function to plot the scatter matrix .. use the 'class' attribute as the hue

Q2-C (5 points) Normalize the data such that each attribute has a minimum of 0 and a maximum of 1

Don't change the content of the original dataframe. The final result will be stored in data_scaled
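
Q2-C could be sketched as follows. The question gives no hint for this part, so the use of `preprocessing.MinMaxScaler` is an assumption, as is the decision to leave the `class` label unscaled and re-attach it afterwards. The original `data` frame is not modified.

```python
import pandas as pd
from sklearn import datasets, preprocessing

# Rebuild the frame from the setup cell: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = pd.DataFrame(wine.data, columns=wine.feature_names)
data['class'] = pd.Series(wine.target)
data = data.drop(list(data.columns[5:-1]), axis=1)

# Scale only the attributes; the class label is kept as-is (assumption)
features = data.drop(columns='class')
scaler = preprocessing.MinMaxScaler()  # maps each column to the range [0, 1]
data_scaled = pd.DataFrame(scaler.fit_transform(features),
                           columns=features.columns, index=features.index)
data_scaled['class'] = data['class']

print(data_scaled.describe().loc[['min', 'max']])
```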

Q2-D (5 points) Standardize the data such that each attribute has a mean of 0 and a standard deviation of 1 (unit variance)

Hint: use preprocessing.StandardScaler

Don't change the content of the original dataframe. The final result will be stored in data_scaled
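
A sketch for Q2-D using the hinted `preprocessing.StandardScaler`. As an assumption, the `class` label is again left unscaled. Note that `StandardScaler` divides by the population standard deviation (`ddof=0`), so pandas' default `.std()` (which uses `ddof=1`) will report values slightly above 1.

```python
import pandas as pd
from sklearn import datasets, preprocessing

# Rebuild the frame from the setup cell: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = pd.DataFrame(wine.data, columns=wine.feature_names)
data['class'] = pd.Series(wine.target)
data = data.drop(list(data.columns[5:-1]), axis=1)

# Standardize only the attributes; the class label is kept as-is (assumption)
features = data.drop(columns='class')
scaler = preprocessing.StandardScaler()  # zero mean, unit variance per column
data_scaled = pd.DataFrame(scaler.fit_transform(features),
                           columns=features.columns, index=features.index)
data_scaled['class'] = data['class']

# Check with ddof=0 to match the scaler's definition of standard deviation
print(data_scaled.drop(columns='class').mean())
print(data_scaled.drop(columns='class').std(ddof=0))
```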

Q2-E Equal-Width Binning (5 points)

Convert the values in each attribute to discrete values and use 5 bins.

Use the pandas cut method, pd.cut
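
One way to apply `pd.cut` column by column for Q2-E is sketched here. The name `data_equal_width` is hypothetical, and `labels=False` (integer bin codes 0 to 4 instead of interval labels) is an illustrative choice.

```python
import pandas as pd
from sklearn import datasets

# Rebuild the frame from the setup cell: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = pd.DataFrame(wine.data, columns=wine.feature_names)
data['class'] = pd.Series(wine.target)
data = data.drop(list(data.columns[5:-1]), axis=1)

# Equal-width binning: each attribute's range is split into 5 intervals of equal length
features = data.drop(columns='class')
data_equal_width = features.apply(lambda col: pd.cut(col, bins=5, labels=False))
data_equal_width['class'] = data['class']

# Bin sizes generally differ, because the intervals have equal width, not equal counts
print(data_equal_width['ash'].value_counts().sort_index())
```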

Q2-F Equal Frequency Binning (5 points)

Convert the values in each attribute to discrete values and use 5 bins.

Use the pandas qcut method, pd.qcut
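
A corresponding sketch for Q2-F with `pd.qcut`. The name `data_equal_freq` is hypothetical; `duplicates='drop'` is added only as a safeguard in case tied values produce repeated quantile edges, in which case a column would end up with fewer than 5 bins.

```python
import pandas as pd
from sklearn import datasets

# Rebuild the frame from the setup cell: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = pd.DataFrame(wine.data, columns=wine.feature_names)
data['class'] = pd.Series(wine.target)
data = data.drop(list(data.columns[5:-1]), axis=1)

# Equal-frequency binning: bin edges are quantiles, so bins hold roughly equal counts
features = data.drop(columns='class')
data_equal_freq = features.apply(
    lambda col: pd.qcut(col, q=5, labels=False, duplicates='drop'))
data_equal_freq['class'] = data['class']

print(data_equal_freq['ash'].value_counts().sort_index())
```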

Q2-G Sampling (15 points)

Construct three samples from the original dataset

data_sample1: select 30 random rows

data_sample2: select 10 random rows from each class, for a total of 30 rows

data_sample3: select 17% random rows from each class. Hint: use frac=0.17

In [ ]:#data_sample1 Sample 30 rows from the data

#data_sample2 Sample 10 rows from each class
 
#data_sample3 Sample 17% for each class
 
#uncomment the following three lines to check your results
#print("Sample1 Size ", len(data_sample1)," ", data_sample1.head(30))
#print(" Sample2 Size ", len(data_sample2)," ", data_sample2.head(30))
#print(" Sample3 Size ", len(data_sample3)," ",data_sample3.head(30)) 
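
The three samples for Q2-G might be drawn as in the sketch below. It assumes pandas 1.1 or newer, where `groupby(...).sample` is available. Passing `random_state=1` is an assumption made for reproducibility; the notebook's `np.random.seed(1)` call would otherwise govern the randomness.

```python
import pandas as pd
from sklearn import datasets

# Rebuild the frame from the setup cell: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = pd.DataFrame(wine.data, columns=wine.feature_names)
data['class'] = pd.Series(wine.target)
data = data.drop(list(data.columns[5:-1]), axis=1)

# data_sample1: simple random sample of 30 rows, ignoring the class label
data_sample1 = data.sample(n=30, random_state=1)

# data_sample2: stratified sample, exactly 10 rows from each class
data_sample2 = data.groupby('class').sample(n=10, random_state=1)

# data_sample3: stratified sample, 17% of the rows of each class
data_sample3 = data.groupby('class').sample(frac=0.17, random_state=1)

print("Sample1 Size", len(data_sample1))
print("Sample2 per class", data_sample2['class'].value_counts().to_dict())
print("Sample3 per class", data_sample3['class'].value_counts().to_dict())
```

The difference between the second and third samples is that fixed counts give every class equal weight, while a fixed fraction preserves the class proportions of the full dataset.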

Q2-H (15 points)

Write Python code to answer the following questions with respect to the wine data set. You can use Pandas DataFrame:

What is the correlation coefficient between 'magnesium' and 'ash' for rows with class label 2?

What is the average of the 'ash' column for rows with class label 1?

What are the averages for all the columns for rows with class label 0? -- use mean in dataframe
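
The three Q2-H questions can be answered with boolean filtering on the `class` column, as in this sketch. The variable names are hypothetical, and `Series.corr` computes the Pearson coefficient by default, which is assumed to be the correlation the question intends.

```python
import pandas as pd
from sklearn import datasets

# Rebuild the frame from the setup cell: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = pd.DataFrame(wine.data, columns=wine.feature_names)
data['class'] = pd.Series(wine.target)
data = data.drop(list(data.columns[5:-1]), axis=1)

# 1) Correlation between 'magnesium' and 'ash' for rows with class label 2
class2 = data[data['class'] == 2]
corr_mg_ash = class2['magnesium'].corr(class2['ash'])  # Pearson by default

# 2) Average of the 'ash' column for rows with class label 1
ash_mean_class1 = data.loc[data['class'] == 1, 'ash'].mean()

# 3) Averages of all columns for rows with class label 0
means_class0 = data[data['class'] == 0].mean()

print("corr(magnesium, ash | class 2):", corr_mg_ash)
print("mean(ash | class 1):", ash_mean_class1)
print(means_class0)
```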
