Question: Python, working with data: scatter plots, scaling, and discretization of the wine dataset

Python, Import the following packages first:

import numpy as np
import pandas as pd
import seaborn as sns
import math
from sklearn import preprocessing
from sklearn import datasets
import sklearn
from scipy import stats
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')
np.random.seed(1)

That's the question

Q2 (20 points) Working with Data

In [ ]:X = datasets.load_wine(as_frame=True)

data = pd.DataFrame(X.data, columns=X.feature_names)

data['class'] = pd.Series(X.target)

data = data.drop(list(data.columns[5:-1]),axis=1) #Keep only the first five columns and the class label

print(" classes ",data['class'].unique()) #The different class labels in the data .. We have three class labels, 0, 1, 2

print(" class distribution ",data['class'].value_counts()) #Shows the number of rows for each class

data.info()

data.head()

Q2-A Construct a scatter plot between the 'ash' and 'malic_acid' columns

Use plt.scatter to plot these two variables

Use data['class'] to color the points

In [ ]:#Type your answer
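
One possible answer to Q2-A is sketched below. It rebuilds the five-attribute dataframe so the cell runs on its own; the colormap, figure size, and legend are illustrative choices, not requirements of the question.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

# Rebuild the dataframe from the question: first five attributes plus the class label
wine = datasets.load_wine(as_frame=True)
data = wine.data.iloc[:, :5].copy()
data["class"] = wine.target

plt.figure(figsize=(6, 4))
# c= maps each class label (0, 1, 2) to a color from the chosen colormap
points = plt.scatter(data["ash"], data["malic_acid"], c=data["class"], cmap="viridis")
plt.xlabel("ash")
plt.ylabel("malic_acid")
plt.title("ash vs. malic_acid, colored by class")
# One legend entry per distinct class label
plt.legend(*points.legend_elements(), title="class")
```

In a notebook with %matplotlib inline the figure is rendered when the cell finishes; in a plain script, a call to plt.show() would be needed.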

Q2-B (10 points) Construct a scatter matrix between all the attributes, except the last attribute, Class

Use the sns.pairplot function to plot the scatter matrix .. use the 'class' attribute as the hue

In [ ]:#Type your answer

Q-A Normalize the data such that each attribute has a minimum of 0 and a maximum of 1

Don't change the content of the original dataframe. The final result will be stored in data_scaled

In [ ]:#Normalizing all the attribute columns .. Accessing the columns with the columns' names

from sklearn.preprocessing import MinMaxScaler

data_scaled = data.copy() #Work on a copy so the original dataframe is unchanged

features = data.columns[:-1] #Every attribute except the class label

scaler = MinMaxScaler()

data_scaled[features] = scaler.fit_transform(data[features]) #Each attribute now ranges from 0 to 1
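
As a sanity check, min-max scaling can also be written directly as (x - min) / (max - min) per column. The sketch below, which reloads the five wine attributes so it runs on its own, compares that formula with MinMaxScaler; the variable names are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler

wine = datasets.load_wine(as_frame=True)
features = wine.data.iloc[:, :5]

# Min-max formula applied column by column with pandas
manual = (features - features.min()) / (features.max() - features.min())

# The same transformation through scikit-learn
scaled = MinMaxScaler().fit_transform(features)

print("same result:", np.allclose(manual.to_numpy(), scaled))
print("column minimums:", manual.min().to_list())
print("column maximums:", manual.max().to_list())
```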

Q-B Standardize the data such that each attribute has a mean of 0 and a standard deviation of 1 (unit variance)

Hint: use preprocessing.StandardScaler

Don't change the content of the original dataframe. The final result will be stored in data_scaled

In [ ]:#Standardizing all the attribute columns .. Accessing the columns with the columns' names

from sklearn.preprocessing import StandardScaler

data_scaled = data.copy() #Work on a copy so the original dataframe is unchanged

features = data.columns[:-1] #Every attribute except the class label

scaler = StandardScaler()

data_scaled[features] = scaler.fit_transform(data[features]) #Each attribute now has mean 0 and unit variance
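
One detail worth checking: StandardScaler divides by the population standard deviation (ddof=0), while pandas' .std() defaults to the sample version (ddof=1). The sketch below, self-contained and with illustrative variable names, reproduces the scaler's output by hand and confirms the resulting mean and standard deviation.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

wine = datasets.load_wine(as_frame=True)
features = wine.data.iloc[:, :5]

scaled = StandardScaler().fit_transform(features)

# z-score by hand; ddof=0 matches the population standard deviation used by StandardScaler
manual = (features - features.mean()) / features.std(ddof=0)

print("same result:", np.allclose(scaled, manual.to_numpy()))
print("column means:", scaled.mean(axis=0).round(6))
print("column stds:", scaled.std(axis=0).round(6))
```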

Discretization

Q-C Equal-Width Binning

Convert the values in each attribute to discrete values and use 5 bins.

Use the pandas cut method, pd.cut

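In [ ]:data_discrete = data.copy() #Work on a copy so the original dataframe is unchanged

for column in data_discrete.columns[:-1]: #Skip the class label

    data_discrete[column] = pd.cut(data_discrete[column], bins=5, labels=False) #Bin codes 0..4, equal-width intervals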
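
To see what equal-width binning produces, pd.cut can also return the bin edges through retbins=True. The sketch below uses the 'ash' column alone as an illustration. The interior bins share one width, while the counts per bin are usually uneven; pandas slightly extends the range at the boundary so that the minimum value falls inside the first bin.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

wine = datasets.load_wine(as_frame=True)
ash = wine.data["ash"]

# labels=False returns integer bin codes; retbins=True also returns the bin edges
codes, edges = pd.cut(ash, bins=5, labels=False, retbins=True)

widths = np.diff(edges)                       # width of each of the 5 bins
counts = codes.value_counts().sort_index()    # number of rows per bin

print("bin edges:", edges.round(3))
print("bin widths:", widths.round(3))
print("rows per bin:", counts.to_dict())
```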

Q-D Equal Frequency Binning

Convert the values in each attribute to discrete values and use 5 bins.

Use the pandas qcut method, pd.qcut

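In [ ]:data_freq = data.copy() #Work on a copy so the original dataframe is unchanged

for column in data_freq.columns[:-1]: #Skip the class label

    data_freq[column] = pd.qcut(data_freq[column], q=5, labels=False) #Bin codes 0..4, roughly equal row counts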
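
Equal-frequency binning places the edges at quantiles, so the bins hold roughly the same number of rows while their widths differ. A small sketch on the 'ash' column (chosen only for illustration) makes this visible; repeated values in the data can shift the counts slightly away from a perfectly even split.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

wine = datasets.load_wine(as_frame=True)
ash = wine.data["ash"]

# q=5 cuts at the 20th, 40th, 60th and 80th percentiles
codes, edges = pd.qcut(ash, q=5, labels=False, retbins=True)

counts = codes.value_counts().sort_index()    # number of rows per bin
widths = np.diff(edges)                       # bin widths are generally unequal

print("rows per bin:", counts.to_dict())
print("bin widths:", widths.round(3))
```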
