Question 2
a) Read the studentdata file provided into Python (take note of the file extension to use
the appropriate pandas reader to read the data). Drop the first empty column in Python
and assign the DataFrame to a variable studentdata.
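A minimal sketch of part a, under stated assumptions: the real file is not shown here, so an in-memory CSV stands in for it, and the column names are taken from the question text and may differ in the actual file. If the provided file is an Excel workbook, `pd.read_excel` would replace `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the provided file (contents and extension are assumed).
raw = io.StringIO(
    ",math,reading,science,icecreamflavor\n"
    ",80,75,90,chocolate\n"
    ",65,70,60,vanilla\n"
    ",90,85,88,strawberry\n"
)

# read_csv is an assumption; pick the reader that matches the file extension.
studentdata = pd.read_csv(raw)

# Drop the first (empty) column by position, whatever name pandas gave it.
studentdata = studentdata.drop(columns=studentdata.columns[0])
print(studentdata)
```

Dropping by position avoids depending on the auto-generated name of the empty column.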
b) The studentdata shows the different midterm scores of students in math, reading and
science, and their favorite ice cream flavors. Select the data in the icecreamflavor
column and convert the flavors to a numpy array, then assign it to a variable called flavor.
From the studentdata, select the math, reading and science scores all at once,
convert the selected data to a numpy array and assign it to a variable called scores. Print
the data in the flavor and scores arrays.
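One possible way to approach part b, shown on a small made-up DataFrame (the values are illustrative only, and the column names are assumed to match the question text):

```python
import pandas as pd

# Toy data standing in for the real studentdata.
studentdata = pd.DataFrame({
    "math": [80, 65, 90, 72],
    "reading": [75, 70, 85, 68],
    "science": [90, 60, 88, 71],
    "icecreamflavor": ["chocolate", "vanilla", "strawberry", "chocolate"],
})

# A single column becomes a 1-D array.
flavor = studentdata["icecreamflavor"].to_numpy()

# A list of columns selects them all at once and becomes a 2-D array.
scores = studentdata[["math", "reading", "science"]].to_numpy()

print(flavor)
print(scores)
```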
c) Use the scores and flavor arrays to slice out the scores where the flavor is chocolate only.
The same result can be found using pandas commands exclusively. Using the
studentdata data frame, find the scores where the flavor is chocolate only.
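Part c can be sketched with a boolean mask, first in NumPy and then in pandas. The arrays below are toy values, not the real data.

```python
import numpy as np
import pandas as pd

# Toy data; the real values come from the provided file.
flavor = np.array(["chocolate", "vanilla", "strawberry", "chocolate"])
scores = np.array([[80, 75, 90], [65, 70, 60], [90, 85, 88], [72, 68, 71]])

# NumPy: the comparison yields one True/False per row, which selects rows of scores.
choc_scores = scores[flavor == "chocolate"]
print(choc_scores)

# pandas only: the same mask goes into .loc, together with the score columns.
studentdata = pd.DataFrame(scores, columns=["math", "reading", "science"])
studentdata["icecreamflavor"] = flavor
choc_df = studentdata.loc[
    studentdata["icecreamflavor"] == "chocolate",
    ["math", "reading", "science"],
]
print(choc_df)
```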
d) Use the scores and flavor arrays to slice out the scores where the flavor is chocolate OR
vanilla. The same result can be found using pandas commands exclusively. Using the
studentdata data frame, find the scores where the flavor is chocolate or vanilla.
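For part d, two masks can be combined with the element-wise `|` operator. A sketch on toy values follows; `isin` is one pandas alternative among several.

```python
import numpy as np
import pandas as pd

# Toy data; the real values come from the provided file.
flavor = np.array(["chocolate", "vanilla", "strawberry", "chocolate"])
scores = np.array([[80, 75, 90], [65, 70, 60], [90, 85, 88], [72, 68, 71]])

# NumPy: combine two element-wise comparisons with | (the parentheses are required).
mask = (flavor == "chocolate") | (flavor == "vanilla")
choc_or_van = scores[mask]
print(choc_or_van)

# pandas only: isin() tests membership in a list of flavors.
studentdata = pd.DataFrame(scores, columns=["math", "reading", "science"])
studentdata["icecreamflavor"] = flavor
choc_or_van_df = studentdata.loc[
    studentdata["icecreamflavor"].isin(["chocolate", "vanilla"]),
    ["math", "reading", "science"],
]
print(choc_or_van_df)
```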
e) Use the scores and flavor arrays to slice out the scores where the flavor is not chocolate
(you can use the ~ sign). The same result can be found using pandas commands
exclusively. Using the studentdata data frame, find the scores where the flavor is not
chocolate.
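The `~` operator inverts a boolean mask, which is what part e asks for. A small sketch on toy values:

```python
import numpy as np
import pandas as pd

# Toy data; the real values come from the provided file.
flavor = np.array(["chocolate", "vanilla", "strawberry", "chocolate"])
scores = np.array([[80, 75, 90], [65, 70, 60], [90, 85, 88], [72, 68, 71]])

# NumPy: ~ flips every True/False in the mask.
not_choc = scores[~(flavor == "chocolate")]
print(not_choc)

# pandas only: the same negation works on a boolean Series.
studentdata = pd.DataFrame(scores, columns=["math", "reading", "science"])
studentdata["icecreamflavor"] = flavor
not_choc_df = studentdata.loc[
    ~(studentdata["icecreamflavor"] == "chocolate"),
    ["math", "reading", "science"],
]
print(not_choc_df)
```

Writing `flavor != "chocolate"` gives the same mask without the negation.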
f) Using the studentdata data frame and pandas commands, slice out all math and reading
scores where the flavor is chocolate, then compute the mean of math and reading scores
for this subset.
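A sketch of part f on toy data: select the rows and the two columns in one `.loc` call, then take the column means.

```python
import pandas as pd

# Toy data standing in for the real studentdata.
studentdata = pd.DataFrame({
    "math": [80, 65, 90, 72],
    "reading": [75, 70, 85, 68],
    "science": [90, 60, 88, 71],
    "icecreamflavor": ["chocolate", "vanilla", "strawberry", "chocolate"],
})

# Rows where the flavor is chocolate, restricted to math and reading.
choc_math_reading = studentdata.loc[
    studentdata["icecreamflavor"] == "chocolate", ["math", "reading"]
]

# mean() works column by column and returns one value per column.
choc_means = choc_math_reading.mean()
print(choc_means)
```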
Question
Imagine that you wanted to use the studentdata in question a to make predictions such that
the icecreamflavor, math and reading columns are the input variables and the science column
is the output variable you want to predict.
a) Use the LabelBinarizer in the sklearn package to transform the icecreamflavor
column in the studentdata to dummy variables, then join these dummy variables to the
studentdata and drop the original icecreamflavor column. Reassign the resulting
DataFrame to a variable called studentdata. Print out the entire studentdata
DataFrame.
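A minimal sketch of the LabelBinarizer step, assuming toy data with three flavors. Note that with exactly two distinct flavors `LabelBinarizer` returns a single column, so the column naming below would need adjusting in that case.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Toy data standing in for the real studentdata.
studentdata = pd.DataFrame({
    "math": [80, 65, 90, 72],
    "reading": [75, 70, 85, 68],
    "science": [90, 60, 88, 71],
    "icecreamflavor": ["chocolate", "vanilla", "strawberry", "chocolate"],
})

# Fit on the flavor column; classes_ holds the flavors in sorted order.
lb = LabelBinarizer()
encoded = lb.fit_transform(studentdata["icecreamflavor"])

# Wrap the 0/1 array in a DataFrame that shares the original index.
dummies = pd.DataFrame(encoded, columns=lb.classes_, index=studentdata.index)

# Join the dummy columns and drop the original text column.
studentdata = studentdata.join(dummies).drop(columns="icecreamflavor")
print(studentdata)
```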
b) The pandas get_dummies functionality will produce the same resulting data frame from
Part a much more concisely. Use this functionality to produce the same data frame and
assign it to studentdata. Print out the entire studentdata DataFrame.
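The `pd.get_dummies` version can be sketched in one call. The `prefix=""`, `prefix_sep=""` and `dtype=int` arguments are choices made here so the output has plain flavor names and 0/1 integers; they are not required by the question.

```python
import pandas as pd

# Toy data standing in for the real studentdata.
studentdata = pd.DataFrame({
    "math": [80, 65, 90, 72],
    "reading": [75, 70, 85, 68],
    "science": [90, 60, 88, 71],
    "icecreamflavor": ["chocolate", "vanilla", "strawberry", "chocolate"],
})

# Encode only the flavor column; the original column is dropped automatically.
studentdata = pd.get_dummies(
    studentdata,
    columns=["icecreamflavor"],
    prefix="",      # no "icecreamflavor" prefix on the new column names
    prefix_sep="",
    dtype=int,      # 0/1 integers rather than booleans
)
print(studentdata)
```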
c) Extract the math, reading and science scores and use the StandardScaler class in the
sklearn.preprocessing module to standardize these scores, then merge the standardized
scores with the dummy variables (you need to extract the dummy variables from
studentdata) and call the resulting DataFrame studentdatastd. Print out the entire
studentdatastd DataFrame.
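One way to sketch the standardization step, assuming studentdata already holds the dummy columns from the previous part (toy values below):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data: scores plus dummy columns, as produced by the previous part.
studentdata = pd.DataFrame({
    "math": [80, 65, 90, 72],
    "reading": [75, 70, 85, 68],
    "science": [90, 60, 88, 71],
    "chocolate": [1, 0, 0, 1],
    "strawberry": [0, 0, 1, 0],
    "vanilla": [0, 1, 0, 0],
})
score_cols = ["math", "reading", "science"]

# Standardize each score column to zero mean and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(studentdata[score_cols]),
    columns=score_cols,
    index=studentdata.index,
)

# Extract the dummy columns and put them next to the standardized scores.
dummies = studentdata.drop(columns=score_cols)
studentdatastd = pd.concat([scaled, dummies], axis=1)
print(studentdatastd)
```

`StandardScaler` uses the population standard deviation, so `std(ddof=0)` of each scaled column is 1 while the pandas default `std()` is slightly larger.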
d) Using a split ratio of : split the studentdatastd into training and test sets.
Reference the input and output of the training set as Xtrain and ytrain respectively.
Also reference the input and output of the test set as Xtest and ytest respectively. Print
Xtrain, ytrain, Xtest and ytest.
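A sketch of the split using `train_test_split`. The split ratio is not legible in the question, so `test_size=0.25` below is only a placeholder, as are the toy values and `random_state=0`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for studentdatastd (values are illustrative only).
studentdatastd = pd.DataFrame({
    "math": [0.5, -1.2, 1.4, -0.3, 0.9, -0.8, 0.1, -0.6],
    "reading": [0.2, -0.9, 1.5, -0.7, 1.0, -1.1, 0.3, -0.3],
    "science": [0.8, -1.3, 1.1, -0.4, 0.6, -0.9, 0.4, -0.3],
    "chocolate": [1, 0, 0, 1, 0, 0, 1, 0],
    "strawberry": [0, 0, 1, 0, 1, 0, 0, 0],
    "vanilla": [0, 1, 0, 0, 0, 1, 0, 1],
})

# Inputs are every column except science; science is the output.
X = studentdatastd.drop(columns="science")
y = studentdatastd["science"]

# test_size is a placeholder; substitute the ratio given in the assignment.
Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(Xtrain, ytrain, Xtest, ytest, sep="\n")
```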
e) Using the boston DataFrame in question a, select the LowTemp, HighTemp,
WarmestMin, ColdestHigh, AveMin and AveMax columns and use the pipeline
functionality in sklearn to transform the selected data using the MinMaxScaler and
SimpleImputer classes. With the SimpleImputer class, the missing data in each
column should be imputed using the mean value for the column. Assign the resulting
DataFrame to a variable called pipelinedata. Print the pipelinedata.
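A hedged sketch of the pipeline step. The boston DataFrame is not shown here, so the toy frame below is invented, and running the imputer before the scaler is a choice made for this sketch rather than something the question specifies.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

cols = ["LowTemp", "HighTemp", "WarmestMin", "ColdestHigh", "AveMin", "AveMax"]

# Hypothetical stand-in for the boston DataFrame, with a few missing values.
boston = pd.DataFrame({
    "LowTemp": [20.0, np.nan, 25.0, 30.0],
    "HighTemp": [40.0, 45.0, np.nan, 50.0],
    "WarmestMin": [30.0, 32.0, 35.0, 38.0],
    "ColdestHigh": [28.0, 30.0, 31.0, np.nan],
    "AveMin": [22.0, 24.0, 27.0, 31.0],
    "AveMax": [38.0, 41.0, 44.0, 47.0],
})

# Step 1 fills gaps with the column mean; step 2 rescales each column to [0, 1].
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", MinMaxScaler()),
])

# The pipeline returns a NumPy array, so wrap it back into a DataFrame.
pipelinedata = pd.DataFrame(
    pipe.fit_transform(boston[cols]), columns=cols, index=boston.index
)
print(pipelinedata)
```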
f) Output the descriptive statistics of the pipelinedata including mean, median, variance,
minimum value, maximum value, standard deviation and skewness. Your
results should be in a single data frame and you can do this in a single line of code
using the apply or agg functionality of the pandas DataFrame.
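The single-line summary can be sketched with `DataFrame.agg` and a list of statistic names. The pipelinedata below is a two-column toy frame, not the real result.

```python
import pandas as pd

# Toy stand-in for pipelinedata (already scaled to [0, 1]).
pipelinedata = pd.DataFrame({
    "LowTemp": [0.0, 0.5, 0.5, 1.0],
    "HighTemp": [0.0, 0.25, 0.75, 1.0],
})

# One row per statistic, one column per variable.
stats = pipelinedata.agg(["mean", "median", "var", "min", "max", "std", "skew"])
print(stats)
```

Here `var` and `std` use the pandas default of sample statistics (`ddof=1`).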
