Question 2
a) Read the student_data file provided into Python (take note of the file extension to use
the appropriate pandas reader to read the data). Drop the first empty column in Python
and assign the DataFrame to a variable student_data.
b) The student_data shows the different midterm scores of students in math, reading and
science, and their favorite ice cream flavors. Select the data in the ice_cream_flavor
column and convert the flavors to a numpy array, then assign it to a variable called flavor.
From the student_data, select the math, reading and science scores all at once and
convert the selected data to a numpy array and assign it to a variable called scores. Print
the data in the flavor and scores arrays.
c) Use the scores and flavor arrays to slice out the scores where the flavor is chocolate only.
The same result can be found using Pandas commands exclusively. Using the
student_data data frame, find the scores where the flavor is chocolate only.
d) Use the scores and flavor arrays to slice out the scores where the flavor is chocolate OR
vanilla. The same result can be found using Pandas commands exclusively. Using the
student_data data frame, find the scores where the flavor is chocolate or vanilla.
e) Use the scores and flavor arrays to slice out the scores where the flavor is not chocolate
(you can use the ~ sign). The same result can be found using Pandas commands
exclusively. Using the student_data data frame, find the scores where the flavor is not
chocolate.
f) Using the student_data data frame and Pandas commands, slice out all math and reading
scores where the flavor is chocolate, then compute the mean of math and reading scores
for this subset.
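
The NumPy and pandas selections in parts a) to f) can be sketched as below. This is only an illustration under stated assumptions: the real file and its extension are not shown here, so a small made-up CSV with an empty first column stands in for it, and the column names math, reading, science and ice_cream_flavor are taken from the wording of the question. With a different extension, a reader such as pd.read_excel would replace pd.read_csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the provided file: made-up values, with an
# empty first column like the one part a) asks to drop.
csv_text = (
    ",ice_cream_flavor,math,reading,science\n"
    ",chocolate,70,80,75\n"
    ",vanilla,85,75,80\n"
    ",strawberry,60,65,70\n"
    ",chocolate,90,95,85\n"
    ",vanilla,75,70,65\n"
    ",strawberry,65,60,55\n"
)

# a) Read the data and drop the first (empty) column by position.
raw = pd.read_csv(io.StringIO(csv_text))
student_data = raw.drop(columns=raw.columns[0])

# b) Convert the flavor column and the three score columns to NumPy arrays.
score_cols = ["math", "reading", "science"]
flavor = student_data["ice_cream_flavor"].to_numpy()
scores = student_data[score_cols].to_numpy()
print(flavor)
print(scores)

# c) to e) NumPy: a boolean mask built from flavor selects rows of scores.
is_choc = flavor == "chocolate"
choc_scores = scores[is_choc]
choc_or_van_scores = scores[is_choc | (flavor == "vanilla")]
not_choc_scores = scores[~is_choc]

# c) to e) pandas only: the same masks applied with .loc.
flavor_col = student_data["ice_cream_flavor"]
choc_df = student_data.loc[flavor_col == "chocolate", score_cols]
choc_or_van_df = student_data.loc[flavor_col.isin(["chocolate", "vanilla"]), score_cols]
not_choc_df = student_data.loc[flavor_col != "chocolate", score_cols]

# f) Mean math and reading scores for the chocolate subset.
choc_means = student_data.loc[flavor_col == "chocolate", ["math", "reading"]].mean()
print(choc_means)
```

The parentheses around each comparison matter when masks are combined, because | and & bind more tightly than == in Python.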
Question 3
Imagine that you wanted to use the student_data in question 2a to make predictions such that
the ice_cream_flavor, math and reading columns are input variables and science column is the
output variable you want to predict.
a) Use the LabelBinarizer() in the sklearn package to transform the ice_cream_flavor
column in the student_data to dummy variables, then join these dummy variables to the
student_data and drop the original ice_cream_flavor column. Reassign the resulting
DataFrame to a variable called student_data_1. Print out the entire student_data_1
DataFrame.
b) The Pandas get_dummies functionality will produce the same resulting data frame from
Part a) much more concisely. Use this functionality to produce the same data frame and
assign it to student_data_2. Print out the entire student_data_2 DataFrame.
c) Extract the math, reading and science scores and use the StandardScaler() class in the
sklearn.preprocessing module to standardize these scores, then merge the standardized
scores to the dummy variables (you need to extract the dummy variables from
student_data_2), and call the resulting DataFrame student_data_std. Print out the entire
student_data_std DataFrame.
d) Using a split ratio of 70:30, split the student_data_std into training and test sets.
Reference the input and output of the training set as X_train and y_train respectively.
Also reference the input and output of the test set as X_test and y_test respectively. Print
X_train, y_train, X_test and y_test.
e) Using the boston_1 DataFrame in question 1a, select the LowTemp, HighTemp,
WarmestMin, ColdestHigh, AveMin, AveMax columns and use the pipeline
functionality in sklearn to transform the selected data using the MinMaxScaler() and
SimpleImputer() classes. With the SimpleImputer() class, the missing data in each
column should be imputed using the mean value for the column. Assign the resulting
DataFrame to a variable called pipeline_data. Print pipeline_data.
f) Output the descriptive statistics of the pipeline_data including mean, median, variance,
minimum value, maximum value, standard deviation and skewness. Your
results should be in a single data frame and you can do this in a single line of code
using .apply() or .agg() functionality of the pandas DataFrame.
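
One possible way to work through parts a) to f) is sketched below. It is an illustration under assumptions, not the expected answer: both DataFrames are small made-up stand-ins (the real student_data and boston_1 are not shown here, and only two of the six temperature columns are mocked up), random_state=0 is an arbitrary choice made for reproducibility, and placing the imputer before the scaler in the pipeline is a choice of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, MinMaxScaler, StandardScaler

# Hypothetical stand-in for student_data; real values will differ.
student_data = pd.DataFrame({
    "ice_cream_flavor": ["chocolate", "vanilla", "strawberry", "chocolate", "vanilla",
                         "strawberry", "chocolate", "vanilla", "strawberry", "chocolate"],
    "math": [70, 85, 60, 90, 75, 65, 80, 55, 95, 72],
    "reading": [80, 75, 65, 95, 70, 60, 85, 50, 90, 78],
    "science": [75, 80, 70, 85, 65, 55, 90, 60, 88, 74],
})
score_cols = ["math", "reading", "science"]

# a) LabelBinarizer returns one 0/1 column per class (for 3 or more classes).
lb = LabelBinarizer()
dummy_array = lb.fit_transform(student_data["ice_cream_flavor"])
dummy_df = pd.DataFrame(dummy_array, columns=lb.classes_, index=student_data.index)
student_data_1 = student_data.join(dummy_df).drop(columns="ice_cream_flavor")
print(student_data_1)

# b) get_dummies does the same in one call; the empty prefix keeps the
#    bare flavor names so the columns match those from part a).
student_data_2 = pd.get_dummies(student_data, columns=["ice_cream_flavor"],
                                prefix="", prefix_sep="", dtype=int)
print(student_data_2)

# c) Standardize the three score columns, then put them next to the dummies.
scaled = StandardScaler().fit_transform(student_data_2[score_cols])
scaled_df = pd.DataFrame(scaled, columns=score_cols, index=student_data_2.index)
student_data_std = pd.concat([student_data_2.drop(columns=score_cols), scaled_df], axis=1)
print(student_data_std)

# d) 70:30 split; science is the output, every other column is an input.
X = student_data_std.drop(columns="science")
y = student_data_std["science"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(X_train, y_train, X_test, y_test, sep="\n")

# e) Hypothetical stand-in for the selected boston_1 columns.
temps = pd.DataFrame({
    "LowTemp": [10.0, np.nan, 30.0, 20.0],
    "HighTemp": [50.0, 70.0, np.nan, 60.0],
})
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # fill NaN with the column mean
    ("scale", MinMaxScaler()),                   # rescale each column to [0, 1]
])
pipeline_data = pd.DataFrame(pipe.fit_transform(temps), columns=temps.columns)
print(pipeline_data)

# f) All requested statistics in one DataFrame with a single .agg() call.
stats = pipeline_data.agg(["mean", "median", "var", "min", "max", "std", "skew"])
print(stats)
```

With only two classes, LabelBinarizer returns a single column rather than one per class, so the column labels in part a) would need adjusting for a two-flavor dataset.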