Question: Question 2 a ) Read the student _ data file provided into Python ( take note of the file extension to use the appropriate pandas

Question

2

)

Read the student

_

data file provided into Python

(

take note of the file extension to use

the appropriate pandas reader to read the data

) .

Drop the first empty column in Python

and assign the DataFrame to a variable student

_

data.

)

The student

_

data shows the different midterm scores of students in math, reading and

science, and their favorite ice cream flavors. Select the data in the ice

_

cream

_

flavor

column and convert the flavors to a numpy array, then assign it to a variable called flavor.

From the student

_

data, select the math, reading and science scores all at once and

convert the selected data to a numpy array and assign it to a variable called scores. Print

the data in the flavor and scores arrays.

)

Use the scores and flavor arrays to slice out the scores where the flavor is chocolate only.

The same result can be found using Pandas commands exclusively. Using the

student

_

data data frame, find the scores where the flavor is chocolate only.

)

Use the scores and flavor arrays to slice out the scores where the flavor is chocolate OR

vanilla. The same result can be found using Pandas commands exclusively. Using the

student

_

data data frame, find the scores where the flavor is chocolate or vanilla.

)

Use the scores and flavor arrays to slice out the scores where the flavor is not chocolate

(

you can use the

sign

) .

The same result can be found using Pandas commands

exclusively. Using the student

_

data data frame, find the scores where the flavor is not

chocolate.

)

Using the student

_

data data frame and Pandas commands, slice out all math and reading

scores where the flavor is chocolate, then compute the mean of math and reading scores

for this subset.

Question

3

Imagine that you wanted to use the student

_

data in question

2

a to make predictions such that

the ice

_

cream

_

flavor, math and reading columns are input variables and science column is the

output variable you want to predict.

)

Use the LabelBinarizer

()

in the sklearn package to transform the ice

_

cream

_

flavor

column in the student

_

data to dummy variables, then join these dummy variables to the

student

_

data and drop the original ice

_

cream

_

flavor column. Reassign the resulting

DataFrame to a variable called student

_

data

_1 .

Print out the entire student

_

data

_1

DataFrame.

)

The Pandas get

_

dummies functionality will produce the same resulting data frame from

Part a

)

much more concisely. Use this functionality to produce the same data frame and

assign it to student

_

data

_2 .

Print out the entire student

_

data

_2

DataFrame.

)

Extract the math, reading and science scores and use the StandardScaler

()

class in the

sklearn.preprocessing module to standardize these scores, then merge the standardized

scores to the dummy variables

(

you need to extract the dummy variables from

student

_

data

_2),

and call the resulting DataFrame student

_

data

_

std

.

Print out the entire

student

_

data

_

std DataFrame.

)

Using a split ratio of

70

30,

spit the student

_

data

_

std into training and test set.

Reference the input and output of the training set as X

_

train and y

_

train respectively.

Also reference the input and output of the test set as X

_

test and y

_

test respectively. Print

_

train, y

_

train, X

_

test and y

_

test.

)

Using the boston

_1

DataFrame in question

1

,

select the LowTemp, HighTemp,

WarmestMin, ColdestHigh, AveMin, AveMax columns and use the

pipeline

functionality in sklearn to transform the selected data using the MinMaxScalar

()

and

SimpleImputer

()

classes. With the SimpleImputer

()

class, the missing data in each

column should be imputed using the mean value for the column. Assign the resulting

DataFrame to a variable called pipeline

_

data. Print the pipeline

_

data

)

Output the descriptive statistics of the pipeline

_

data including mean, median, variance,

minimum value, maximum value, variance, standard deviation and skewness. Your

results should be in a single data frame and you can do this in a single line of code

using

.

apply

()

.

agg

()

functionality of the pandas DataFrame.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Accounting Questions!

Help with a java project: Must have at least 6 classes. Description For this project, you will put together several techniques and concepts you have learned in CS 210 (or from a similar background)...

can anyone build a student data base file using python and the information on the photo below The main objective of this assignment is to design and implement A Student Assessment database software...

Q.2 Even though the revenue recognition steps we covered in class and in the textbook were integrated in U.S. GAAP in 2016, imagine that these rules were in place in 2011. Based solely on the...

Help with a java project please: User Request "Create a simple system to read, search, remove, and write restaurant data." Objectives: 1. 2. 3. 4. 5. 10 points 10 points Use standard Java I/O to...

The main objective of this assignment is to design and implement A Student Assessment database software application, for managing students workload, using the python concepts learnt so far like...

PROJECT 1 (42 MARKS) Advanced Financial Reporting has two projects. Project 1 is due at the end of Week 3 and covers material from Weeks 1 to 3. Project 2 is due at the end of Week 5 and focuses on...

3 4 Objective: This assignment is intended to give you experience in preparing a Master Budget and will help you understand the relationship between the Operation Budget and the Financial Budget. Use...

At the end of Week 3, you will be required to create and submit one Excel file for your answers to Questions 1 to 4. It should contain eight worksheets named as follows: 1. Q1(a) Jes and calcs 2....

1.0 Overview Madame Pince, the head librarian at Hogwarts has expressed an interest in the new "Computational Magic" to Professor Dumbledore and has asked if it would be possible to obtain a program...

PROTECTED VIEW Be carefulfiles from the Internet can contain viruses. Unless you need to edit, it's safer to stay fr B D E F G H Objective: This assignment is intended to give you experience in...

A small economy produces only two goods: bread and butter. The price and quantity of these goods for the years 2018, 2019 and 2020 were as follows: Quantity of Bread (per kg) 2018 $1.55 200 2019...

Replacement problem when old machine has a positive book value. Assume the same facts as in Problem 2.3 except that the new machine will have a salvage value of $12,000. Assume further that the old...

7. What will be the contents of BX after the following instructions execute? mov bx,5 stc mov ax,60h adc bx,ax

Pharoah Company reported the following amounts for 2022: Raw materials purchased $95,200 Beginning raw materials inventory 5,824 Ending raw materials inventory 5,040 Beginning finished goods...