Question: Question 1 a ) Read the boston dataset csv files provided for this assignment into Python ( you can use pd . read _ csv

Question

1

)

Read the boston dataset csv files provided for this assignment into Python

(

you can use pd

.

read

_

csv

()) .

The boston datasets are boston

_

weather

_1,

boston

_

weather

_2,

boston

_

weather

_3,

boston

_

weather

_4,

and more

_

weather

_

variables, then assign the datasets to a DataFrame variable called boston

_1,

boston

_2,

boston

_3,

boston

_4,

and more

_

variables, respectively. Combine or concatenate the DataFrames, boston

_1,

boston

_2,

boston

_3,

and boston

_4

and assign the results to a DataFrame called combined

_

boston. These four datasets should be combined vertically since they have the same variable names, such that boston

_1

is stacked on top of boston

_2,

and the result is stacked on top of boston

_3,

and the result is further stacked on top of boston

_4 .

Horizontally merge, join or concatenate the combined

_

boston and more

_

variables DataFrames and assign the results to a DataFrame called boston

_

data. Print the first five rows of the boston

_

data, and the last five rows of the boston

_

data. Also print out the shape of the boston

_

data.

)

Check the combined

_

boston to verify how many missing data points exist under each column.

)

Drop the rows or instances that contain any missing data. Assign the resulting DataFrame to a variable called clean

_

boston

_

data. Note that this is only one way of dealing with missing data and cases with missing data are usually used if you have sufficient sample size. Check for missing data again to ensure there is no missing data in the clean

_

boston

_

data. Print the shape of the clean

_

boston

_

data

)

Format all the column names to lowercase and include underscore between column names that consist of two words. For example, meanTemp should become mean

_

temp, and Max

24

hrPrep can become max

_24

_

prep, and HighTemp becomes high

_

temp, etc. Reassign the DataFrame with the formatted column names to the same variable, clean

_

boston

_

data. Print or output the columns of the clean

_

boston

_

data DataFrame.

)

Select or slice all data from the clean

_

boston

_

data DataFrame, except the data where the Year is

1930 .

You can call this subset data excluding

_1930 .

Using the excluding

_1930

DataFrame, output the first

20

unique values in the Year column.

)

Select the data from the clean

_

boston

_

data where the Year is

1995

AND the high

_

temp is greater than or equal to

90 .

Output or display the whole selected data. Here, you don

t have to assign it to any variable, but you could if you want to

.

)

Select the data from the clean

_

boston

_

data where the Year is

1995

OR the high

_

temp is greater than

89 .

Output or print the first

20

rows of the selected data. Here, you don

t have to assign it to any variable, but you could if you want to

.

Question

2

)

Read the student

_

data file provided into Python

(

take note of the file extension to use the appropriate pandas reader to read the data

) .

Drop the first empty column in Python and assign the DataFrame to a variable student

_

data.

)

The student

_

data shows the different midterm scores of students in math, reading and science, and their favorite ice cream flavors. Select the data in the ice

_

cream

_

flavor column and convert the flavors to a numpy array, then assign it to a variable called flavor. From the student

_

data, select the math, reading and science scores all at once and convert the selected data to a numpy array and assign it to a variable called scores. Print the data in the flavor and scores arrays.

)

Use the scores and flavor arrays to slice out the scores where the flavor is chocolate only. The same result can be found using Pandas commands exclusively. Using the student

_

data data frame, find the scores where the flavor is chocolate only.

)

Use the scores and flavor arrays to slice out the scores where the flavor is chocolate OR vanilla. The same result can be found using Pandas commands exclusively. Using the student

_

data data frame, find the scores where the flavor is chocolate or vanilla.

)

Use the scores and flavor arrays to slice out the scores where the flavor is not chocolate

(

you can use the

sign

) .

The same result can be found using Pandas commands exclusively. Using the student

_

data data frame, find the scores where the flavor is not chocolate.

)

Using the student

_

data data frame and Pandas commands, slice out all math and reading scores where the flavor is chocolate, then compute the mean of math and reading scores for this subset.

Question

3

Imagine that you wanted to use the student

_

data in question

2

a to make predictions such that the ice

_

cream

_

flavor, math and reading columns are input variables and science column is the output variable you want to predict.

)

Use the LabelBinarizer

()

in the sklearn package to transform the ice

_

cream

_

flavor column in the student

_

data to dummy variables, then join these dummy variables to the student

_

data and drop the original ice

_

cream

_

flavor column. Reassign the resulting DataFrame to a variable called student

_

data

_1 .

Print out the entire student

_

data

_1

DataFrame.

)

The Pandas get

_

dum

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Accounting Questions!

In this task, you will need to implement linear regression and get to see it work on data. For the starter, you will n eed to download the starter code and unzip its contents to the directory where...

What to do At a high-level, your next steps are to: Open the file bridge_functions.py. Make sure that the file is in the same directory as a2_checker.py, and pyta. Complete the function definitions...

Hello, I make a Simple Linear Regression for my project, but I want to summary of each function from this Pythone code. I wrote some of it but I need your help. Please Help me. This is my Simple...

Using Python to do this assignment Sorry, I couldn't attach excel file, so could you please just use the data in the pics to solve this question Question #1 1. Download the heart.csv dataset from the...

Kmeans python file import csv import numpy as np import matplotlib.pyplot as plt # Computes the distance between two data points def calc_distance(X1, X2): return(np.sum((X1 - X2)**2))**0.5 #...

python code import csv import numpy as np import matplotlib.pyplot as plt # Computes the distance between two data points def calc_distance(X1, X2): return(np.sum((X1 - X2)**2))**0.5 # Function to...

Assignment Overview This lab exercise provides practice with lists, functions and namespaces in Python. ^ Instructions ^ Starter Code ^immigration.csv file Assignment Overview This lab exercise...

Please Do it in Python WTIHOUT USING ChatGPT. Thank you Recursive Backtracking You are to implement a program that can find an assignment of exams to rooms without conflicts using recursive...

Please help me develop this program in python Lab Exercise #7 Assignment Overview This lab exercise provides practice with lists, functions and namespaces in Python. You will work with a partner on...

You purchase a piece of furniture worth $5,000 on credit through a local furniture store. You are told that your monthly payment will be $146.35, including an acquisition fee of $25, at a 10% add-on...

1. Using content from this chapter and the previous chapter (on measurement), indicate whether you would expect the following accounts to be (1) measured at fair value, (2) tested for impairment at...

Three widely used methods of evaluating capital investment proposals include period, return on , and future cash flows.

Can you elucidate the role of molecular chaperones and protein quality control systems in maintaining the integrity and homeostasis of the cytoplasm, particularly under conditions of cellular stress...