Question: Question 1 a) Read the boston dataset csv files provided for this assignment into Python (you can use pd.read_csv
Question 1
a) Read the boston dataset csv files provided for this assignment into Python (you can use pd.read_csv). The boston datasets are the four bostonweather files and moreweathervariables. Assign each bostonweather dataset to its own boston DataFrame variable, and assign moreweathervariables to a DataFrame variable called morevariables. Combine or concatenate the four boston DataFrames and assign the result to a DataFrame called combinedboston. These four datasets should be combined vertically, since they have the same variable names: stack them in order, so that each boston DataFrame sits on top of the next. Then horizontally merge, join or concatenate the combinedboston and morevariables DataFrames and assign the result to a DataFrame called bostondata. Print the first five rows and the last five rows of bostondata. Also print the shape of bostondata.
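The vertical-then-horizontal combination described in part a can be sketched as follows. This is only an illustration: the four small DataFrames, their values, and the names boston_a to boston_d are made up here and stand in for whatever pd.read_csv returns for the real files.

```python
import pandas as pd

# Hypothetical stand-ins for the four bostonweather files. In the assignment,
# each of these would come from pd.read_csv(<path to the csv file>).
boston_a = pd.DataFrame({"Year": [2015, 2015], "HighTemp": [41, 55]})
boston_b = pd.DataFrame({"Year": [2016, 2016], "HighTemp": [38, 60]})
boston_c = pd.DataFrame({"Year": [2017, 2017], "HighTemp": [45, 62]})
boston_d = pd.DataFrame({"Year": [2018, 2018], "HighTemp": [40, 58]})

# Hypothetical stand-in for the moreweathervariables file (one row per day).
morevariables = pd.DataFrame({"MaxhrPrep": [0.0, 0.1, 0.0, 0.3, 0.2, 0.0, 0.5, 0.1]})

# Vertical stack: same columns, rows appended in order. ignore_index=True
# renumbers the rows 0..n-1 so they line up with morevariables later.
combinedboston = pd.concat([boston_a, boston_b, boston_c, boston_d],
                           axis=0, ignore_index=True)

# Horizontal combination: rows are matched on their index labels.
bostondata = pd.concat([combinedboston, morevariables], axis=1)

print(bostondata.head())   # first five rows
print(bostondata.tail())   # last five rows
print(bostondata.shape)    # (rows, columns)
```

Without ignore_index=True, each stacked piece would keep its own 0, 1, ... labels, and the horizontal step would no longer line up row by row.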
b) Check the combinedboston DataFrame to verify how many missing data points exist under each column.
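One common way to count missing values per column is isna() followed by sum(). A minimal sketch, using a made-up combinedboston with a few gaps:

```python
import numpy as np
import pandas as pd

# Made-up data with some missing entries, for illustration only.
combinedboston = pd.DataFrame({
    "Year": [2015, 2016, np.nan, 2018],
    "HighTemp": [41.0, np.nan, np.nan, 70.0],
})

# isna() marks each missing cell as True; sum() counts the Trues per column.
missing_per_column = combinedboston.isna().sum()
print(missing_per_column)
```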
c) Drop the rows or instances that contain any missing data. Assign the resulting DataFrame to a variable called cleanbostondata. Note that this is only one way of dealing with missing data; dropping cases with missing data is usually done only if you have a sufficient sample size. Check for missing data again to ensure there is no missing data in cleanbostondata. Print the shape of cleanbostondata.
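Dropping incomplete rows and re-checking might look like the sketch below. It assumes the rows are dropped from bostondata; the values are made up.

```python
import numpy as np
import pandas as pd

# Made-up data with missing values, for illustration only.
bostondata = pd.DataFrame({
    "Year": [2015, 2016, np.nan, 2018],
    "HighTemp": [41.0, np.nan, 55.0, 70.0],
})

# how="any" drops a row if at least one of its values is missing.
cleanbostondata = bostondata.dropna(axis=0, how="any")

# Every count should now be zero.
print(cleanbostondata.isna().sum())
print(cleanbostondata.shape)
```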
d) Format all the column names to lowercase and include an underscore between the words of column names that consist of two words. For example, meanTemp should become mean_temp, MaxhrPrep can become maxhr_prep, HighTemp becomes high_temp, etc. Reassign the DataFrame with the formatted column names to the same variable, cleanbostondata. Print or output the columns of the cleanbostondata DataFrame.
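One possible way to do the renaming in part d is a small helper that inserts an underscore wherever a lowercase letter or digit is followed by an uppercase letter, then lowercases the result. The helper name to_snake_case and the sample values are illustrative, not part of the assignment.

```python
import re
import pandas as pd

def to_snake_case(name):
    # Insert "_" at each lowercase-or-digit -> uppercase boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

# Made-up single-row frame using the column names mentioned in the question.
cleanbostondata = pd.DataFrame({
    "Year": [2015], "meanTemp": [50.1], "MaxhrPrep": [0.2], "HighTemp": [61],
})

# rename accepts a function and applies it to every column label.
cleanbostondata = cleanbostondata.rename(columns=to_snake_case)
print(cleanbostondata.columns)
```

Column names that do not follow a camelCase pattern would need to be handled by hand, for example with an explicit dictionary passed to rename.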
e) Select or slice all data from the cleanbostondata DataFrame, except the data where the Year is [...]. You can call this subset excluding. Using the excluding DataFrame, output the first [...] unique values in the Year column.
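A sketch of the exclusion in part e is shown below. The year to exclude and the number of unique values are not given in the text above, so year_to_exclude and n_unique are placeholder values; the column name year assumes the lowercase formatting from part d.

```python
import pandas as pd

# Made-up data, for illustration only.
cleanbostondata = pd.DataFrame({
    "year": [2015, 2016, 2016, 2017, 2018],
    "high_temp": [65, 58, 72, 60, 40],
})

year_to_exclude = 2018  # placeholder value, not taken from the assignment
n_unique = 2            # placeholder value, not taken from the assignment

# Keep every row whose year differs from the excluded one.
excluding = cleanbostondata[cleanbostondata["year"] != year_to_exclude]

# unique() returns values in order of first appearance; slice the first few.
first_unique_years = excluding["year"].unique()[:n_unique]
print(first_unique_years)
```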
f) Select the data from cleanbostondata where the Year is [...] AND the high_temp is greater than or equal to [...]. Output or display the whole selected data. Here, you don't have to assign it to any variable, but you could if you want to.
g) Select the data from cleanbostondata where the Year is [...] OR the high_temp is greater than [...]. Output or print the first [...] rows of the selected data. Here, you don't have to assign it to any variable, but you could if you want to.
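Parts f and g combine two conditions with & (AND) and | (OR). In the sketch below, target_year and threshold are placeholder values, since the actual numbers are not shown above, and the column names year and high_temp are assumed from part d.

```python
import pandas as pd

# Made-up data, for illustration only.
cleanbostondata = pd.DataFrame({
    "year": [2015, 2016, 2016, 2017, 2018],
    "high_temp": [65, 58, 72, 60, 40],
})

target_year = 2016  # placeholder value
threshold = 60      # placeholder value

# Each condition needs its own parentheses because & and | bind
# more tightly than == and >=.
both = cleanbostondata[(cleanbostondata["year"] == target_year)
                       & (cleanbostondata["high_temp"] >= threshold)]
either = cleanbostondata[(cleanbostondata["year"] == target_year)
                         | (cleanbostondata["high_temp"] > threshold)]

print(both)           # whole selection for the AND case
print(either.head())  # first rows of the OR case
```

Python's own `and` / `or` keywords do not work element-wise on a pandas Series, which is why the bitwise operators are used here.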
Question 2
a) Read the studentdata file provided into Python (take note of the file extension to use the appropriate pandas reader to read the data). Drop the first empty column in Python and assign the DataFrame to a variable studentdata.
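Which reader to use depends on the file extension (for example, pd.read_csv for .csv or pd.read_excel for .xlsx). Since the file itself is not shown here, the sketch below builds a made-up frame with an empty first column and demonstrates only the drop step; the column label "Unnamed: 0" is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the file as read by the appropriate pandas reader.
raw = pd.DataFrame({
    "Unnamed: 0": [np.nan, np.nan, np.nan],
    "math": [80, 90, 70],
    "reading": [70, 85, 90],
    "science": [75, 88, 65],
    "icecreamflavor": ["chocolate", "vanilla", "chocolate"],
})

# Drop the first column by position, whatever its label happens to be.
studentdata = raw.drop(columns=raw.columns[0])
print(studentdata)
```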
b) The studentdata shows the different midterm scores of students in math, reading and science, and their favorite ice cream flavors. Select the data in the icecreamflavor column and convert the flavors to a numpy array, then assign it to a variable called flavor. From the studentdata, select the math, reading and science scores all at once, convert the selected data to a numpy array, and assign it to a variable called scores. Print the data in the flavor and scores arrays.
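Converting a column or a set of columns to numpy arrays can be done with to_numpy(). A minimal sketch with made-up scores:

```python
import pandas as pd

# Made-up student data, for illustration only.
studentdata = pd.DataFrame({
    "math": [80, 90, 70, 60],
    "reading": [70, 85, 90, 65],
    "science": [75, 88, 65, 70],
    "icecreamflavor": ["chocolate", "vanilla", "chocolate", "strawberry"],
})

# A single column gives a 1-D array.
flavor = studentdata["icecreamflavor"].to_numpy()

# A list of columns (note the double brackets) gives a 2-D array.
scores = studentdata[["math", "reading", "science"]].to_numpy()

print(flavor)
print(scores)
```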
c) Use the scores and flavor arrays to slice out the scores where the flavor is chocolate only. The same result can be found using Pandas commands exclusively. Using the studentdata data frame, find the scores where the flavor is chocolate only.
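Both routes in part c rely on a boolean mask. The sketch below uses made-up data and shows the numpy version next to a pandas-only version.

```python
import pandas as pd

# Made-up student data, for illustration only.
studentdata = pd.DataFrame({
    "math": [80, 90, 70, 60],
    "reading": [70, 85, 90, 65],
    "science": [75, 88, 65, 70],
    "icecreamflavor": ["chocolate", "vanilla", "chocolate", "strawberry"],
})
score_columns = ["math", "reading", "science"]
flavor = studentdata["icecreamflavor"].to_numpy()
scores = studentdata[score_columns].to_numpy()

# numpy: the comparison yields one True/False per student, which then
# selects the matching rows of the 2-D scores array.
chocolate_scores_np = scores[flavor == "chocolate"]

# pandas only: .loc takes the row mask and the columns to keep.
chocolate_scores_pd = studentdata.loc[
    studentdata["icecreamflavor"] == "chocolate", score_columns
]

print(chocolate_scores_np)
print(chocolate_scores_pd)
```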
d) Use the scores and flavor arrays to slice out the scores where the flavor is chocolate OR vanilla. The same result can be found using Pandas commands exclusively. Using the studentdata data frame, find the scores where the flavor is chocolate or vanilla.
e) Use the scores and flavor arrays to slice out the scores where the flavor is not chocolate (you can use the ~ sign). The same result can be found using Pandas commands exclusively. Using the studentdata data frame, find the scores where the flavor is not chocolate.
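Parts d and e follow the same masking pattern, with | for OR and ~ for NOT. A sketch under the same made-up data; isin is one pandas alternative for the OR case, not necessarily the one the assignment expects.

```python
import pandas as pd

# Made-up student data, for illustration only.
studentdata = pd.DataFrame({
    "math": [80, 90, 70, 60],
    "reading": [70, 85, 90, 65],
    "science": [75, 88, 65, 70],
    "icecreamflavor": ["chocolate", "vanilla", "chocolate", "strawberry"],
})
score_columns = ["math", "reading", "science"]
flavor = studentdata["icecreamflavor"].to_numpy()
scores = studentdata[score_columns].to_numpy()

# numpy: | combines two masks element-wise, ~ inverts a mask.
choc_or_vanilla_np = scores[(flavor == "chocolate") | (flavor == "vanilla")]
not_chocolate_np = scores[~(flavor == "chocolate")]

# pandas only: isin covers the OR case, ~ inverts the row mask.
choc_or_vanilla_pd = studentdata.loc[
    studentdata["icecreamflavor"].isin(["chocolate", "vanilla"]), score_columns
]
not_chocolate_pd = studentdata.loc[
    ~(studentdata["icecreamflavor"] == "chocolate"), score_columns
]

print(choc_or_vanilla_pd)
print(not_chocolate_pd)
```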
f) Using the studentdata data frame and Pandas commands, slice out all math and reading scores where the flavor is chocolate, then compute the mean of math and reading scores for this subset.
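Part f can be sketched as a .loc selection followed by mean(), which averages each selected column separately. The data is made up.

```python
import pandas as pd

# Made-up student data, for illustration only.
studentdata = pd.DataFrame({
    "math": [80, 90, 70, 60],
    "reading": [70, 85, 90, 65],
    "science": [75, 88, 65, 70],
    "icecreamflavor": ["chocolate", "vanilla", "chocolate", "strawberry"],
})

# Rows: chocolate only. Columns: math and reading only.
chocolate_subset = studentdata.loc[
    studentdata["icecreamflavor"] == "chocolate", ["math", "reading"]
]

# mean() on a DataFrame returns one mean per column.
subset_means = chocolate_subset.mean()
print(subset_means)
```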
Question 3
Imagine that you wanted to use the studentdata in question 2a to make predictions, such that the icecreamflavor, math and reading columns are input variables and the science column is the output variable you want to predict.
a) Use the LabelBinarizer in the sklearn package to transform the icecreamflavor column in the studentdata to dummy variables, then join these dummy variables to the studentdata and drop the original icecreamflavor column. Reassign the resulting DataFrame to a variable called studentdata. Print out the entire studentdata DataFrame.
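A rough sketch of the LabelBinarizer step follows, with made-up data and helper names (lb, dummy_df) chosen for illustration. It assumes the column has three or more distinct flavors: with exactly two classes, LabelBinarizer returns a single 0/1 column, and labelling the result with lb.classes_ would fail.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Made-up student data, for illustration only.
studentdata = pd.DataFrame({
    "math": [80, 90, 70, 60],
    "reading": [70, 85, 90, 65],
    "science": [75, 88, 65, 70],
    "icecreamflavor": ["chocolate", "vanilla", "chocolate", "strawberry"],
})

lb = LabelBinarizer()
# One 0/1 column per flavor; lb.classes_ holds the flavors in sorted order.
dummies = lb.fit_transform(studentdata["icecreamflavor"])

# Wrap the array in a DataFrame that shares studentdata's index so the
# join lines up row by row.
dummy_df = pd.DataFrame(dummies, columns=lb.classes_, index=studentdata.index)

studentdata = studentdata.join(dummy_df).drop(columns="icecreamflavor")
print(studentdata)
```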
b) The Pandas getdum
