Question: I need help specifically with (1H, Part 3

Important Reminders
This assignment has hidden tests: tests that are not visible here, but that will be run on your submitted assignment for grading.
This means passing all the tests you can see in the notebook here does not guarantee you have the right answer!
In particular many of the tests you can see simply check that the right variable names exist. Hidden tests check the actual values.
It is up to you to check the values, and make sure they seem reasonable.
A reminder to restart the kernel and re-run the code as a first line check if things seem to go weird.
For example, note that some cells can only be run once, because they re-write a variable (for example, your dataframe), and change it in a way that means a second execution will fa
Also, running some cells out of order might change the dataframe in ways that may cause an error, which can be fixed by re-running.
Run the following cell. These are all you need for the assignment. Do not import additional packages.
# Imports
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
sns.set_context('talk')
import warnings
warnings.filterwarnings('ignore')
import patsy
import statsmodels.api as sm
import scipy.stats as stats
from scipy.stats import ttest_ind, chisquare, normaltest
Note: the statsmodels import may print out a 'FutureWarning'. That's fine.
Part 1: Load & Clean the Data (2.5 points)
Fixing messy data makes up a large amount of the work of being a Data Scientist.
The real world produces messy measurements and it is your job to find ways to standardize your data such that you can mak
In this section, you will learn, and practice, how to successfully deal with unclean data.
1a) Load the data
Import datafile COGS108_IntroQuestionnaireData.csv into a DataFrame called df.
# YOUR CODE HERE
import pandas as pd
df = pd.read_csv('COGS108_IntroQuestionnaireData.csv')
assert isinstance(df, pd.DataFrame)
# Check out the data
df.head(5)
1h) Transform 'year' column
Use standardize_year to transform the data in column 'What year (in school) are you?'.
Hint: use the apply function AND remember to save your output inside the dataframe
# YOUR CODE HERE
assert len(df['year'].unique()) == 7
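The hint above (use apply and save the output back into the dataframe) can be sketched as follows. This is only an illustration under assumptions: the notebook's standardize_year is not shown in the question, so standardize_year_sketch below is a hypothetical stand-in, and demo is a made-up dataframe rather than the questionnaire data. The assert above reads df['year'], which suggests the column was already renamed to 'year' in an earlier step that is not shown here.

```python
import numpy as np
import pandas as pd


def standardize_year_sketch(value):
    """Hypothetical stand-in for the notebook's standardize_year (not shown in the question)."""
    try:
        year = int(str(value).strip())
    except ValueError:
        # Anything that is not a plain integer becomes missing
        return np.nan
    # Keep only years 1 through 6; everything else becomes missing
    return year if 1 <= year <= 6 else np.nan


# Made-up responses, only to show the apply-and-assign pattern
demo = pd.DataFrame({'year': ['1', ' 3 ', '2', 'n/a', '6']})

# apply returns a new Series; assigning it back saves it inside the dataframe
demo['year'] = demo['year'].apply(standardize_year_sketch)
```

In the assignment itself, the same pattern would presumably be df['year'] = df['year'].apply(standardize_year), assuming standardize_year was defined in an earlier cell and the column is already named 'year'. Calling apply without assigning the result leaves the dataframe unchanged.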
Assuming that all is correct up to this point, the line below should show all values now found in df['year'].
It should look a lot better. With this data, we can now make insightful analyses.
You should see an array with elements 1,2,3,4,5,6 and nan (not necessarily in that order).
Note that if you check the data type of this column, you'll see that pandas converts these numbers to float, even though the applied function returns int, because np.nan is considered a float. This is fine.
df['year'].unique()
array([ nan, 3.0,5.0,2.0,4.0,1.0,6.0], dtype=object)
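The note about floats can be checked on a small example. This is a minimal sketch with made-up Series, not the questionnaire data: a Series of whole numbers keeps an integer dtype, while the same values with one np.nan are stored as floats.

```python
import numpy as np
import pandas as pd

# Whole numbers only: pandas keeps an integer dtype
ints_only = pd.Series([1, 2, 3])

# One missing value forces the whole column to float, since np.nan is a float
with_nan = pd.Series([1, 2, np.nan])

print(ints_only.dtype)
print(with_nan.dtype)
```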
Let's do it again. Let's take a look at the responses in the 'weight' column, and then standardize them.
# First, ensure that all types are consistent, use strings
df['weight']= df['weight'].astype(str)
KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:
File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'weight'
The above exception was the direct cause of the following exception:
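A KeyError: 'weight' from pandas means the dataframe has no column with exactly that name at the moment the cell runs. One plausible cause, given the reminders at the top, is that an earlier cell that renames the columns was skipped or run out of order. The sketch below shows how this could be checked and fixed; the long column name is hypothetical (the real questionnaire header is not shown in the question), and demo is a made-up dataframe.

```python
import pandas as pd

# Made-up data; 'Hypothetical weight question?' is NOT the real CSV header
demo = pd.DataFrame({'Hypothetical weight question?': [150, '60kg']})

# Check whether the short name exists before indexing with it
had_weight_before = 'weight' in demo.columns

# Rename the long header to the short name, then the astype call works
demo = demo.rename(columns={'Hypothetical weight question?': 'weight'})
demo['weight'] = demo['weight'].astype(str)
```

In the notebook, printing df.columns shows the names that actually exist. If 'weight' is missing, restarting the kernel and re-running the cells in order, as the reminders suggest, is a reasonable first step.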
# Check all the different answers we received
df['weight'].unique()