Question: I need help specifically with ( 1 H , Part 3

Important Reminders

This assignment has hidden tests: tests that are not visible here, but that will be run on your submitted assignment for grading.

This means passing all the tests you can see in the notebook here does not guarantee you have the right answer!

In particular many of the tests you can see simply check that the right variable names exist. Hidden tests check the actual values.

It is up to you to check the values, and make sure they seem reasonable.

A reminder to restart the kernel and re-run the code as a first line check if things seem to go weird.

For example, note that some cells can only be run once, because they re-write a variable (for example, your dataframe), and change it in a way that means a second execution will fail.

Also, running some cells out of order might change the dataframe in ways that may cause an error, which can be fixed by re-running.
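The "run once" behaviour described above can be reproduced with a minimal sketch. The toy `height` column and the `strip_feet` helper are hypothetical and are not part of the assignment; they only illustrate how a cell that overwrites a column can fail when executed a second time.

```python
import pandas as pd

# Hypothetical toy data, not taken from the assignment's CSV
df = pd.DataFrame({"height": ["5'8", "6'0"]})

def strip_feet(value):
    # Works on the raw string, but not on a value that is already a number
    return int(value.split("'")[0])

# First execution: strings are converted to integers
df["height"] = df["height"].apply(strip_feet)

# Second execution of the same line: the column now holds integers,
# which have no .split method, so the call raises AttributeError
try:
    df["height"] = df["height"].apply(strip_feet)
    second_run_error = None
except AttributeError as err:
    second_run_error = err

print(df["height"].tolist())
print(repr(second_run_error))
```

Restarting the kernel and running every cell from the top reloads the original data, which is why that is the suggested first check.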

Run the following cell. These are all you need for the assignment. Do not import additional packages.

# Imports
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns
sns.set()
sns.set_context('talk')

import warnings
warnings.filterwarnings('ignore')

import patsy
import statsmodels.api as sm
import scipy.stats as stats
from scipy.stats import ttest_ind, chisquare, normaltest

Note: the statsmodels import may print out a 'FutureWarning'. That's fine.

Part 1: Load & Clean the Data (2.5 points)

Fixing messy data makes up a large amount of the work of being a Data Scientist.

The real world produces messy measurements and it is your job to find ways to standardize your data such that you can mak

In this section, you will learn, and practice, how to successfully deal with unclean data.

1) Load the data

Import datafile COGS108_IntroQuestionnaireData.csv into a DataFrame called df.

# YOUR CODE HERE
import pandas as pd
df = pd.read_csv('COGS108_IntroQuestionnaireData.csv')

assert isinstance(df, pd.DataFrame)

# Check out the data
df.head(5)

1) Transform 'year' column

Use standardize_year to transform the data in column 'What year (in school) are you?'.

Hint: use the apply function AND remember to save your output inside the dataframe

# YOUR CODE HERE

assert len(df['year'].unique()) == 7
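A minimal sketch of the apply-and-save pattern from the hint follows. The notebook's own `standardize_year` is not shown in the question, so `standardize_year_sketch` below is a hypothetical stand-in, and the toy responses are invented for illustration. The sketch also assumes the long question header has to be renamed to `year`, since the assert above reads `df['year']`; in the real notebook that rename may already have happened in an earlier cell.

```python
import numpy as np
import pandas as pd

def standardize_year_sketch(value):
    """Hypothetical stand-in for the notebook's standardize_year."""
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    if digits and 1 <= int(digits) <= 6:
        return int(digits)
    return np.nan  # anything unrecognised becomes missing

# Invented toy responses, not the real survey data
toy = pd.DataFrame(
    {"What year (in school) are you?": ["2nd", "Year 3", "4", "senior", None]}
)

# Assumption: the column is renamed to the short name the assert expects
toy = toy.rename(columns={"What year (in school) are you?": "year"})

# apply returns a new Series, so assign it back to keep the result
toy["year"] = toy["year"].apply(standardize_year_sketch)

print(toy["year"].tolist())
```

With the real data, the same shape would be `df['year'] = df['year'].apply(standardize_year)`, assuming `standardize_year` is defined in an earlier cell. Calling `apply` without assigning the result leaves the dataframe unchanged.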

Assuming that all is correct up to this point, the line below should show all values now found in df['year'].

It should look a lot better. With this data, we can now make insightful analyses.

You should see an array with elements 1, 2, 3, 4, 5, 6 and nan (not necessarily in that order).

Note that if you check the data type of this column, you'll see that pandas converts these numbers to float, even though the applied function returns int, because np.nan is considered a float. This is fine.

df['year'].unique()

array([nan, 3.0, 5.0, 2.0, 4.0, 1.0, 6.0], dtype=object)
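The note about integers turning into floats can be checked on a small example. This is a sketch on invented values, not on the survey data.

```python
import numpy as np
import pandas as pd

ints_only = pd.Series([1, 2, 3])
with_nan = pd.Series([1, 2, np.nan])

# np.nan is a Python float, so a numeric column that contains it
# is stored as a float column rather than an integer column
print(type(np.nan))
print(ints_only.dtype)
print(with_nan.dtype)
```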

Let's do it again. Let's take a look at the responses in the 'weight' column, and then standardize them.

First, ensure that all types are consistent, use strings

df['weight'] = df['weight'].astype(str)

KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'weight'

The above exception was the direct cause of the following exception:
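A `KeyError: 'weight'` from `df['weight']` means the dataframe has no column with exactly that name at the moment the cell runs. One way to investigate is sketched below. The raw header `"What is your weight?"` is a hypothetical placeholder, since the real CSV headers are not shown in the question; the point is only to compare the names the dataframe actually has with the name the cell asks for.

```python
import pandas as pd

# Hypothetical toy frame; the raw header is an assumption for illustration
toy = pd.DataFrame({"What is your weight?": ["150", "65kg"]})

# Check what the dataframe actually contains before indexing
has_weight_before = "weight" in toy.columns
print(has_weight_before)
print(list(toy.columns))

# Assumed fix: rename the raw header to the short name later cells expect
toy = toy.rename(columns={"What is your weight?": "weight"})

# The failing line from the notebook now runs on the toy frame
toy["weight"] = toy["weight"].astype(str)
print(toy["weight"].tolist())
```

If the real notebook has an earlier cell that renames the columns, a likely cause is that it was skipped, run out of order, or did not save its result back to `df`. Restarting the kernel and running all cells in order, as the reminders at the top suggest, is a reasonable first check.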

Check all the different answers we received

df['weight'].unique()

