
Using Python Pandas analyze the HURDAT2 data and answer the following questions.

The link to the CSV file can be found here: http://www.cis.umassd.edu/~dkoop/dsc201-2018fa/assignment3.html

Each data item is a specific point along a hurricane's trajectory, and unlike in the original data, every data item contains the hurricane's identifier and name. The beginning of this file looks like:

identifier,name,num_pts,record_id,status,latitude,longitude,max_wind,min_pressure,datetime
AL011851,UNNAMED,14,,HU,28.0,-94.8,80,-999,1851-06-25T00:00:00
AL011851,UNNAMED,14,,HU,28.0,-95.4,80,-999,1851-06-25T06:00:00
AL011851,UNNAMED,14,,HU,28.0,-96.0,80,-999,1851-06-25T12:00:00

and the fields are:

identifier is a unique identifier for each hurricane

name is the hurricane's name or UNNAMED

num_pts is the number of points recorded for the hurricane

record_id is the record identifier as defined by the documentation

status is the status of the system as defined by the documentation

latitude is the latitude of the recorded point

longitude is the longitude of the recorded point

max_wind is the maximum sustained wind (in knots)

min_pressure is the minimum pressure (in millibars), -999 if this was not measured

datetime is the date and time of the recorded point (in Coordinated Universal Time (UTC))

1. Hurricane Names (15 pts)

We again wish to compute the number of unique hurricane names and the most frequently used name. First, we must load the data. Pandas has a read_csv function that will load a dataset into a DataFrame object. Recall that each hurricane's name and identifier are repeated for every point at which it was tracked in this dataset. For this analysis, we do not want these repeats. Pandas allows us to remove them by (a) projecting the hurricanes to just the identifiers and names and (b) removing duplicates. For (a), you can select a single column using brackets, and multiple columns using a list inside the brackets. For example,

 new_df = df[['col1', 'col2']]

creates a new dataframe new_df with only columns col1 and col2. For (b), you can use the drop_duplicates method.
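The load, project, and de-duplicate steps might be sketched as follows. This is only an illustration under stated assumptions: the data is read from an inline string rather than the real CSV file, the first three rows are the sample shown earlier, and the ALICE rows are made-up toy data. The variable name storms is also just a placeholder.

```python
import io

import pandas as pd

# The first three data rows are the sample from the question; the two ALICE
# rows are made-up toy data added so the de-duplication has something to do.
SAMPLE_CSV = """identifier,name,num_pts,record_id,status,latitude,longitude,max_wind,min_pressure,datetime
AL011851,UNNAMED,14,,HU,28.0,-94.8,80,-999,1851-06-25T00:00:00
AL011851,UNNAMED,14,,HU,28.0,-95.4,80,-999,1851-06-25T06:00:00
AL011851,UNNAMED,14,,HU,28.0,-96.0,80,-999,1851-06-25T12:00:00
AL011953,ALICE,2,,TS,15.0,-80.0,40,-999,1953-05-25T00:00:00
AL011953,ALICE,2,,TS,15.5,-80.5,45,-999,1953-05-25T06:00:00
"""

# With the real data you would pass the CSV's path or URL instead of a StringIO.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# (a) project to the two columns of interest, then (b) drop repeated rows,
# leaving one row per (identifier, name) pair.
storms = df[["identifier", "name"]].drop_duplicates()
print(storms)
```

Keeping the identifier in the projection matters: two different hurricanes that share a name remain separate rows, which is what makes counting name reuse possible later.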

a. Number of Unique Hurricane Names

Using your projected, de-duped data frame, compute the number of unique hurricane names. Remember to remove UNNAMED!

b. Most frequently used name

Using the same data frame, compute the most frequently used name.

Hints

You can remove UNNAMED using a boolean index

The value_counts method is useful for counting the occurrences of values.
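One way the two hints could fit together is sketched below. The storms frame here is made-up toy data standing in for the projected, de-duplicated frame, and the variable names are placeholders rather than anything the assignment prescribes.

```python
import pandas as pd

# Made-up, already de-duplicated (identifier, name) pairs for illustration.
storms = pd.DataFrame({
    "identifier": ["AL011851", "AL011953", "AL031954", "AL051973", "AL021955"],
    "name": ["UNNAMED", "ALICE", "ALICE", "ALICE", "BRENDA"],
})

# Boolean index: keep only rows whose name is not the UNNAMED placeholder.
named = storms[storms["name"] != "UNNAMED"]

# Part a: number of distinct names.
num_unique_names = named["name"].nunique()

# Part b: value_counts tallies how many storms used each name.
name_counts = named["name"].value_counts()
most_common_name = name_counts.idxmax()   # the name with the highest count
most_common_count = name_counts.max()     # how many storms used that name

print(num_unique_names, most_common_name, most_common_count)
```

Note that idxmax reports only one name; if several names are tied for the highest count, comparing name_counts against name_counts.max() would list all of them.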

2. Year with the Most Hurricanes (10 pts)

Now, we need the year with the most hurricanes. Here, we need some way to extract the year from the rest of the data. There are (at least) two ways to do this: (a) extract it from the identifier, and (b) extract it from the datetime. For (a), we use pandas string methods on an entire column at once. For example, df.col1.str[:2] extracts the first two characters of col1. For (b), we need to ensure that datetime is understood as a pandas datetime type. This can be accomplished by converting a column using the pd.to_datetime function and then using the .dt accessors. For example, pd.to_datetime(df.col1).dt.month converts a column to datetime and then extracts the month.
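Both extraction routes might look like the sketch below. It assumes that the last four characters of the identifier are the year, which matches the sample rows (AL011851 was recorded in 1851) but should be checked against the HURDAT2 documentation. The 1953 row is made-up toy data, and the column names year_from_id and year_from_dt are placeholders.

```python
import pandas as pd

# Two rows from the question's sample plus one made-up row (the 1953 one).
df = pd.DataFrame({
    "identifier": ["AL011851", "AL011851", "AL011953"],
    "datetime": [
        "1851-06-25T00:00:00",
        "1851-06-25T06:00:00",
        "1953-05-25T00:00:00",
    ],
})

# (a) Slice the identifier string column; ASSUMES its last four characters
# are the year, then converts the resulting strings to integers.
df["year_from_id"] = df["identifier"].str[-4:].astype(int)

# (b) Parse the datetime column and read the year through the .dt accessor.
df["year_from_dt"] = pd.to_datetime(df["datetime"]).dt.year

print(df)
```

The two columns agree on this toy data. On the full dataset they could disagree for any storm whose track crosses a year boundary, so it is worth choosing one route deliberately.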

In both cases, we need to create a new column to store the year. Once you have done this, drop the duplicates to ensure we don't double-count hurricanes, and then count the number of occurrences per year.

Hints

In addition to value_counts, you can use the max method to take just the highest value.
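Putting the year column, the de-duplication, and the counting together could look like this sketch. The identifiers are made-up toy data, the year is taken from the identifier under the assumption that its last four characters are the year, and the variable names are placeholders.

```python
import pandas as pd

# Made-up tracked points: two storms in 1851 (one with two points) and
# one storm in 1852 with two points.
points = pd.DataFrame({
    "identifier": ["AL011851", "AL011851", "AL021851", "AL011852", "AL011852"],
})

# New column holding the year (ASSUMES the identifier ends with the year).
points["year"] = points["identifier"].str[-4:].astype(int)

# One row per storm, so multi-point tracks are not double-counted.
storms = points[["identifier", "year"]].drop_duplicates()

# Count storms per year, then pick out the largest count and its year.
per_year = storms["year"].value_counts()
busiest_year = per_year.idxmax()
busiest_count = per_year.max()

print(busiest_year, busiest_count)
```

Without the drop_duplicates step, 1851 would be counted three times and 1852 twice, which is the double-counting the instructions warn about.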

JUST QUESTIONS #1 and #2
