Using Python Pandas, analyze the HURDAT2 data and answer the following questions.

The link to the CSV file can be found here: http://www.cis.umassd.edu/~dkoop/dsc201-2018fa/assignment3.html

Each data item is a specific point along the hurricane's trajectory, and unlike in the original data, every data item contains the hurricane's identifier and name. The beginning of this file looks like:

 identifier,name,num_pts,record_id,status,latitude,longitude,max_wind,min_pressure,datetime
 AL011851,UNNAMED,14,,HU,28.0,-94.8,80,-999,1851-06-25T00:00:00
 AL011851,UNNAMED,14,,HU,28.0,-95.4,80,-999,1851-06-25T06:00:00
 AL011851,UNNAMED,14,,HU,28.0,-96.0,80,-999,1851-06-25T12:00:00

and the fields are:

identifier is a unique identifier for each hurricane

name is the hurricane's name or UNNAMED

num_pts is the number of points recorded for the hurricane

record_id is the record identifier as defined by the documentation

status is the status of the system as defined by the documentation

latitude is the latitude of the recorded point

longitude is the longitude of the recorded point

max_wind is the maximum sustained wind (in knots)

min_pressure is the minimum pressure (in millibars), -999 if this was not measured

datetime is the date and time of the recorded point, in Coordinated Universal Time (UTC)
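As a minimal sketch of how these fields map onto a DataFrame, the snippet below reads the three sample rows shown earlier from an in-memory string rather than from the real file. Treating -999 in min_pressure as missing and parsing datetime at load time are optional choices made here for illustration; the assignment does not require them.

```python
import io

import pandas as pd

# The three sample rows from the question, inlined so the sketch runs
# without downloading the real CSV file.
SAMPLE = """identifier,name,num_pts,record_id,status,latitude,longitude,max_wind,min_pressure,datetime
AL011851,UNNAMED,14,,HU,28.0,-94.8,80,-999,1851-06-25T00:00:00
AL011851,UNNAMED,14,,HU,28.0,-95.4,80,-999,1851-06-25T06:00:00
AL011851,UNNAMED,14,,HU,28.0,-96.0,80,-999,1851-06-25T12:00:00
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    # Assumption: treat the -999 sentinel as missing, for this column only.
    na_values={"min_pressure": [-999]},
    # Assumption: parse the datetime column while loading.
    parse_dates=["datetime"],
)

print(df.dtypes)
```

With the real data, the io.StringIO(SAMPLE) argument would be replaced by the path to the downloaded CSV file.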

1. Hurricane Names (15 pts)

We again wish to compute the number of unique hurricane names and the most frequently used name. First, we must load the data. Pandas has a read_csv function that will load a dataset into a DataFrame object. Recall that we have the hurricane's name and identifier repeated for each point it was tracked in this dataset. For this analysis, we do not want to have these repeats. Pandas allows us to remove them by (a) projecting the hurricanes to just the identifiers and names and (b) removing duplicates. For (a), you can select a single column using brackets, and multiple columns using a list inside the brackets. For example,

 new_df = df[['col1', 'col2']]

creates a new dataframe new_df with only columns col1 and col2. For (b), you can use the drop_duplicates method.
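A minimal sketch of steps (a) and (b), using a small made-up DataFrame in place of the real file. The rows are illustrative only and are not actual HURDAT2 records; the variable name storms is an assumption of this sketch.

```python
import pandas as pd

# Toy data in the file's layout: several track points per hurricane.
# Values are invented for illustration, not real HURDAT2 rows.
df = pd.DataFrame({
    "identifier": ["AL011851", "AL011851", "AL021950", "AL021950", "AL031955"],
    "name": ["UNNAMED", "UNNAMED", "ABLE", "ABLE", "ALICE"],
    "max_wind": [80, 80, 60, 70, 50],
})

# (a) project to the two columns of interest, (b) drop repeated rows,
# leaving one row per hurricane.
storms = df[["identifier", "name"]].drop_duplicates()

print(storms)
```

Keeping the identifier in the projection matters: de-duplicating on the name alone would collapse different hurricanes that share a name into one row.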

a. Number of Unique Hurricane Names

Using your projected, de-duped data frame, compute the number of unique hurricane names. Remember to remove UNNAMED!

b. Most frequently used name

Using the same data frame, compute the most frequently used name.

Hints

You can remove UNNAMED using a boolean index

The value_counts method is useful for counting the occurrences of values.
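The two hints can be sketched as follows, again on made-up data. The storms frame stands in for the projected, de-duplicated DataFrame from the previous step; nunique and idxmax are standard pandas methods that the assignment does not name, used here as one possible way to read off the answers.

```python
import pandas as pd

# Stand-in for the projected, de-duplicated frame (one row per hurricane).
# Values are invented for illustration.
storms = pd.DataFrame({
    "identifier": ["AL011851", "AL021950", "AL031955", "AL041960", "AL051965"],
    "name": ["UNNAMED", "ABLE", "ALICE", "ABLE", "UNNAMED"],
})

# Boolean index: keep only rows whose name is not UNNAMED.
named = storms[storms["name"] != "UNNAMED"]

# (a) number of distinct names among named hurricanes.
num_unique = named["name"].nunique()

# (b) occurrences of each name, then the name with the highest count.
counts = named["name"].value_counts()
most_common = counts.idxmax()

print(num_unique, most_common, counts.max())
```

If several names tie for the highest count, idxmax reports only one of them; comparing counts against counts.max() would list all tied names.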

2. Year with the Most Hurricanes (10 pts)

Now, we need the year with the most hurricanes. Here, we need to have some way to extract the year from the rest of the data. There are (at least) two ways to do this: (a) extract it from the identifier, and (b) extract it from the datetime. For (a), we use pandas string methods on an entire column at once. For example, df.col1.str[:2] extracts the first two characters of col1. For (b), we need to ensure that datetime is understood as a pandas datetime type. This can be accomplished by converting a column using the pd.to_datetime function and then using the .dt accessor. For example, pd.to_datetime(df.col1).dt.month converts a column to datetime and then extracts the month.

In both cases, we need to create a new column to store the year. Once you have done this, drop the duplicates to ensure we don't double-count hurricanes and count the number of occurrences per year.

Hints

In addition to value_counts, you can use the max method to take just the highest value.
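Both extraction routes can be sketched on made-up data as below. The sketch assumes the year is the last four characters of the identifier, as in AL011851 from the sample rows; the column names year and year_dt are hypothetical, and idxmax is used alongside max to recover the year itself rather than only its count.

```python
import pandas as pd

# Toy track points: two hurricanes in 1851 and one in 1852.
# Values are invented for illustration.
df = pd.DataFrame({
    "identifier": ["AL011851", "AL011851", "AL021851", "AL011852"],
    "datetime": [
        "1851-06-25T00:00:00",
        "1851-06-25T06:00:00",
        "1851-07-05T12:00:00",
        "1852-08-19T00:00:00",
    ],
})

# (a) Year from the identifier: assumed to be its last four characters.
df["year"] = df["identifier"].str[-4:].astype(int)

# (b) Year from the datetime column, after converting it to a datetime type.
df["year_dt"] = pd.to_datetime(df["datetime"]).dt.year

# One row per hurricane, then count hurricanes per year.
per_year = df[["identifier", "year"]].drop_duplicates()["year"].value_counts()

busiest_year = per_year.idxmax()  # the year with the most hurricanes
busiest_count = per_year.max()    # how many hurricanes that year had

print(busiest_year, busiest_count)
```

The two routes can disagree for a hurricane whose track points span a change of year: route (b) then yields two (identifier, year) pairs for the same hurricane, so route (a) is the safer basis for de-duplication.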

JUST QUESTIONS #1 and #2