SIT220/731 2023.T3: Task 4P Working with pandas Data Frames (Heterogeneous Data)
1 Introduction
This task is related to Module 4 (see the Learning Resources on the unit site; see also Chapters 10, 11, 12, and 16 of Minimalist Data Wrangling with Python).
This task is due in Week 11 (Friday). Ideally, however, you should complete it by the end of Week 8, so start tackling it as early as possible. If we find your first solution incomplete or otherwise incorrect, you will still be able to amend it based on the generous feedback we will give you (allow 3–5 working days). In case of any problems or questions, do not hesitate to attend our on-campus/online classes or use the Discussion Board on the unit site.
Submitting after the aforementioned due date might incur a late penalty. The cut-off date is Week 12 (Friday). There will be no extensions (this is a Week 8 task, after all), and no solutions will be accepted thereafter. At that time, if your submission is not 100% complete, it will be marked as FAIL, without the possibility of correcting and resubmitting. This task is part of the hurdle requirements in this unit: not submitting the correct version on time results in failing the unit.
A good data engineer must have good time-management skills. To ensure a fair environment for all, we are always very strict about deadlines. Moreover, all submissions will be checked for plagiarism (my PhD student developed a fairly advanced code-similarity detection tool, which I will be using from time to time: beware). You are expected to work independently on your task solutions. Never share or show any part of your solutions to anyone. Luckily, 95% of students know how to do the right thing. If you are one of them, you are the best and need not worry; thank you.
2 Task
Download the nycflights13_weather.csv.gz data file from our unit site (Learning Resources -> Data). It gives the hourly meteorological data for three airports in New York: LGA, JFK, and EWR for the whole year of 2013. The columns are:
origin: weather station (LGA, JFK, or EWR),
year, month, day, hour: time of recording,
temp, dewp: temperature and dew point, in degrees Fahrenheit,
humid: relative humidity,
wind_dir, wind_speed, wind_gust: wind direction (in degrees), speed and gust speed (in mph),
precip: precipitation, in inches,
pressure: sea-level pressure, in millibars,
visib: visibility, in miles,
time_hour: date and hour (based on the year, month, day, hour fields) formatted as YYYY-mm-dd HH:MM:SS (actually, YYYY-mm-dd HH:00:00). However, due to a bug in the dataset, the data in this column are (incorrectly!) shifted by 1 hour. Do not rely on it unless you manually correct it.
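Because time_hour is shifted, one way to obtain a trustworthy timestamp is to rebuild it from the year, month, day, and hour columns, which pandas.to_datetime can assemble directly from a data frame. This is a minimal sketch assuming pandas; the helper add_recording_time and the new column recorded_at are hypothetical names, and the two sample rows are made up purely to show the call.

```python
import pandas as pd

def add_recording_time(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive the recording time from year/month/day/hour.

    The time_hour column is deliberately ignored, because it is shifted by 1 hour.
    """
    # pd.to_datetime accepts a DataFrame whose columns are named year, month, day (and optionally hour)
    df["recorded_at"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
    return df

# In the notebook, load the real file once, at the top, e.g.:
# weather = pd.read_csv("nycflights13_weather.csv.gz")  # gzip compression is inferred from the extension
sample = pd.DataFrame({  # made-up rows, for illustration only
    "origin": ["LGA", "JFK"],
    "year": [2013, 2013], "month": [1, 12], "day": [1, 31], "hour": [0, 23],
})
add_recording_time(sample)
print(sample[["origin", "recorded_at"]])
```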
Then, create a single Jupyter/IPython notebook (see the Artefacts section below for all the requirements) in which you perform the following tasks.
1. Convert all columns so that they use metric (International System of Units, SI) or derived units: temp and dewp to Celsius, precip to millimetres, visib to metres, as well as wind_speed and wind_gust to metres per second. Replace the data in-place (overwrite existing columns with new ones).
2. Compute daily mean wind speeds for the LGA airport (~365 total speed values, for each day separately; you can, for example, group the data by year, month, and day at the same time).
3. Present the daily mean wind speeds at LGA (~365 aforementioned data points) in a single plot, e.g., using the matplotlib.pyplot.plot function. The x-axis labels should be human-readable and intuitive (e.g., month names or dates). Reference result:
[Figure: reference line plot, "daily average wind speed [m/s] at LGA"; x-axis: day (2013-01 to 2014-01); y-axis ticks from 2 to 10]
4. Identify the ten windiest days at LGA (dates and the corresponding mean daily wind speeds).
Reference result:
##             wind_speed
## date
## 2013-11-24       11.32
## 2013-01-31       10.72
## 2013-02-17       10.01
## 2013-02-21        9.19
## 2013-02-18        9.17
## 2013-03-14        9.11
## 2013-11-28        8.94
## 2013-05-26        8.85
## 2013-05-25        8.77
## 2013-02-20        8.66
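Steps 1 to 4 above can be sketched roughly as follows, assuming pandas and matplotlib. The conversion factors are the exact definitions (1 in = 25.4 mm, 1 mile = 1609.344 m, 1 mph = 0.44704 m/s, and C = (F − 32) · 5/9); the daily means follow the task's own hint of grouping by year, month, and day. The helper names (to_si_units, daily_mean_wind, plot_daily_wind) and the demo rows are hypothetical, and in the actual notebook the imports and the read_csv call belong in the first cell, as required below.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

INCH_TO_MM = 25.4       # exact
MILE_TO_M = 1609.344    # exact (international mile)
MPH_TO_MS = 0.44704     # exact

def to_si_units(df: pd.DataFrame) -> None:
    """Step 1: overwrite the imperial columns in place with SI (or derived) units."""
    for col in ["temp", "dewp"]:
        df[col] = (df[col] - 32) * 5 / 9        # degrees Fahrenheit -> degrees Celsius
    df["precip"] = df["precip"] * INCH_TO_MM     # inches -> millimetres
    df["visib"] = df["visib"] * MILE_TO_M        # miles -> metres
    for col in ["wind_speed", "wind_gust"]:
        df[col] = df[col] * MPH_TO_MS            # mph -> metres per second

def daily_mean_wind(df: pd.DataFrame, origin: str = "LGA") -> pd.Series:
    """Step 2: mean wind speed for each calendar day at one airport."""
    station = df.loc[df["origin"] == origin]
    means = station.groupby(["year", "month", "day"])["wind_speed"].mean()
    # Replace the (year, month, day) MultiIndex with real dates for plotting and printing
    dates = pd.to_datetime(means.index.to_frame(index=False))
    means.index = pd.DatetimeIndex(dates, name="date")
    return means

def plot_daily_wind(means: pd.Series, origin: str = "LGA"):
    """Step 3: a single line plot; matplotlib formats datetime x-values as readable dates."""
    fig, ax = plt.subplots()
    ax.plot(means.index, means.to_numpy())
    ax.set_xlabel("day")
    ax.set_title(f"daily average wind speed [m/s] at {origin}")
    return ax

# Made-up rows, only to show the calls; use the real data frame in the notebook.
demo = pd.DataFrame({
    "origin": ["LGA", "LGA", "LGA", "JFK"],
    "year": [2013] * 4, "month": [1, 1, 1, 1], "day": [1, 1, 2, 1], "hour": [0, 1, 0, 0],
    "temp": [32.0, 50.0, 212.0, 41.0], "dewp": [14.0, 23.0, 32.0, 30.0],
    "precip": [0.0, 0.1, 0.0, 0.0], "visib": [10.0, 10.0, 5.0, 10.0],
    "wind_speed": [10.0, 20.0, 5.0, 30.0], "wind_gust": [np.nan, 25.0, np.nan, 40.0],
})
to_si_units(demo)
daily = daily_mean_wind(demo)
plot_daily_wind(daily)
print(daily.nlargest(10).round(2))  # Step 4: the windiest days, largest first
```

Series.nlargest keeps the dates as the index, so the printed result has the same date/wind_speed layout as the reference output.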
Important. All packages must be imported and data must be loaded at the beginning of the file (only once!).
3 Additional Tasks for Postgraduate (SIT731) Students (*)
Postgraduate students, apart from the above tasks, are additionally required to solve/address/discuss what follows. Integrate these new requirements into the above subtasks (do not create a separate section of the report).
1. Compute the monthly mean wind speeds for all three airports.
There is one obvious outlier amongst the observed wind speeds. Locate it (programmatically, do not hardcode the date/day/row number) and replace it with np.nan (NaN) before computing the means.
2. Draw the monthly mean wind speeds for the three airports on the same plot (three curves of different colours). Add a readable legend. Reference result:
[Figure (truncated in source): three curves with legend LGA, EWR, JFK; x-axis: month (2013-01 to 2013-11); y-axis ticks from 3.5 to 6.0]
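The postgraduate extension can be sketched as below, assuming pandas and matplotlib. Two assumptions are worth stating loudly: the "obvious outlier" is taken to be the single largest wind_speed value (check this on the real data, for example by inspecting the few largest values, before trusting it), and monthly means are computed per year, month, and airport, then unstacked into one column per airport. The helper names and the demo rows are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def mask_wind_outlier(df: pd.DataFrame):
    """Hypothetical helper: set the single largest wind_speed to NaN and return its row label.

    ASSUMPTION: the "obvious outlier" is the overall maximum; inspect, e.g.,
    df["wind_speed"].nlargest(5) on the real data before relying on this.
    """
    row = df["wind_speed"].idxmax()  # label of the maximum (NaNs are skipped)
    df.loc[row, "wind_speed"] = np.nan
    return row

def monthly_mean_wind(df: pd.DataFrame) -> pd.DataFrame:
    """Mean wind speed per (year, month): one row per month, one column per airport."""
    means = df.groupby(["year", "month", "origin"])["wind_speed"].mean().unstack("origin")
    # Use the first day of each month as the x-value so the axis shows dates
    dates = pd.to_datetime(means.index.to_frame(index=False).assign(day=1))
    means.index = pd.DatetimeIndex(dates, name="month")
    return means

def plot_monthly_wind(means: pd.DataFrame):
    """One curve per airport; matplotlib's default colour cycle gives each a distinct colour."""
    fig, ax = plt.subplots()
    for origin in means.columns:
        ax.plot(means.index, means[origin].to_numpy(), label=origin)
    ax.set_xlabel("month")
    ax.set_ylabel("mean wind speed [m/s]")
    ax.legend()
    return ax

# Made-up rows (already in m/s); the last value plays the role of the outlier.
demo = pd.DataFrame({
    "origin": ["LGA", "JFK", "EWR", "LGA", "JFK", "EWR", "EWR"],
    "year": [2013] * 7,
    "month": [1, 1, 1, 2, 2, 2, 2],
    "wind_speed": [4.0, 5.0, 3.0, 6.0, 5.5, 4.5, 460.0],
})
outlier_row = mask_wind_outlier(demo)  # must run before any means are computed
monthly = monthly_mean_wind(demo)
plot_monthly_wind(monthly)
print(monthly)
```

Because mean() skips NaN, masking the outlier simply drops it from its month's average; the same masked frame can then feed the daily LGA means as well, which keeps the postgraduate requirement integrated into the earlier subtasks.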
