Data Set
This is a public data set from the US Department of Transportation that describes the on-time performance of US domestic air travel from 1987 to 2008. The data set contains more than 120 million rows and takes up about 12 GB uncompressed. The data is available here:
Flight Data CSV Files (~2.5 GB): https://storage.googleapis.com/onx-bi-take-home/FlightDataCSV.zip
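The questions cannot be answered until the CSVs are loaded into a database. Below is a minimal sketch that uses Python's standard-library sqlite3 module. It assumes the following about the files in the zip: each has a header row, all of them share the same column order, and missing values are written as "NA" or left empty. The table name "flights", the batch size, and the load_csv helper are hypothetical. Rows are streamed in batches, so the ~12 GB of uncompressed data never has to fit in memory.

```python
import csv
import sqlite3
from itertools import islice

BATCH_SIZE = 50_000  # rows per executemany() call; an arbitrary, tunable choice


def _clean(value):
    # Assumption: missing values are encoded as "NA" or as empty fields.
    return None if value in ("NA", "") else value


def load_csv(conn, path, table="flights"):
    """Stream one CSV file into `table`, creating it from the header if needed.

    Columns get NUMERIC affinity, so SQLite stores numeric-looking text such as
    "2004" or "-5" as numbers, while codes such as "WN" stay text.
    Returns the number of rows inserted.
    """
    # latin-1 decodes any byte sequence, so stray non-UTF-8 bytes cannot abort the load.
    with open(path, newline="", encoding="latin-1") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{name}" NUMERIC' for name in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
        total = 0
        while True:
            # Pull the next batch of rows from the reader without reading the whole file.
            batch = [[_clean(v) for v in row] for row in islice(reader, BATCH_SIZE)]
            if not batch:
                break
            conn.executemany(insert_sql, batch)
            total += len(batch)
    conn.commit()
    return total
```

For example, after unzipping, calling load_csv(conn, path) for each path in sorted(glob.glob("FlightData/*.csv")) would load every file into one table (the directory name is hypothetical). Creating indexes on columns such as Year, Origin, and Dest after the load, rather than before it, keeps the inserts fast.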
Additional Data
Additionally, two files contain metadata for carriers and airports. You will need to join this data to the main data set to answer some of the questions.
Airport Data CSV (<1 MB): https://storage.googleapis.com/onx-bi-take-home/airports.csv
Carrier Data CSV (<1 MB): https://storage.googleapis.com/onx-bi-take-home/carriers.csv
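The airport and carrier files are small lookup tables. One maintainable approach is to write the joins once, in a view, and then query that view for every question that needs city or airline names. Below is a minimal sketch with Python's sqlite3. It assumes three tables: "flights" with UniqueCarrier, Origin, and Dest columns, "airports" with iata and city columns, and "carriers" with Code and Description columns. These column names, the view name flights_enriched, and the helper function are all assumptions, so check them against the actual file headers.

```python
import sqlite3

# Hypothetical schema: flights(UniqueCarrier, Origin, Dest, ...),
# airports(iata, city, ...), carriers(Code, Description).
# LEFT JOINs keep flights whose codes are missing from a lookup file.
# The lookup keys are assumed unique; otherwise flight rows would be duplicated.
ENRICHED_VIEW_SQL = """
CREATE VIEW flights_enriched AS
SELECT
    f.*,
    c.Description AS carrier_name,
    o.city        AS origin_city,
    d.city        AS dest_city
FROM flights AS f
LEFT JOIN carriers AS c ON c.Code = f.UniqueCarrier
LEFT JOIN airports AS o ON o.iata = f.Origin
LEFT JOIN airports AS d ON d.iata = f.Dest
"""


def create_enriched_view(conn):
    """(Re)create the flights_enriched view on an open sqlite3 connection."""
    conn.execute("DROP VIEW IF EXISTS flights_enriched")
    conn.execute(ENRICHED_VIEW_SQL)
```

The airports table is joined twice under different aliases because each flight has two airports, an origin and a destination.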
SQL Client (Mac/Windows)
Many SQL clients are available for both Mac and Windows, and most are capable enough for this exercise, so use whichever you are comfortable with. DBeaver is a full-featured SQL client, and its Community edition is free on both Mac and Windows:
https://dbeaver.io/download
Analysis Questions
Please answer all of the following questions. Include an explanation of your process or any SQL code used to query the data. Answers will be judged on being correct and concise, as well as on using understandable and maintainable code.
1. What percentage of flights were canceled each year from 1999 to 2003?
2. On which day of the week in 2007 were you most likely to arrive on time flying from MCO to IAH?
3. Which 10 flights (airline, flight number, origin city, destination city, and date) had the latest actual vs. scheduled arrival in 2004?
4. For each year from 1987 to 2008, which airline made the trip between ORD and LAX the fastest (on average)?
5. For the years 2002 to 2005, what is the ratio of carrier delay to elapsed travel time for each airline?
6. Which airline spent the most and least average time taxiing (in and out) at JFK in 2006?
7. What were the top 10 routes (origin and destination city names and airport codes) most likely to have a weather delay of over 10 minutes in December 2005?
a. Only consider routes with at least 20 flights that month.
8. For Southwest, what was the year-over-year change in the on-time travel rate from 2000 to 2007?
9. What was the month-to-date on-time arrival rate for United for each date in September 2005?
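Question 1 reduces to a grouped average. Below is a minimal sketch, assuming the flights table has an integer Year column and a Cancelled flag stored as 0 or 1. The column names and the helper function are assumptions.

```python
import sqlite3

# Assumes flights(Year, Cancelled, ...) with Cancelled stored as 0/1.
CANCELLATION_SQL = """
SELECT Year,
       100.0 * SUM(Cancelled) / COUNT(*) AS pct_cancelled
FROM flights
WHERE Year BETWEEN :start_year AND :end_year
GROUP BY Year
ORDER BY Year
"""


def cancellation_pct_by_year(conn, start_year=1999, end_year=2003):
    """Return [(year, percent of scheduled flights cancelled), ...]."""
    params = {"start_year": start_year, "end_year": end_year}
    return conn.execute(CANCELLATION_SQL, params).fetchall()
```

Multiplying by 100.0 first forces floating-point division. With two integer operands, SQLite would truncate the quotient to an integer.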
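In question 7, the "at least 20 flights" condition filters groups rather than rows, so it belongs in a HAVING clause. Below is a minimal sketch under these assumptions: flights has Year, Month, Origin, Dest, and WeatherDelay columns; airports has iata and city columns; a route is directional (MCO to IAH is a different route from IAH to MCO); and every scheduled flight counts toward the route's total. The column names, the tie-breaking order, and the helper function are hypothetical.

```python
import sqlite3

# Assumes flights(Year, Month, Origin, Dest, WeatherDelay, ...) and airports(iata, city).
# A NULL WeatherDelay (e.g. a cancelled flight) fails the > test in the CASE,
# so it counts as "no weather delay over the threshold".
WEATHER_ROUTES_SQL = """
SELECT f.Origin,
       o.city AS origin_city,
       f.Dest,
       d.city AS dest_city,
       COUNT(*) AS n_flights,
       AVG(CASE WHEN f.WeatherDelay > :min_delay THEN 1.0 ELSE 0.0 END) AS p_weather_delay
FROM flights AS f
LEFT JOIN airports AS o ON o.iata = f.Origin
LEFT JOIN airports AS d ON d.iata = f.Dest
WHERE f.Year = :year AND f.Month = :month
GROUP BY f.Origin, o.city, f.Dest, d.city
HAVING COUNT(*) >= :min_flights
ORDER BY p_weather_delay DESC, n_flights DESC, f.Origin, f.Dest
LIMIT :limit
"""


def top_weather_delay_routes(conn, year=2005, month=12, min_delay=10,
                             min_flights=20, limit=10):
    """Return the routes with the highest share of flights delayed by weather."""
    params = {"year": year, "month": month, "min_delay": min_delay,
              "min_flights": min_flights, "limit": limit}
    return conn.execute(WEATHER_ROUTES_SQL, params).fetchall()
```

Averaging a 1.0/0.0 indicator yields the fraction of a route's flights with a weather delay over 10 minutes, which is one reasonable reading of "most likely".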
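Question 9 asks for a running ratio, which maps naturally onto window functions: first aggregate per day, then take cumulative sums ordered by day. Below is a minimal sketch. It assumes flights has UniqueCarrier, Year, Month, DayofMonth, ArrDelay, Cancelled, and Diverted columns, and that United's carrier code is "UA". It also assumes "on time" means arriving less than 15 minutes late, following the DOT convention. The threshold is a parameter because the brief does not define it. Cancelled and diverted flights are excluded. The helper name is hypothetical, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumes flights(UniqueCarrier, Year, Month, DayofMonth, ArrDelay, Cancelled, Diverted).
# "On time" = ArrDelay < :threshold minutes (assumption; 15 by default).
MTD_ON_TIME_SQL = """
WITH daily AS (
    SELECT DayofMonth AS day,
           SUM(CASE WHEN ArrDelay < :threshold THEN 1 ELSE 0 END) AS on_time,
           COUNT(*) AS arrivals
    FROM flights
    WHERE UniqueCarrier = :carrier
      AND Year = :year AND Month = :month
      AND Cancelled = 0 AND Diverted = 0
    GROUP BY DayofMonth
)
SELECT day,
       1.0 * SUM(on_time) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           / SUM(arrivals) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS mtd_on_time_rate
FROM daily
ORDER BY day
"""


def month_to_date_on_time(conn, carrier="UA", year=2005, month=9, threshold=15):
    """Return [(day_of_month, month-to-date on-time arrival rate), ...]."""
    params = {"carrier": carrier, "year": year, "month": month, "threshold": threshold}
    return conn.execute(MTD_ON_TIME_SQL, params).fetchall()
```

The rate divides cumulative on-time arrivals by cumulative arrivals. Averaging the daily rates instead would give a quiet day the same weight as a busy one.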