Question: In this assignment I will be using the dataset released by the Department of Transportation. This dataset lists flights that occurred in 2015, along with other information such as delays, flight time, etc.

In this assignment, I will demonstrate good practices for manipulating data with Python's most popular libraries to accomplish the following:

  • cleaning data with pandas
  • making specific changes with numpy
  • handling date-related values with datetime

Note: please consider the flights departing from BOS, JFK, SFO and LAX.
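
As a rough illustration of the three tasks listed above, here is a minimal sketch on made-up rows. The values are invented, and it assumes SCHEDULED_DEPARTURE is stored as an HHMM integer (for example, 945 for 09:45), which the question does not state; the `DELAYED` and `SCHEDULED_TS` column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows that mimic a few of the flights columns; the values are made up.
flights = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015],
    "MONTH": [1, 1, 2, 3],
    "DAY": [1, 15, 3, 9],
    "ORIGIN_AIRPORT": ["BOS", "SEA", "JFK", "LAX"],
    "SCHEDULED_DEPARTURE": [5, 1230, 945, 2359],  # assumed HHMM integers
    "DEPARTURE_DELAY": [-3.0, 12.0, np.nan, 40.0],
})

# pandas: keep only flights departing from the four airports of interest.
airports = ["BOS", "JFK", "SFO", "LAX"]
subset = flights[flights["ORIGIN_AIRPORT"].isin(airports)].copy()

# numpy: flag delayed flights; a missing delay is treated as "not delayed" here.
subset["DELAYED"] = np.where(subset["DEPARTURE_DELAY"] > 0, 1, 0)

# datetime: build a timestamp from the date columns plus the HHMM time.
date_parts = subset[["YEAR", "MONTH", "DAY"]].rename(columns=str.lower)
hhmm = subset["SCHEDULED_DEPARTURE"]
subset["SCHEDULED_TS"] = (
    pd.to_datetime(date_parts)
    + pd.to_timedelta(hhmm // 100, unit="h")
    + pd.to_timedelta(hhmm % 100, unit="min")
)

print(subset[["ORIGIN_AIRPORT", "DELAYED", "SCHEDULED_TS"]])
```

If the real column holds a different time format, the last step would need to change accordingly.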

Index(['YEAR', 'MONTH', 'DAY', 'ORIGIN_AIRPORT', 'DESTINATION_AIRPORT', 'AIRLINE', 'FLIGHT_NUMBER', 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY'], dtype='object')

Index(['IATA_CODE', 'AIRLINE'], dtype='object')


Question 3

For this question, find the names of the top three airlines that have a high number of flights and the lowest percentage of delays compared to other airlines. The result should be a dataframe with three columns: AIRLINE_NAME, NUM_FLIGHTS and PERC_DELAY.

Hint:

  • percentage of delay for each airline is obtained using groupby and apply methods
  • merge flights_df with airlines_df to get the names of top three airlines
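
The two hints can be sketched on toy data as follows. This is only an illustration under assumptions: the rows are invented, a flight counts as delayed when DEPARTURE_DELAY is greater than 0, and the final ranking step is left out because the question does not pin down how to weigh flight count against delay percentage.

```python
import pandas as pd

# Made-up flights and airline lookup rows, for illustration only.
flights = pd.DataFrame({
    "AIRLINE": ["UA", "UA", "UA", "UA", "AA", "AA", "DL"],
    "DEPARTURE_DELAY": [0.0, -5.0, 10.0, -2.0, 15.0, 30.0, -1.0],
})
airlines = pd.DataFrame({
    "IATA_CODE": ["UA", "AA", "DL"],
    "AIRLINE": ["United Air Lines Inc.", "American Airlines Inc.", "Delta Air Lines Inc."],
})

# Hint 1: groupby + apply gives the percentage of delayed flights per airline code.
grouped = flights.groupby("AIRLINE")["DEPARTURE_DELAY"]
stats = grouped.size().to_frame("NUM_FLIGHTS")
stats["PERC_DELAY"] = grouped.apply(lambda delays: 100.0 * (delays > 0).mean())
stats = stats.reset_index()

# Hint 2: merge with the lookup table to get names. The name column is renamed
# first because both frames have a column called AIRLINE (code vs. name).
names = airlines.rename(columns={"AIRLINE": "AIRLINE_NAME"})
merged = stats.merge(names, left_on="AIRLINE", right_on="IATA_CODE", how="left")
result = merged[["AIRLINE_NAME", "NUM_FLIGHTS", "PERC_DELAY"]]

print(result)
```

Renaming before the merge avoids the `AIRLINE_x` / `AIRLINE_y` suffixes pandas would otherwise add to the clashing columns.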

TEST ON:

top_three_airlines_df = top_three_airlines(flights_df_raw.copy(), airlines_df.copy())

assert sorted(list(top_three_airlines_df.columns)) == sorted(['NUM_FLIGHTS', 'PERC_DELAY', 'AIRLINE_NAME']), "Dataframe doesn't have required columns"
assert top_three_airlines_df.loc[0, 'AIRLINE_NAME'] == 'United Air Lines Inc.', "Top airline name doesn't match"
 

Answer from your tutor:

MateIron10283 (active 5 hours ago)

 

check explanation.


Explanation:

Based on the provided information and requirements, here's an example implementation that finds the top three airlines using pandas. It ranks airlines by number of flights (highest first) and uses the percentage of delayed flights (lowest first) to break ties, where a flight counts as delayed if its DEPARTURE_DELAY is greater than 0.

```python
import pandas as pd

def top_three_airlines(flights_df, airlines_df):
   # Filter flights departing from BOS, JFK, SFO, and LAX.
   # .copy() lets us add columns without a SettingWithCopyWarning.
   airports = ['BOS', 'JFK', 'SFO', 'LAX']
   flights_df = flights_df[flights_df['ORIGIN_AIRPORT'].isin(airports)].copy()
   
   # Calculate the number of flights per airline
   num_flights = flights_df.groupby('AIRLINE').size().reset_index(name='NUM_FLIGHTS')
   
   # Calculate the percentage of delayed flights for each airline.
   # A missing DEPARTURE_DELAY compares as False, so it counts as not delayed.
   flights_df['DELAYED'] = flights_df['DEPARTURE_DELAY'] > 0
   perc_delay = (flights_df.groupby('AIRLINE')['DELAYED'].mean() * 100).reset_index(name='PERC_DELAY')
   
   # Combine both statistics, keyed by airline code
   stats = pd.merge(num_flights, perc_delay, on='AIRLINE')
   
   # Merge with airlines_df to get airline names. Both frames have an AIRLINE
   # column (code vs. name), so rename the name column first to avoid a clash.
   names = airlines_df.rename(columns={'AIRLINE': 'AIRLINE_NAME'})
   top_airlines = pd.merge(stats, names, left_on='AIRLINE', right_on='IATA_CODE')
   
   # Sort by number of flights (descending); ties go to the lower delay percentage
   top_airlines = top_airlines.sort_values(by=['NUM_FLIGHTS', 'PERC_DELAY'], ascending=[False, True])
   
   # Select the top three airlines and reset the index so that .loc[0] is the top row
   top_three = top_airlines.head(3).reset_index(drop=True)
   
   # Select only the required columns
   return top_three[['AIRLINE_NAME', 'NUM_FLIGHTS', 'PERC_DELAY']]

# Example usage
top_three_airlines_df = top_three_airlines(flights_df_raw.copy(), airlines_df.copy())

print(top_three_airlines_df)
```

Make sure to replace `flights_df_raw` with your raw flights dataset and `airlines_df` with your airlines dataset. The function `top_three_airlines` will return a dataframe containing the top three airline names, the number of flights, and the percentage of delays for each airline.

Please note that the above code assumes you have already loaded and prepared the flights and airlines datasets as pandas dataframes.
