Question:

Predicting Delayed Flights (Boosting). The file FlightDelays.csv contains information on all commercial flights departing the Washington, DC area and arriving at New York during January 2004. For each flight, there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The attribute that we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.

Data Preprocessing. Transform the day-of-week attribute (DAY_WEEK) into a categorical attribute. Bin the scheduled departure time into eight bins. Use these and all other attributes as predictors (excluding DAY_OF_MONTH, FL_DATE, FL_NUM, and TAIL_NUM). Transform the target attribute to a binominal type, and set its appropriate role. Using the Remap Binominals operator on the target attribute, specify the positive value = delayed and the negative value = ontime. Partition the data into training (60%) and validation (40%) sets.
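For readers working outside RapidMiner, the preprocessing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the book's procedure: the exercise does not name the scheduled-departure column or the bin boundaries, so the column name `CRS_DEP_TIME`, the hhmm time format, the eight equal-width three-hour bins, and the helper names are all hypothetical. The sample rows are made up and are not taken from FlightDelays.csv.

```python
import random

# Attributes the exercise excludes during preprocessing.
EXCLUDED = {"DAY_OF_MONTH", "FL_DATE", "FL_NUM", "TAIL_NUM"}

def bin_departure_time(hhmm, n_bins=8):
    """Map an hhmm time (e.g. 1455) to one of n_bins equal-width bins.

    Assumption: equal-width bins over a 24-hour day (3 hours each for n_bins=8).
    """
    hour = (hhmm // 100) % 24  # 1455 -> 14; a value of 2400 wraps to 0
    return hour * n_bins // 24

def preprocess(row):
    """Return a new record with excluded attributes dropped, DAY_WEEK made
    categorical, and the scheduled departure time replaced by its bin."""
    out = {k: v for k, v in row.items() if k not in EXCLUDED}
    out["DAY_WEEK"] = f"day_{row['DAY_WEEK']}"  # treat as a label, not a number
    # CRS_DEP_TIME is an assumed column name for the scheduled departure time.
    out["DEP_TIME_BIN"] = bin_departure_time(out.pop("CRS_DEP_TIME"))
    return out

def partition(rows, train_frac=0.6, seed=1):
    """Shuffle with a fixed seed and split into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Made-up rows for illustration only.
sample = [
    {"DAY_WEEK": d % 7 + 1, "DAY_OF_MONTH": d + 1, "FL_DATE": "n/a",
     "FL_NUM": 100 + d, "TAIL_NUM": "n/a", "CRS_DEP_TIME": 600 + 100 * d}
    for d in range(10)
]
records = [preprocess(r) for r in sample]
train, valid = partition(records)
print(len(train), len(valid))
```

Fixing the seed keeps the 60/40 partition reproducible, so a single tree and a boosted tree can be compared on the same validation records.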
a. Fit a classification tree for flight delay using all the relevant predictors—use the Decision Tree operator. Do not include DEP_TIME (actual departure time) in the model because it is unknown at the time of prediction (unless we are generating our predictions of delays after the plane takes off, which is unlikely). Explain why we excluded DAY_OF_MONTH, FL_DATE, FL_NUM, and TAIL_NUM as predictors in data preprocessing. Use the gain ratio criterion in fitting the tree with maximal depth = 6. Apply pruning and prepruning with default parameters. Report the confusion matrix, overall accuracy, precision, and recall.
b. Run a boosted classification tree for flight delay using the same predictors as above. Use the Bayesian Boosting operator with the Decision Tree operator (same parameters as above) as the base estimator. Report the confusion matrix, overall accuracy, precision, and recall. Compared with the single tree, how does the boosted tree behave in terms of overall accuracy?
c. Compared with the single tree, how does the boosted tree behave in terms of accuracy in identifying delayed flights?
d. Explain why this model might have the best performance over the other models you fit.
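
Parts a through c ask for the confusion matrix, overall accuracy, precision, and recall, with "delayed" as the positive class. The following is a minimal sketch of how those measures relate, assuming predictions are available as lists of "delayed"/"ontime" labels. The function names are hypothetical, and the toy labels are invented for illustration; they are not results from FlightDelays.csv or from any fitted model.

```python
def confusion_counts(actual, predicted, positive="delayed"):
    """Count true/false positives and negatives for the given positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def summarize(actual, predicted, positive="delayed"):
    """Return accuracy, precision, and recall; a ratio with a zero denominator is reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Invented validation labels: 8 on-time flights and 2 delayed flights.
actual = ["ontime"] * 8 + ["delayed"] * 2

# Hypothetical model A predicts "ontime" for every flight.
pred_a = ["ontime"] * 10

# Hypothetical model B raises one false alarm and catches one of the two delays.
pred_b = ["delayed"] + ["ontime"] * 7 + ["delayed", "ontime"]

print(summarize(actual, pred_a))
print(summarize(actual, pred_b))
```

In this toy case both sets of predictions reach an overall accuracy of 0.8, but recall on delayed flights is 0.0 for the first and 0.5 for the second. That gap is why part c asks about accuracy in identifying delayed flights separately from the overall accuracy in part b.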

Related Book:

Machine Learning For Business Analytics

ISBN: 9781119828792

1st Edition

Authors: Galit Shmueli, Peter C. Bruce, Amit V. Deokar, Nitin R. Patel
