Predicting Delayed Flights

Question:

Predicting Delayed Flights. The file FlightDelays.csv contains information on all commercial flights departing the Washington, DC area and arriving at New York during January 2004. For each flight, there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The attribute that we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.

Data Preprocessing. Transform the day-of-week attribute (DAY_WEEK) into a categorical attribute. Bin the scheduled departure time into eight bins. Transform the airline carrier, destination city, and origin city to numerical attributes with dummy coding. Use these and all other attributes as predictors (excluding DAY_OF_MONTH, FL_DATE, FL_NUM, and TAIL_NUM). Transform the target attribute to a binominal type and set its appropriate role. By using the Remap Binominals operator for the target attribute, specify the positive value = delayed and negative value = ontime. Partition the data into training (60%) and holdout (40%) sets.
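For readers who want to see the preprocessing steps outside RapidMiner, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not the exercise's prescribed method: the rows are invented toy data rather than the contents of FlightDelays.csv, the column names CRS_DEP_TIME, CARRIER, and STATUS are assumed, and the exercise does not say how the eight bins should be formed, so equal-width 3-hour bins are used here. Only the carrier is dummy coded; destination and origin would be handled the same way.

```python
import random

# Invented toy rows standing in for FlightDelays.csv (not real data).
# CRS_DEP_TIME, CARRIER and STATUS are assumed column names.
rows = [
    {"DAY_WEEK": d, "CRS_DEP_TIME": t, "CARRIER": c, "STATUS": s}
    for d, t, c, s in [
        (1, 700, "DL", "ontime"), (1, 1455, "US", "delayed"),
        (2, 630, "DL", "ontime"), (3, 1900, "MQ", "delayed"),
        (4, 1230, "US", "ontime"), (5, 2120, "MQ", "ontime"),
        (5, 845, "DL", "ontime"), (6, 1640, "US", "delayed"),
        (7, 1000, "MQ", "ontime"), (7, 1715, "DL", "ontime"),
    ]
]


def bin_departure(hhmm, n_bins=8):
    """Equal-width bin index (0..n_bins-1) for a time given as HHMM."""
    hour = hhmm // 100
    return min(hour * n_bins // 24, n_bins - 1)


def dummy_code(records, column):
    """Add a 0/1 indicator per distinct value of `column`; return the levels."""
    levels = sorted({r[column] for r in records})
    for r in records:
        for level in levels:
            r[f"{column}_{level}"] = int(r[column] == level)
    return levels


for r in rows:
    r["DAY_WEEK"] = f"day_{r['DAY_WEEK']}"           # numeric code -> category label
    r["DEP_BIN"] = bin_departure(r["CRS_DEP_TIME"])  # eight 3-hour bins (assumed scheme)
    r["DELAYED"] = r["STATUS"] == "delayed"          # positive class = delayed

carrier_levels = dummy_code(rows, "CARRIER")

# 60/40 partition with a fixed seed so the split is reproducible.
shuffled = rows[:]
random.Random(1).shuffle(shuffled)
cut = len(shuffled) * 6 // 10
train, holdout = shuffled[:cut], shuffled[cut:]
```

A simple random split like this ignores class balance; a stratified split may be preferable when delayed flights are rare.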

a. Fit a classification tree to the flight delay target attribute using all the relevant predictors. Do not include DEP_TIME (actual departure time) in the model because it is unknown at the time of prediction (unless we are generating our predictions of delays after the plane takes off, which is unlikely). Explain why we excluded DAY_OF_MONTH, FL_DATE, FL_NUM, and TAIL_NUM as predictors in data preprocessing. Use the gain ratio criterion in fitting the tree with maximal depth = 6. Apply pruning and prepruning with default parameters. Express the resulting tree as a set of rules.

b. If you needed to fly between DCA and EWR on a Monday at 7:00 AM, would you be able to use this tree? What other information would you need? Is it available in practice? What information is redundant?

c. Fit the same tree as in (a), this time excluding the Weather predictor. Display this pruned tree. Now, create another tree by increasing the minimal gain required from 0.01 to 0.03. You will find that the tree contains a single leaf node. Display this small tree.

i. How is the small tree used for classification? (What is the rule for classifying?)

ii. To what is this rule equivalent?

iii. Examine the pruned tree. What are the top three predictors according to this tree?

iv. Why, technically, does the small tree result in a single node?

v. What is the disadvantage of using the full-grown tree as opposed to an optimally pruned tree?

vi. Compare this general result from the pruned tree to that from logistic regression in the example in Chapter 10. What are possible reasons for the classification tree’s failure to find a good predictive model?
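Parts (a) and (c) rely on the gain ratio criterion and on a minimal-gain threshold for prepruning. The sketch below shows how the two interact at the root node, assuming the threshold is compared with the gain ratio of the best candidate split. The records are invented toy data (a hypothetical carrier attribute with values "A" and "B"), and `root_decision` is an illustrative helper, not a RapidMiner function.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(records, attribute, target):
    """Information gain of splitting on `attribute`, divided by the split information."""
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    split_info = entropy([r[attribute] for r in records])
    if split_info == 0:  # attribute takes a single value, so no split is possible
        return 0.0
    return (entropy([r[target] for r in records]) - weighted) / split_info


def root_decision(records, attributes, target, minimal_gain):
    """Split on the best attribute, or return a majority-class leaf if the gain is too small."""
    best = max(attributes, key=lambda a: gain_ratio(records, a, target))
    if gain_ratio(records, best, target) < minimal_gain:
        majority = Counter(r[target] for r in records).most_common(1)[0][0]
        return ("leaf", majority)
    return ("split", best)


# Invented toy data: the attribute is only weakly related to the target.
records = (
    [{"CARRIER": "A", "STATUS": "delayed"}] * 7
    + [{"CARRIER": "A", "STATUS": "ontime"}] * 13
    + [{"CARRIER": "B", "STATUS": "delayed"}] * 4
    + [{"CARRIER": "B", "STATUS": "ontime"}] * 16
)

low = root_decision(records, ["CARRIER"], "STATUS", minimal_gain=0.01)
high = root_decision(records, ["CARRIER"], "STATUS", minimal_gain=0.03)
print(low)   # ('split', 'CARRIER')
print(high)  # ('leaf', 'ontime')
```

In this toy case the best split's gain ratio falls between the two thresholds, so raising the threshold from 0.01 to 0.03 turns the root into a single leaf that predicts the majority class for every record.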
