Q2 Based on your graphical and numerical analysis in Q1, which method, 1.5(IQR) or 3(SD), is more appropriate for removing outliers from the `departure delay` variable? Remove outliers from `departure delay` with the appropriate method and store the new dataset as `no_out_dd`; you will use this dataset without outliers in Q3. What proportion of rows remains after removing these outliers? Store this number as Q2. Do not hardcode the answer.

Note: A boxplot of departure delays in the new dataset will still reveal outliers, based on the new five-number summary. For the purpose of this assignment, we will retain these "new" outliers in our dataframe. To completely remove all outliers, we would need to repeat the outlier removal process multiple times.

Your answer should be a number assigned to Q2. Do not round.
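The choice between the two rules comes down to how each one builds its fences. The sketch below applies both rules to a small, made-up right-skewed vector (hypothetical values, not the `flights` data). Because the mean and SD are themselves pulled toward extreme values, the 3(SD) fences widen until the extremes no longer look unusual, while the quartile-based 1.5(IQR) fences are not affected by the tail. If the Q1 plots and summary statistics show `dep_delay` to be strongly right-skewed, that is the usual argument for preferring 1.5(IQR).

```r
# Hypothetical right-skewed delays in minutes (not taken from `flights`).
x <- c(-5, -3, -2, -1, 0, 0, 1, 2, 4, 6, 10, 15, 250, 300)

# 1.5(IQR) rule: fences come from the quartiles, which ignore the extreme tail.
q <- quantile(x, c(0.25, 0.75))
iqr_bounds <- c(q[[1]] - 1.5 * IQR(x), q[[2]] + 1.5 * IQR(x))

# 3(SD) rule: fences come from the mean and SD, both inflated by the tail.
sd_bounds <- c(mean(x) - 3 * sd(x), mean(x) + 3 * sd(x))

# Values each rule would flag as outliers
iqr_out <- x[x < iqr_bounds[1] | x > iqr_bounds[2]]
sd_out <- x[x < sd_bounds[1] | x > sd_bounds[2]]
iqr_out  # 250 300
sd_out   # numeric(0): nothing flagged
```

Here the two largest values inflate the SD enough to hide each other under the 3(SD) rule, which is the kind of behaviour to look for when comparing the two rules' bounds on the real data.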

```r
IQR_delay <- IQR(flights$dep_delay, na.rm = TRUE)
lower_bound <- quantile(flights$dep_delay, 0.25, na.rm = TRUE) - 1.5 * IQR_delay
upper_bound <- quantile(flights$dep_delay, 0.75, na.rm = TRUE) + 1.5 * IQR_delay
sd(flights$dep_delay)
no_out_dd <- flights |> filter(dep_delay >= lower_bound & dep_delay <= upper_bound)
Q2 <- nrow(no_out_dd) %/% nrow(flights)
```

This was the code I came up with, but it's not producing the correct answer. Can you help me figure out what the issue is?
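The fence calculation matches the 1.5(IQR) rule; the proportion is what goes wrong. In R, `%/%` is integer division, so dividing a smaller row count by a larger one always truncates to 0, whereas `/` returns the fraction. The separate `sd(flights$dep_delay)` line is not the cause, but without `na.rm = TRUE` it returns `NA` whenever `dep_delay` has missing values, and its result is never used. The sketch below reproduces the pipeline on a tiny hypothetical data frame (`toy`, `no_out_toy`, and `Q2_toy` are illustrative names, not part of the assignment), using base R subsetting in place of dplyr's `filter()` so it runs without extra packages.

```r
# Toy stand-in for `flights` (hypothetical values; NA mimics a missing delay).
toy <- data.frame(dep_delay = c(-4, -2, -1, 0, 0, 2, 3, 5, 8, 12, 95, NA))

# 1.5(IQR) fences, ignoring missing delays when computing the quartiles.
q <- quantile(toy$dep_delay, c(0.25, 0.75), na.rm = TRUE)
iqr <- IQR(toy$dep_delay, na.rm = TRUE)
lower_bound <- q[[1]] - 1.5 * iqr
upper_bound <- q[[2]] + 1.5 * iqr

# which() drops rows where the comparison is NA, as dplyr::filter() does.
keep <- which(toy$dep_delay >= lower_bound & toy$dep_delay <= upper_bound)
no_out_toy <- toy[keep, , drop = FALSE]

nrow(no_out_toy) %/% nrow(toy)          # 0: integer division truncates
Q2_toy <- nrow(no_out_toy) / nrow(toy)  # ordinary division gives the proportion
Q2_toy                                  # 0.8333333
```

Applied to the assignment data, the likely fix is to change only the last line of the original code to `Q2 <- nrow(no_out_dd) / nrow(flights)`. As in the toy example, rows whose `dep_delay` is `NA` are dropped by the filter but still counted in `nrow(flights)`, so they lower the proportion.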


blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!