Question:

Please solve the question with an RStudio script.

The flights data set seen in class is available in the package nycflights13 as a tibble. It contains on-time data for all flights that departed New York City via its three main airports throughout the whole year of 2013. The airport from which each flight departed is recorded in the origin column. Load the tidyverse suite of packages, as well as the nycflights13 package and the flights data therein.

a) Cancelled flights are defined as those for which no departure or arrival took place. Write code using pipe operators and filter() to create a new data set flights2 which removes any rows with missing values (i.e., NA entries) for the dep_delay or arr_delay variables. (Hint: see ?is.na and use the "and" operator & and the negation operator !.)

b) Define the new variable gain as the difference between the departure delay and the arrival delay, the new variable air_hour as the air_time variable re-expressed in hours, and the variable gain_per_hour as the gain per air_hour. Write code using pipe operators and the functions mutate() and select() which creates a new tibble called flights3 containing only the gain_per_hour and origin variables. (Hint: you may exploit the fact that mutate() allows you to refer to variables you have just created.)

c) The ggplot2 code below plots a kernel density estimate of the gain_per_hour for each origin airport. However, it overplots all three estimated density curves on the one graph. Modify this code so that it (i) partitions/facets the graph into a separate panel for each origin and (ii) produces histograms with 50 bins rather than density plots.

    ggplot(flights3, aes(x = gain_per_hour, fill = origin)) +
      geom_density()

Your final plot should look like this:

[Figure: histograms of gain_per_hour (x-axis) against count (y-axis) in three panels, EWR, JFK and LGA, with a fill legend for origin.]

NOTE: for parts (b) and (c) you may refer to the base R code below, which ultimately produces the same flights3 output. This code is provided for you to check your results only; do not use this code as a solution to parts (b) and (c).

    flights2 <- flights[!is.na(flights$dep_delay) & !is.na(flights$arr_delay), ]
    flights2$gain <- flights2$dep_delay - flights2$arr_delay
    flights2$hours <- flights2$air_time / 60
    flights2$gain_per_hour <- flights2$gain / flights2$hours
    flights3 <- flights2[, c("gain_per_hour", "origin")]
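Parts (a) to (c) could be approached along the following lines. This is a minimal sketch rather than a definitive answer: it assumes the tidyverse and nycflights13 packages are already installed, uses the magrittr pipe %>% that the tidyverse loads, and stores the plot in an object named p, a name chosen here purely for illustration.

```r
# Assumption: tidyverse and nycflights13 are installed.
library(tidyverse)
library(nycflights13)

# (a) Keep only rows where both delays are recorded (not NA).
flights2 <- flights %>%
  filter(!is.na(dep_delay) & !is.na(arr_delay))

# (b) Create the new variables in one mutate() call; later
# definitions may refer to variables created earlier in the
# same call. Then keep only the two requested columns.
flights3 <- flights2 %>%
  mutate(
    gain = dep_delay - arr_delay,
    air_hour = air_time / 60,
    gain_per_hour = gain / air_hour
  ) %>%
  select(gain_per_hour, origin)

# (c) Swap the density layer for a 50-bin histogram and
# facet by origin so each airport gets its own panel.
# "p" is an illustrative name, not part of the question.
p <- ggplot(flights3, aes(x = gain_per_hour, fill = origin)) +
  geom_histogram(bins = 50) +
  facet_wrap(~origin)
print(p)
```

The result can be checked against the base R code in the note above: both routes should give a flights3 tibble with the same two columns and the same number of rows as flights2.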

