Question:

For this problem, you will need to load the ggplot2movies library. This library provides a data
frame called movies with information about 58,788 movies. One of the variables in this data
frame is length (the running time in minutes), and it is interesting to look at this variable in
some detail, starting with the default histogram and boxplot using ggplot2.
a) The dataset appears to have some unusually large outliers. Are these outliers real, or bad data
values? Justify your answer. (Hint: filter the data for movies over 2000 minutes in length,
and then check on the internet to see if the movie lengths are correct.)
b) Using filter, create a subset of the movies data frame containing only films that are 3 hours
(180 minutes) or less in length. Plot a histogram with a bin width of one minute. Describe what
features stand out to you.
c) Among the features you should have noticed in part (b) are peaks at 7 minutes and 90
minutes. Draw histograms to show whether these peaks existed both before and after 1980.
What conclusions do you draw?
d) One variable, Short, indicates whether a film was classified as a short film or not. Create plots
to deduce the rule used to define what constitutes a short film. Determine whether the films
have been consistently classified; justify your answer.
Use R for all parts.
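
One possible starting point for the four parts is sketched below. It is not a worked answer: it assumes the ggplot2, ggplot2movies and dplyr packages are installed, and the object names (p_hist, long_films, under_3h and so on) are invented for illustration. Treating 1980 itself as "after" in part (c) is also an assumption, since the question does not say which side the boundary year belongs to.

```r
# Sketch only: assumes ggplot2, ggplot2movies and dplyr are installed.
library(ggplot2)
library(ggplot2movies)  # provides the `movies` data frame
library(dplyr)

# Default histogram and boxplot of film length (minutes).
p_hist <- ggplot(movies, aes(x = length)) + geom_histogram()
p_box  <- ggplot(movies, aes(x = "", y = length)) + geom_boxplot()

# (a) Films recorded as longer than 2000 minutes, to check against outside sources.
long_films <- movies %>%
  filter(length > 2000) %>%
  select(title, year, length)

# (b) Films of 3 hours (180 minutes) or less, with one-minute bins.
under_3h <- movies %>% filter(length <= 180)
p_180 <- ggplot(under_3h, aes(x = length)) +
  geom_histogram(binwidth = 1)

# (c) The same histogram split at 1980 (assumption: 1980 counts as "after").
by_era <- under_3h %>%
  mutate(era = ifelse(year < 1980, "Before 1980", "1980 and after"))
p_era <- ggplot(by_era, aes(x = length)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ era, ncol = 1, scales = "free_y")

# (d) Length distribution for each value of the Short flag, plus the
# range of lengths within each group as a consistency check.
p_short <- ggplot(under_3h, aes(x = length)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ Short, ncol = 1, scales = "free_y", labeller = label_both)
short_ranges <- movies %>%
  group_by(Short) %>%
  summarise(min_length = min(length), max_length = max(length))
```

Each plot object can be displayed with print(), for example print(p_180), and long_films or short_ranges can be printed to inspect the rows. Interpreting the plots, and checking the titles in long_films against an external source, is still left to the reader.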