Question: For this problem, you will need to load the ggplot2movies library. This library contains a data
frame called movies that contains information about movies. One of the variables in this data
frame is the length in minutes, and it is interesting to look at this variable in some detail, starting
with the default histogram and boxplot using ggplot.
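As a starting point, the default histogram and boxplot might be drawn as in the sketch below. It assumes the ggplot2 and ggplot2movies packages are installed; the object names p_hist and p_box are illustrative only.

```r
library(ggplot2)
library(ggplot2movies)  # provides the `movies` data frame

# Default histogram of film length in minutes.
# No binwidth is given, so ggplot2 falls back to its default of 30 bins.
p_hist <- ggplot(movies, aes(x = length)) +
  geom_histogram()

# Boxplot of the same variable. There is only one group,
# so an empty string is used as a dummy x value.
p_box <- ggplot(movies, aes(x = "", y = length)) +
  geom_boxplot() +
  labs(x = NULL)

print(p_hist)
print(p_box)
```

With defaults like these, a handful of extreme values can stretch the axis so far that the bulk of the data is squeezed into one or two bins, which is what motivates the parts that follow.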
(a) The dataset appears to have some unusually large outliers. Are these outliers real, or bad data
values? Justify your answer. Hint: filter the data for movies over minutes in length,
and then check on the internet to see if the movie lengths are correct.
(b) Using filter, create a subset of the movies data frame containing only films hours or less
long. Plot a histogram with a bin width of one minute. Describe what features stand out to you.
(c) Among the features you should have mentioned in part (b) are peaks at minutes and
minutes. Draw histograms to show whether these peaks existed both before and after
What conclusions do you draw?
(d) One variable, Short, indicates whether a film was classified as a short film or not. Create plots
to deduce the rule used to define what constitutes a short film. Determine whether the films
have been consistently classified; justify your answer.
Use R for all parts.
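One possible way to approach the four parts is sketched below. The numeric cut-offs are missing from the question text as given here, so outlier_minutes, max_minutes and split_year are hypothetical placeholders that should be replaced with the values from the original assignment. The sketch assumes the ggplot2, ggplot2movies and dplyr packages are installed.

```r
library(ggplot2)
library(ggplot2movies)  # provides the `movies` data frame
library(dplyr)

# HYPOTHETICAL placeholders: the question's actual cut-offs are not shown,
# so these values are assumptions made only to keep the sketch runnable.
outlier_minutes <- 600   # assumed threshold for "unusually long" (part a)
max_minutes     <- 180   # assumed upper limit for the subset (part b)
split_year      <- 1980  # assumed year dividing the two periods (part c)

# Part a: list the longest films so their lengths can be checked by hand.
long_films <- movies %>%
  filter(length > outlier_minutes) %>%
  select(title, year, length) %>%
  arrange(desc(length))
print(long_films)

# Part b: keep films up to the limit and use one-minute bins.
movies_sub <- movies %>%
  filter(length <= max_minutes)

p_b <- ggplot(movies_sub, aes(x = length)) +
  geom_histogram(binwidth = 1)
print(p_b)

# Part c: the same histogram, split into two periods.
# Films released in split_year itself fall into the later period.
movies_sub <- movies_sub %>%
  mutate(period = ifelse(year < split_year, "earlier", "later"))

p_c <- ggplot(movies_sub, aes(x = length)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ period, ncol = 1, scales = "free_y")
print(p_c)

# Part d: compare lengths of films flagged as Short with the rest.
p_d <- ggplot(movies_sub, aes(x = length, fill = factor(Short))) +
  geom_histogram(binwidth = 1) +
  labs(fill = "Short")
print(p_d)

# The range of lengths in each group hints at the cut-off rule and
# shows whether any film sits on the wrong side of it.
short_ranges <- movies %>%
  group_by(Short) %>%
  summarise(min_length = min(length),
            max_length = max(length),
            n_films = n())
print(short_ranges)
```

If the two groups in short_ranges overlap in length, the classification has not been applied consistently, and filtering on the overlapping range would show which films are affected.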
