Question: STAT 3280 - Homework 11
starwars
The dplyr package includes a dataset called starwars. Using this dataset,
answer the following questions using piped dplyr code:
Which column has the most missing values?
How many individuals in the dataset are neither male nor female?
With a single line of code, find out how many individuals come from the
most frequent species. (This one is more difficult.)
Filter the dataset to only include humans, then sort by homeworld
descending and then by height. What are the last four rows of the resulting
dataset?
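One possible way to approach the four starwars questions is sketched below. It assumes dplyr 1.0.0 or later (for `across()`, `slice_tail()`, and the `sex` column), and it assumes that `sex` is the column the question has in mind and that individuals with a missing `sex` count as "neither". The variable names are chosen here only for illustration.

```r
library(dplyr)

# Missing values per column, sorted so the column with the most NAs comes first.
na_counts <- starwars %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  unlist() %>%
  sort(decreasing = TRUE)
na_counts

# Individuals whose recorded sex is neither "male" nor "female".
# Assumption: rows where `sex` is NA are counted as "neither" as well.
n_neither <- starwars %>%
  filter(!sex %in% c("male", "female")) %>%
  nrow()
n_neither

# Size of the most frequent species, in a single piped line.
most_common <- starwars %>% count(species, sort = TRUE) %>% slice(1)
most_common

# Humans only, sorted by homeworld (descending) and then by height;
# keep the last four rows of the sorted result.
last_four <- starwars %>%
  filter(species == "Human") %>%
  arrange(desc(homeworld), height) %>%
  slice_tail(n = 4)
last_four
```

Note that `arrange()` places missing values last even inside `desc()`, so the final rows of the sorted table are likely to be humans with no recorded homeworld.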
Mammal Sleep
The ggplot2 package includes the msleep dataset. Answer the following using
dplyr code, preferably with a single piped command.
Look at the column names, then rename two of them to something
that you find more informative. Print out the first row of the dataset.
How many rows have at least one missing value? First, get rid of the
last two columns, then remove all rows that still have at least one missing
value.
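The msleep tasks could be approached along the following lines. This is only a sketch: the two new column names (`diet` and `conservation_status`) are arbitrary choices made here, and `if_any()` assumes dplyr 1.0.4 or later.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# Inspect the column names.
msleep %>% names()

# Rename two columns (the new names are illustrative) and print the first row.
first_row <- msleep %>%
  rename(diet = vore, conservation_status = conservation) %>%
  slice(1)
first_row

# Number of rows with at least one missing value in any column.
n_incomplete <- msleep %>%
  filter(if_any(everything(), is.na)) %>%
  nrow()
n_incomplete

# Drop the last two columns (brainwt and bodywt in msleep), then remove
# every row that still contains a missing value.
complete_rows <- msleep %>%
  select(-brainwt, -bodywt) %>%
  filter(!if_any(everything(), is.na))
complete_rows
```

Selecting the columns by name keeps the sketch easy to read; `last_col()` would be an alternative if the code should not depend on the column names.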
Orders
For this exercise, use the orders and clients datasets found on Canvas.
Perform a left join of clients with orders based on the num client variable.
Look at the resulting dataset and explain what the join did. Report the
size of the joined dataset in rows and columns.
Now, perform an inner join instead. What is the size of this joined dataset?
If the sizes are different, why are they different?
Now, perform a semi join on these two datasets. What is the result and
why is it different?
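The Canvas files are not reproduced here, so the sketch below uses two small made-up tables to illustrate how the three joins differ. The data, the extra columns, and the key name `num_client` are all assumptions for illustration; the real datasets will give different sizes.

```r
library(dplyr)

# Hypothetical stand-ins for the Canvas datasets (not the real data).
clients <- tibble(num_client = c(1, 2, 3), name = c("A", "B", "C"))
orders  <- tibble(num_client = c(1, 1, 3, 4), amount = c(10, 20, 30, 40))

# Left join: keeps every client; clients with several orders are repeated,
# and clients with no orders get NA in the order columns.
left <- clients %>% left_join(orders, by = "num_client")

# Inner join: keeps only client/order pairs that match on the key.
inner <- clients %>% inner_join(orders, by = "num_client")

# Semi join: keeps each client that has at least one order, once,
# and returns only the columns of clients.
semi <- clients %>% semi_join(orders, by = "num_client")

dim(left)
dim(inner)
dim(semi)
```

With the toy data, client 2 has no orders, so it appears in the left join but not in the inner or semi join; client 1 has two orders, so it appears twice in the left and inner joins but only once in the semi join.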
