Question:
##### section 5: NAs ##########

# sometimes you will see values in the data that say NA.
# this stands for "not available" and represents missing values.
# you can think of it as an empty cell in the spreadsheet.
# notice in the summary how you can see the number of NAs in numeric and factor cols.
summary(d)

# one column with NAs is sleep_rem.
# you can use the tidyverse function pull() to pull just that column into its own variable.
rem <- d %>% pull(sleep_rem)
summary(rem)

# because there are missing values in the data, functions like mean() return NA.
mean(rem)

# many functions have an optional named argument called na.rm which stands for NA remove.
# if you pass TRUE or T to this argument, it excludes all NA values from the calculation.
mean(rem, na.rm = T)

# the function is.na() returns TRUE or FALSE for whether a value is NA.
# so we can do things like this:
d %>% count(is.na(conservation))

# and also things like this:
d %>% mutate(conservation = if_else(is.na(conservation), "unknown", conservation))

# what does that line of code do?
# ???
# It checks every row in the 'conservation' column. If a value is NA, it replaces it with the string "unknown".
# If the value is not NA, it keeps the original value.

# test whether you were right by running count() on conservation with and without that mutate
d %>% count(conservation)
d %>% mutate(conservation = if_else(is.na(conservation), "unknown", conservation)) %>% count(conservation)

# replacing NA values is
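The steps in the script can be tried without the original data. The sketch below is only an illustration: `toy` is a made-up three-row tibble standing in for `d` (its values are invented, not taken from the assignment), and it assumes the dplyr package is installed.

```r
library(dplyr)

# hypothetical stand-in for d: three invented rows, not the original data
toy <- tibble(
  name         = c("a", "b", "c"),
  sleep_rem    = c(1.5, NA, 2.5),
  conservation = c("lc", NA, "lc")
)

# pull one column out as a plain vector
rem <- toy %>% pull(sleep_rem)

# a single NA makes the result NA
mean(rem)

# na.rm = TRUE drops the NA first, leaving the mean of 1.5 and 2.5
mean(rem, na.rm = TRUE)

# is.na() flags each element: FALSE TRUE FALSE
is.na(rem)

# replace NA in conservation with the string "unknown"
filled <- toy %>%
  mutate(conservation = if_else(is.na(conservation), "unknown", conservation))

# count() now reports an "unknown" group instead of an NA group
filled %>% count(conservation)
```

Writing `TRUE` in full rather than `T` is a small design choice here: `T` is an ordinary variable that can be reassigned, while `TRUE` is a reserved word.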
