Complete the following R code using RStudio.

## 1pt: Comments entered correctly above (you removed the '' text) ----

## 2pts: Install and Load the tidyverse and fivethirtyeight packages ----

install.packages("tidyverse")
install.packages("fivethirtyeight")

library(tidyverse)
library(fivethirtyeight)

## 1pt: Create object called flying_df using the built in flying dataset in the fivethirtyeight package ----

flying_df <- flying

## 1pt: view the R Documentation (help page) for flying ----

?flying

## 1pt: preview flying_df in the spreadsheet view (invoke a data viewer) ----

View(flying_df)

## 2pts: How was this data collected and what does this data represent (what do the rows represent)? ----

## We are interested in exploring this data, and could predict many different outputs based on the...

##...columns we are given here.

## 2pts: Look at the given columns and discuss the top two/three things you think would be... ----

##...most interesting as the target of a predictive model. Give reasons why you chose those fields.

## 1pt: view a summary of the flying_df ----

summary(flying_df)

## The output of the above line shows us the statistics and counts per column (depending on the data type).

## Notice the NA counts in certain fields. These are missing values in our data.

## Also, some columns only return the length, class, and mode (for Character fields).

## Looking at the spreadsheet view of the data from before, it seems that all the values in...

##...each column were an option (not open-ended responses), so it would make more sense if the...

##...columns that are stored as Char are converted over to Factors instead.

## 3pts: convert all character fields into factors ----

## Note: this code is a copy/paste/tweak from: https://gist.github.com/ramhiser/93fe37be439c480dc26c4bed8aab03dd

flying_df <- flying_df %>%
  mutate_if(sapply(flying_df, is.character), as.factor)
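The conversion step can also be sketched in base R on a small made-up data frame. This is only an illustration: `toy_df` and its columns are hypothetical and are not part of the `flying` dataset.

```r
# Toy survey data (hypothetical, not the real flying dataset)
toy_df <- data.frame(
  recline = c("Always", "Never", NA, "Always"),
  rude    = c("Yes", "No", "No", NA),
  id      = 1:4,
  stringsAsFactors = FALSE
)

# Flag the character columns, then convert only those to factors
is_char <- sapply(toy_df, is.character)
toy_df[is_char] <- lapply(toy_df[is_char], as.factor)

# Inspect the class of every column after the conversion
sapply(toy_df, class)
```

Missing values stay `NA` after the conversion; they do not become a factor level on their own.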

## 1pt: Check that the data was successfully converted ----

summary(flying_df)

## Now let's count the number of missing values in each column

## 3pts: complete the sapply() to find the missing counts ----

sapply(flying_df, function(x) sum(is.na(x)))

# For other ways of doing this, visit: https://sebastiansauer.github.io/sum-isna/

## 1pt: what is the percent missing per column? ----

sapply(flying_df, function(x) round(sum(is.na(x)) / nrow(flying_df) * 100, 1))
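To see what the per-column counts and percentages look like, here is a minimal sketch on a hypothetical four-row data frame (`toy_df` is made up for illustration). It also shows `colSums(is.na(...))` as one alternative way to get the same counts.

```r
# Hypothetical data: 1 NA in recline, 3 NAs in rude, none in id
toy_df <- data.frame(
  recline = c("Always", "Never", NA, "Always"),
  rude    = c("Yes", NA, NA, NA),
  id      = 1:4
)

# Number of missing values in each column
na_counts <- sapply(toy_df, function(x) sum(is.na(x)))

# Percent of rows that are missing in each column, rounded to 1 decimal
na_pct <- sapply(toy_df, function(x) round(sum(is.na(x)) / nrow(toy_df) * 100, 1))

# An equivalent way to count: column sums of the logical NA matrix
na_counts_alt <- colSums(is.na(toy_df))

na_counts
na_pct
```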

## 4pts: calculate the % missing per row and save it in a new column "NA_per_row" ----

flying_df$NA_per_row <- apply(flying_df, MARGIN = 1, function(x) round(sum(is.na(x)) / ncol(flying_df) * 100, 1))
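The row-wise calculation can be checked by hand on a tiny example. The sketch below uses a hypothetical `toy_df` with four columns, so each missing cell is worth 25%. It also compares the `apply()` result with `rowMeans(is.na(...))`, which should give the same numbers.

```r
# Hypothetical data with 4 columns, so one missing cell = 25% of a row
toy_df <- data.frame(
  q1 = c("Yes", NA, NA, "No"),
  q2 = c("No", NA, "Yes", "No"),
  q3 = c(NA, NA, NA, "Yes"),
  q4 = c(1, NA, 3, 4)
)

# MARGIN = 1 applies the function to each row
na_per_row <- apply(toy_df, MARGIN = 1,
                    function(x) round(sum(is.na(x)) / ncol(toy_df) * 100, 1))

# Alternative: the mean of a logical row is the share of TRUEs
na_per_row_alt <- round(rowMeans(is.na(toy_df)) * 100, 1)

# Store the result only after computing it, so the new column
# does not change ncol() in the calculation above
toy_df$NA_per_row <- na_per_row
toy_df
```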

## View the summary of the data again

summary(flying_df)

## 5pts: create bar chart for the count (on the y) of % missing per row (on the x). be sure to give it a title ----

flying_df %>%
  group_by(NA_per_row) %>%
  summarise(count = n()) %>%
  ggplot(aes(as.factor(NA_per_row), count)) +
  geom_col(aes(fill = count)) +
  geom_text(aes(label = count), size = 3.5, position = position_stack(vjust = 0.5)) +
  # the title text is a placeholder; choose your own
  labs(title = "Count of Rows by Percent Missing", subtitle = "Raw Data", x = "NA per row (%)")
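The chart is drawn from a small summary table with one row per distinct `NA_per_row` value and a count of how many rows share it. As a rough base R sketch of that table, assuming a made-up vector of percentages (`na_per_row` below is hypothetical):

```r
# Hypothetical per-row missing percentages for six respondents
na_per_row <- c(0, 0, 25, 25, 25, 100)

# table() counts how often each value occurs; as.data.frame() turns
# the result into one row per value with a "count" column
counts <- as.data.frame(table(NA_per_row = na_per_row), responseName = "count")
counts
```

The `NA_per_row` column of `counts` is a factor, which is why the x axis is treated as discrete, matching the `as.factor()` call in the plotting code.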

## If a row is mostly blank, we can't learn much about that person...so let's clean up (by removing)...

##...any row that has more than 70% missing values

## 3pts: Create new version of flying_df, called flying_clean, ... ----

##...that only keeps rows with less than or equal to 70% missing data

flying_clean <- flying_df %>%
  filter(NA_per_row <= 70)

## 1pt: How many rows were just removed? How many rows remain? ----

nrow(flying_df) - nrow(flying_clean) # rows removed

nrow(flying_clean) # rows remain
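A small sketch of the 70% cutoff and the before/after row counts, using a hypothetical data frame that already has an `NA_per_row` column. Base R subsetting stands in here for `filter()`; note that a row sitting exactly at 70 is kept.

```r
# Hypothetical data: one row exactly at the 70% boundary, two above it
toy_df <- data.frame(
  respondent = 1:5,
  NA_per_row = c(0, 25, 70, 85.2, 100)
)

# Keep rows with at most 70% missing (the boundary value is retained)
toy_clean <- toy_df[toy_df$NA_per_row <= 70, ]

# Compare row counts before and after
rows_removed   <- nrow(toy_df) - nrow(toy_clean)
rows_remaining <- nrow(toy_clean)

c(removed = rows_removed, remaining = rows_remaining)
```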

## 1pt: View the graph again for counts (y) of % missing per row (x), but now of the flying_clean object ----

flying_clean %>%
  group_by(NA_per_row) %>%
  summarise(count = n()) %>%
  ggplot(aes(as.factor(NA_per_row), count)) +
  geom_col(aes(fill = count)) +
  geom_text(aes(label = count), size = 3.5, position = position_stack(vjust = 0.5)) +
  # the title text is a placeholder; choose your own
  labs(title = "Count of Rows by Percent Missing", subtitle = "Clean Data", x = "NA per row (%)")

## Now let's see which columns have the most missing values still so we can come up with a cleaning plan

## 3pts: output a sorted vector (from high to low) of % missing values per column in flying_clean ----

sort(sapply(flying_clean, function(x) round(sum(is.na(x)) / nrow(flying_clean) * 100, 1)), decreasing = TRUE)

## 1pt: which column has the majority of the missing values? ----

## It seems that the personal/demographic information is what we are missing the most.
## Given that this data is from a survey, let's clean up the survey question responses first.
## We're going to assume that there is actually some useful information to be had about those who...
##...skip questions. So instead of replacing the NAs with "artificial" data, let's label the NAs...
##...as "No Response" to preserve the data while still handling the NA problem.

## 3pts: Create the clean_qs object where NAs are replaced with "No Response" in every column EXCEPT: ----
##...household_income, education, location, gender, age, and children_under_18

clean_qs <- flying_clean %>%
  select(!c(household_income, education, location, gender, age, children_under_18)) %>%
  select(names(.[sapply(., anyNA)])) %>% # this line is selecting any column that has an NA
  mutate_all(as.character) %>% # in order to add our new level we need to first convert these cols to text
  replace(., is.na(.), "No Response") %>% # replace any NAs with "No Response"
  mutate_all(as.factor) # convert the columns back to factors

## 2pts: overwrite the matching columns in flying_clean, with our clean_q columns ----

flying_clean[, colnames(clean_qs)] <- clean_qs
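The detour through character columns is needed because a factor only accepts values that are already among its levels. The base R sketch below, on a hypothetical `toy_df`, first shows the direct assignment failing and then walks through the convert, replace, convert back, and overwrite steps. Here `age` stands in for the demographic columns that are left alone.

```r
# Hypothetical data: two question columns and one demographic column
toy_df <- data.frame(
  recline = factor(c("Always", NA, "Never")),
  rude    = factor(c(NA, "Yes", "No")),
  age     = factor(c("18-29", NA, "30-44"))
)

# Assigning a value that is not an existing level leaves the NA in place
# (R would warn about an invalid factor level; suppressed here)
attempt <- toy_df$recline
suppressWarnings(attempt[is.na(attempt)] <- "No Response")

# Work only on the question columns; skip the demographic column
question_cols <- setdiff(names(toy_df), "age")
clean_qs <- toy_df[question_cols]

# Convert to text, fill the NAs, then convert back to factors
clean_qs[] <- lapply(clean_qs, function(col) {
  col <- as.character(col)
  col[is.na(col)] <- "No Response"
  factor(col)
})

# Overwrite the matching columns in the original data frame
toy_df[, colnames(clean_qs)] <- clean_qs
toy_df
```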

## Check the % missing now with these changes (same code as before) ----

sort(sapply(flying_clean, function(x) round(sum(is.na(x)) / nrow(flying_clean) * 100, 1)), decreasing = TRUE)

## For our other columns, let's impute (assign/replace) new values based on the other columns we have.
## We're going to do this because we'd like to assume the values aren't missing completely at random,
## and we might be able to figure out what the values should be using the other complete cases we have.
## (See MAR example here: https://uvastatlab.github.io/2019/05/01/getting-started-with-multiple-imputation-in-r/)
## To do this, we're going to use the mice package (Multivariate Imputation by Chained Equations)
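The idea of borrowing information from other columns can be illustrated without mice. The sketch below is a deliberately crude stand-in and is not what `mice()` does: it fills a missing `income` with the most common `income` among rows that share the same `education`. The data frame and the `most_common()` helper are hypothetical.

```r
# Hypothetical data: income is missing for two respondents
toy_df <- data.frame(
  education = c("College", "College", "College",
                "High school", "High school", "High school"),
  income    = c("High", "High", NA, "Low", NA, "Low"),
  stringsAsFactors = FALSE
)

# Most frequent non-missing value in a vector (table() drops NAs by default)
most_common <- function(x) names(which.max(table(x)))

# Fill each missing income with the most common income in the same education group
imputed <- toy_df
for (grp in unique(imputed$education)) {
  in_grp <- imputed$education == grp
  fill   <- most_common(imputed$income[in_grp])
  imputed$income[in_grp & is.na(imputed$income)] <- fill
}

imputed
```

Chained equations go further: each incomplete column is modelled from all the others, the process is iterated, and several completed datasets are produced (the `m` argument) so the uncertainty of the imputation is not hidden.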

## 1pt: install and load the mice package ----

install.packages("mice")

library(mice)

## 2pts: use the mice() function to run multivariate imputation by chained equations on the flying_clean object ----

flying_mice <- mice(flying_clean, m = 3) # note: this may take 5-10 min to run

## 2pts: use the complete() function on the flying_mice object to extract the completely clean dataframe ----

flying_mice_df <- complete(flying_mice, 1)

## Output a sorted list of % NAs per column of the flying_mice_df object ----

sort(sapply(flying_mice_df, function(x) round(sum(is.na(x)) / nrow(flying_mice_df) * 100, 1)), decreasing = TRUE) # should return all 0s

# No more missing data!

summary(flying_mice_df)

## From here we could explore our data with more visuals and ultimately build a predictive model!

## Though we'll save that for next time!

## Up to 10 pts extra credit for any exploratory graphs using the clean data -----
