Question: Need help in R
# Call the ISLR library and check the head of College (a built-in data frame
# with ISLR; use data() to check this). Then reassign College to a data frame
# called df.
code here
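One possible way to approach this first step, as a minimal sketch that assumes the ISLR package is already installed:

```r
library(ISLR)          # provides the College data frame

data(College)          # load the built-in data set explicitly
print(head(College))   # first six rows; the row names are the college names

df <- College          # work on a copy called df
```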
# EDA
# Let's explore the data!
# Create a scatterplot of Grad.Rate versus Room.Board, colored by the
# Private column.
code here
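The question does not name a plotting library, so the sketch below assumes ggplot2 (base R plot() would also work). It puts Room.Board on the x axis and Grad.Rate on the y axis:

```r
library(ISLR)
library(ggplot2)   # assumption: the exercise does not say which plotting library to use

df <- College

# Grad.Rate versus Room.Board, one point per college, colored by Private
p <- ggplot(df, aes(x = Room.Board, y = Grad.Rate)) +
  geom_point(aes(color = Private))
print(p)
```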
# Create a histogram of full time undergrad students, color by Private.
code here
# Create a histogram of Grad.Rate colored by Private. You should see
# something odd here.
code here
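The two histograms could be sketched as follows, again assuming ggplot2. F.Undergrad is the College column holding the number of full-time undergraduates; the bin count of 50 is an arbitrary choice:

```r
library(ISLR)
library(ggplot2)   # assumption: plotting library not specified in the exercise

df <- College

# Full-time undergraduates, filled by Private
h1 <- ggplot(df, aes(x = F.Undergrad)) +
  geom_histogram(aes(fill = Private), color = "black", bins = 50)
print(h1)

# Graduation rate, filled by Private; look at the right-hand end of the x axis
h2 <- ggplot(df, aes(x = Grad.Rate)) +
  geom_histogram(aes(fill = Private), color = "black", bins = 50)
print(h2)
```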
# What college had a Graduation Rate of above 100%?
code here
# Change that college's grad rate to 100%.
code here
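A sketch of these two steps using only base R. Because the college names are stored as row names, filtering the rows is enough to identify the school:

```r
library(ISLR)

df <- College

# Rows whose graduation rate exceeds 100%; the row name is the college
odd <- subset(df, Grad.Rate > 100)
print(rownames(odd))

# Overwrite the impossible value(s) with 100
df[df$Grad.Rate > 100, "Grad.Rate"] <- 100
print(max(df$Grad.Rate))
```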
# Train Test Split
# Split your data into training and testing sets 70/30. Use the caTools
# library to do this.
code here
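A minimal sketch of the 70/30 split with caTools::sample.split(). The seed value and the name split_flag are arbitrary choices made here for illustration:

```r
library(ISLR)
library(caTools)

df <- College
df$Grad.Rate <- pmin(df$Grad.Rate, 100)   # cap the impossible >100% value

set.seed(101)   # arbitrary seed, only so the split is reproducible

# TRUE for roughly 70% of rows, keeping the Yes/No ratio of Private similar
split_flag <- sample.split(df$Private, SplitRatio = 0.70)

train <- df[split_flag, ]
test  <- df[!split_flag, ]
```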
# Decision Tree
# Use the rpart library to build a decision tree to predict whether or not a
# school is Private. Remember to only build your tree off the training data.
code here
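One way the tree could be fitted, shown as a self-contained sketch. The split is repeated so the block runs on its own, and the seed is an arbitrary choice:

```r
library(ISLR)
library(caTools)
library(rpart)

df <- College
df$Grad.Rate <- pmin(df$Grad.Rate, 100)

set.seed(101)   # arbitrary seed
split_flag <- sample.split(df$Private, SplitRatio = 0.70)
train <- df[split_flag, ]
test  <- df[!split_flag, ]

# Classification tree for Private using every other column, training data only
tree <- rpart(Private ~ ., method = "class", data = train)
print(tree)
```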
# Use predict() to predict the Private label on the test data.
code here
# Check the head of the predicted values. You should notice that you actually
# have two columns with the probabilities.
code here
# Turn these two columns into one column to match the original Yes/No label
# for the Private column.
code here
# Lots of ways to do this. Note: tree.preds must be a data frame (for example
# via as.data.frame()) for the $ access below to work.
joiner <- function(x) {
  if (x >= 0.5) {
    return("Yes")
  } else {
    return("No")
  }
}
tree.preds$Private <- sapply(tree.preds$Yes, joiner)
head(tree.preds)
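As one of the other ways to do this, here is a sketch that predicts on the test set and collapses the two probability columns with a vectorized ifelse() instead of sapply(). The split and seed are repeated from the earlier steps as assumptions so the block runs on its own:

```r
library(ISLR)
library(caTools)
library(rpart)

df <- College
df$Grad.Rate <- pmin(df$Grad.Rate, 100)

set.seed(101)   # arbitrary seed
split_flag <- sample.split(df$Private, SplitRatio = 0.70)
train <- df[split_flag, ]
test  <- df[!split_flag, ]

tree <- rpart(Private ~ ., method = "class", data = train)

# predict() returns a matrix of class probabilities with columns "No" and "Yes";
# convert it to a data frame so that $ works on it
tree.preds <- as.data.frame(predict(tree, test))
print(head(tree.preds))

# Single label column: "Yes" when the Yes probability is at least 0.5
tree.preds$Private <- ifelse(tree.preds$Yes >= 0.5, "Yes", "No")
print(head(tree.preds))
```

Calling predict(tree, test, type = "class") returns the labels directly, which skips the conversion step altogether.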
# Now use table() to create a confusion matrix of your tree model.
code here
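A sketch of the confusion matrix step. To keep the block short it asks predict() for class labels directly; the split and seed are the same assumptions as before:

```r
library(ISLR)
library(caTools)
library(rpart)

df <- College
df$Grad.Rate <- pmin(df$Grad.Rate, 100)

set.seed(101)   # arbitrary seed
split_flag <- sample.split(df$Private, SplitRatio = 0.70)
train <- df[split_flag, ]
test  <- df[!split_flag, ]

tree <- rpart(Private ~ ., method = "class", data = train)
tree.class <- predict(tree, test, type = "class")   # factor of "No"/"Yes"

# Rows are predicted labels, columns are the true labels
tree.cm <- table(predicted = tree.class, actual = test$Private)
print(tree.cm)
```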
# Use the rpart.plot library and the prp() function to plot out your tree
# model.
code here
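Plotting the fitted tree could look like the sketch below, assuming the rpart.plot package is installed. The tree is refitted so the block is self-contained:

```r
library(ISLR)
library(caTools)
library(rpart)
library(rpart.plot)

df <- College
df$Grad.Rate <- pmin(df$Grad.Rate, 100)

set.seed(101)   # arbitrary seed
split_flag <- sample.split(df$Private, SplitRatio = 0.70)
train <- df[split_flag, ]

tree <- rpart(Private ~ ., method = "class", data = train)

# Draw the tree with prp() defaults
prp(tree)
```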
# Random Forest
# Now let's build out a random forest model!
# Call the randomForest package library.
library(randomForest)
# Now use randomForest() to build out a model to predict Private class.
# Add importance=TRUE as a parameter in the model. (Use help(randomForest)
# to find out what this does.)
code here
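A minimal sketch of the model fit. The name rf.model and the seed are choices made here for illustration; all other settings are left at the randomForest() defaults:

```r
library(ISLR)
library(caTools)
library(randomForest)

df <- College
df$Grad.Rate <- pmin(df$Grad.Rate, 100)

set.seed(101)   # arbitrary seed; it also fixes the forest's random sampling
split_flag <- sample.split(df$Private, SplitRatio = 0.70)
train <- df[split_flag, ]
test  <- df[!split_flag, ]

# importance = TRUE asks the forest to also compute permutation-based
# variable importance in addition to the Gini-based measure
rf.model <- randomForest(Private ~ ., data = train, importance = TRUE)
print(rf.model)
```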
# What was your model's confusion matrix on its own training set?
# Use model$confusion.
code here
# Grab the feature importance with model$importance. Refer to the reading
# for more info on what Gini means.
code here
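Both components can be read straight off the fitted object, as in this sketch (rf.model is a name assumed here; the exercise calls it model). Note that the confusion component of a randomForest fit is computed from out-of-bag predictions on the training data:

```r
library(ISLR)
library(caTools)
library(randomForest)

df <- College
df$Grad.Rate <- pmin(df$Grad.Rate, 100)

set.seed(101)   # arbitrary seed
split_flag <- sample.split(df$Private, SplitRatio = 0.70)
train <- df[split_flag, ]

rf.model <- randomForest(Private ~ ., data = train, importance = TRUE)

# Out-of-bag confusion matrix on the training data, with a class.error column
print(rf.model$confusion)

# One row per predictor; MeanDecreaseGini is the Gini-based importance
print(rf.model$importance)
```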
# Predictions
# Now use your random forest model to predict on your test set!
code here
# It should have performed better than just a single tree; how much better
# depends on whether you are measuring recall, precision, or accuracy as
# the most important measure of the model.
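To compare the two models on the three measures mentioned above, something like the following sketch could be used. The helper metrics() is hypothetical (written here for illustration) and treats "Yes" as the positive class; the actual numbers depend on the seed and the split:

```r
library(ISLR)
library(caTools)
library(rpart)
library(randomForest)

df <- College
df$Grad.Rate <- pmin(df$Grad.Rate, 100)

set.seed(101)   # arbitrary seed
split_flag <- sample.split(df$Private, SplitRatio = 0.70)
train <- df[split_flag, ]
test  <- df[!split_flag, ]

tree     <- rpart(Private ~ ., method = "class", data = train)
rf.model <- randomForest(Private ~ ., data = train, importance = TRUE)

tree.class <- predict(tree, test, type = "class")
rf.class   <- predict(rf.model, test)   # factor of predicted labels

# Hypothetical helper: accuracy, precision and recall with "Yes" as positive
metrics <- function(predicted, actual) {
  cm <- table(predicted = predicted, actual = actual)
  tp <- cm["Yes", "Yes"]
  c(accuracy  = sum(diag(cm)) / sum(cm),
    precision = tp / sum(cm["Yes", ]),    # TP / all predicted Yes
    recall    = tp / sum(cm[, "Yes"]))    # TP / all actual Yes
}

print(table(predicted = rf.class, actual = test$Private))
tree.metrics <- metrics(tree.class, test$Private)
rf.metrics   <- metrics(rf.class, test$Private)
print(rbind(tree = tree.metrics, forest = rf.metrics))
```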
#Ref: www.pieriandata.com
