Question:

---
title: "Finish the Incomplete Code"
output: html_notebook
---
Libraries:
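```{r}
library(ISLR)
library(tree)
library(tidyverse)
library(caret)
library(dplyr)
library(factoextra)
library(randomForest)  # needed for the randomForest() call later in the notebook
```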
First, set your seed for reproducibility:
```{r}
# Set your seed (___ marks a value that is missing in the source)
set.seed(___)
head(Auto)
```
We will be using the Auto dataset from the ISLR package.
Read in the data and check random observations:
```{r}
data(Auto)
# ___ marks the number of rows to sample, which is missing in the source
sample_n(Auto, ___)
```
Are there any missing values?
```{r}
# Check for missing values:
is.na(Auto)
```
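A per-column count is often easier to read than the full logical matrix that `is.na()` returns. The following is a minimal sketch on a small made-up data frame; `toy` and its values are hypothetical and are not taken from the Auto data.

```r
# Hypothetical toy data with two missing entries
toy <- data.frame(
  mpg = c(18, NA, 22),
  horsepower = c(130, 165, NA)
)

# Total number of missing values in the whole data frame
total_missing <- sum(is.na(toy))

# Number of missing values in each column
missing_by_column <- colSums(is.na(toy))

print(total_missing)
print(missing_by_column)
```

The same two calls can be applied to `Auto` directly once the package data is loaded.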
What are the dimensions of your data?
```{r}
# Check and report dimensions
dim(Auto)
```
# Decision Trees
Regression Tree: Our goal will be to predict how many cylinders a car has.
First we drop the character features:
```{r}
# Check if each column is character type
charcolumns <- sapply(Auto, is.character)
# Print the results
print(charcolumns)
```
```{r}
# Keep only the columns that are not character type
NewAuto <- select_if(Auto, function(x) !is.character(x))
```
```{r}
summary(NewAuto)
```
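Create training and testing data: Complete the code
```{r}
# Create a training and testing split
# (___ marks the training fraction, which is missing in the source)
intrain <- sample(1:nrow(NewAuto), ___ * nrow(NewAuto))
traindata <- NewAuto[intrain, ]
testdata <- NewAuto[-intrain, ]
```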
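The split can be sketched end to end on a built-in dataset. This is only an illustration: `mtcars` stands in for `NewAuto`, and the seed of 1 and the 70% training fraction are assumptions, not the values the exercise asks for.

```r
# Illustrative split on mtcars (a stand-in for NewAuto)
set.seed(1)            # assumed seed, used here only for reproducibility
trainfraction <- 0.7   # assumed training fraction

n <- nrow(mtcars)
# Draw row indices for the training set without replacement
intrain <- sample(1:n, floor(trainfraction * n))

traindemo <- mtcars[intrain, ]   # rows selected for training
testdemo <- mtcars[-intrain, ]   # all remaining rows

dim(traindemo)
dim(testdemo)
```

Negative indexing with `-intrain` guarantees that no row appears in both sets.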
```{r}
dim(traindata)
```
```{r}
summary(traindata)
```
```{r}
dim(testdata)
```
```{r}
summary(testdata)
```
Create a regression tree using only the training data:
```{r}
# Identify factor predictors with too many levels (tree() allows at most 32)
factorpredictors <- sapply(traindata, is.factor)
levelscount <- sapply(traindata[factorpredictors], function(x) length(levels(x)))
highlevels <- names(levelscount[levelscount > 32])
# Convert those high-level factor predictors to numeric
traindatanumeric <- traindata
traindatanumeric[highlevels] <- lapply(traindatanumeric[highlevels], as.numeric)
# Create a tree using the tree function
TREE <- tree(mpg ~ ., data = traindatanumeric)
# Look at a summary of your tree
summary(TREE)
```
How many nodes does it have?
# It has nodes.
Which variables did it find important?
# Weight, Horsepower, and Year.
Now plot your tree:
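```{r}
plot(TREE)
# pretty = 0 prints full level names (value assumed; the digit is missing in the source)
text(TREE, pretty = 0)
```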
Let's check it: Complete the code
```{r}
# Identify factor predictors with too many levels (tree() allows at most 32)
factorpredictors <- sapply(traindata, is.factor)
levelscount <- sapply(traindata[factorpredictors], function(x) length(levels(x)))
highlevels <- names(levelscount[levelscount > 32])
# Convert those high-level factor predictors to numeric
traindatanumeric <- traindata
traindatanumeric[highlevels] <- lapply(traindatanumeric[highlevels], as.numeric)
# Remove 'name' variable from the dataset
traindatanumeric <- traindatanumeric[, !(names(traindatanumeric) %in% "name")]
# Create a tree using the tree function
TREE <- tree(mpg ~ ., data = traindatanumeric)
# Look at a summary of your tree
summary(TREE)
```
```{r}
# Predict on the test set and compute the test mean squared error
TREEhat <- predict(TREE, newdata = testdata)
mean((TREEhat - testdata$mpg)^2)
```
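Test error for a regression tree is commonly summarized by the mean squared error, the average of the squared differences between predictions and observed values. A small sketch with made-up numbers shows the arithmetic; the vectors `yhat` and `y` are hypothetical and do not come from the Auto data.

```r
# Hypothetical predictions and observed values
yhat <- c(20, 25, 30)
y <- c(22, 24, 27)

# Mean squared error: average of the squared prediction errors
mse <- mean((yhat - y)^2)

# Root mean squared error, on the same scale as the response
rmse <- sqrt(mse)

print(mse)
print(rmse)
```

Reporting the root mean squared error alongside the MSE can make the result easier to interpret, because it is in the units of the response.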
```{r}
str(traindata)
```
Let's try random forest with m and ntree: Complete the code. Remember, we are predicting cylinders.
```{r}
# Convert character (categorical) columns to factors, keeping the data frame structure
traindata[] <- lapply(traindata, function(x) {
  if (is.character(x)) x <- as.factor(x)
  x
})
# Perform one-hot encoding
traindataencoded <- model.matrix(~ ., data = traindata)
# Fit random forest model (___ marks values that are missing in the source)
ForestAuto <- randomForest(cylinders ~ ., data = traindata, mtry = ___, importance = TRUE, ntree = ___)
# Print the random forest model
ForestAuto
```
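When choosing m, it can help to know what `randomForest()` uses when `mtry` is omitted: the square root of the number of predictors p for classification, and p/3 for regression, both rounded down. The sketch below only does that arithmetic in base R. The value `p = 7` is an assumption for illustration (the Auto columns other than `cylinders` and `name`), not a value given in the exercise.

```r
p <- 7  # assumed number of predictors (illustrative)

# Default mtry for classification: floor(sqrt(p))
mtry_classification <- floor(sqrt(p))

# Default mtry for regression: floor(p / 3), but at least 1
mtry_regression <- max(floor(p / 3), 1)

# Setting mtry = p considers every predictor at each split, which is bagging
mtry_bagging <- p

print(c(mtry_classification, mtry_regression, mtry_bagging))
```

Because `cylinders` is stored as a numeric column in `Auto`, `randomForest()` would treat the model above as regression unless the response is converted to a factor.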
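Let's check it: Complete the code
```{r}
# Predict cylinders on the test set with the fitted forest
Foresthat <- predict(ForestAuto, newdata = testdata)
# Test mean squared error for the random forest
mean((Foresthat - testdata$cylinders)^2)
```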
Which one was better between a simple regression tree and random forest?