---
title: "Finish the Incomplete Code"
output: html_notebook
---
Libraries:
```{r}
library(ISLR2)
library(tree)
library(tidyverse)
library(caret)
library(dplyr)
library(factoextra)
```
First, set your seed for reproducibility:
```{r}
# set your seed to 0412
set.seed(0412)
head(Auto)
```
We will be using the Auto dataset from the ISLR2 package.
Read in the data and check 10 random observations:
```{r}
data(Auto)
sample_n(Auto, 10)
```
Are there any missing values?
```{r}
# Check for missing values: count them instead of printing the full logical matrix
sum(is.na(Auto))
```
What are the dimensions of your data?
```{r}
#Check and report dimensions
dim(Auto)
```
# 1: Decision Trees
Regression tree: our goal will be to predict the number of cylinders in a car.
First, we drop the character features:
```{r}
# Check if each column is character type
char_columns <- sapply(Auto, is.character)
# Print the results
print(char_columns)
```
```{r}
# Note: in ISLR2, `name` is stored as a factor, so it is not removed here
New_Auto <- select_if(Auto, function(x) !is.character(x))
```
```{r}
summary(New_Auto)
```
Create training and testing data: complete the code
```{r}
# Create a 60/40 training and testing split
intrain <- sample(1:nrow(New_Auto), 0.6 * nrow(New_Auto))
train_data <- New_Auto[intrain, ]
test_data <- New_Auto[-intrain, ]
```
```{r}
dim(train_data)
```
```{r}
summary(train_data)
```
```{r}
dim(test_data)
```
```{r}
summary(test_data)
```
Create a regression tree using only the training data:
```{r}
# Identify factor predictors with more than 32 levels (tree() cannot handle them)
factor_predictors <- sapply(train_data, is.factor)
levels_count <- sapply(train_data[factor_predictors], function(x) length(levels(x)))
high_levels <- names(levels_count[levels_count > 32])
# Convert factor predictors with more than 32 levels to numeric
train_data_numeric <- train_data
train_data_numeric[high_levels] <- lapply(train_data_numeric[high_levels], as.numeric)
# Create a tree using the `tree()` function
TREE <- tree(mpg ~ ., data = train_data_numeric)
# Look at a summary of your tree
summary(TREE)
```
How many nodes does it have?
# It has 8 terminal nodes.
Which variables did it find important?
# Weight, horsepower, and year.
Now plot your tree:
```{r}
plot(TREE)
text(TREE, pretty = 0)
```
Let's check it: refit the tree without the `name` column, then compute the test MSE.
```{r}
# Identify factor predictors with more than 32 levels
factor_predictors <- sapply(train_data, is.factor)
levels_count <- sapply(train_data[factor_predictors], function(x) length(levels(x)))
high_levels <- names(levels_count[levels_count > 32])
# Convert factor predictors with more than 32 levels to numeric
train_data_numeric <- train_data
train_data_numeric[high_levels] <- lapply(train_data_numeric[high_levels], as.numeric)
# Remove the `name` variable from the dataset
train_data_numeric <- train_data_numeric[, !names(train_data_numeric) %in% "name"]
# Create a tree using the `tree()` function
TREE <- tree(mpg ~ ., data = train_data_numeric)
# Look at a summary of your tree
summary(TREE)
```
```{r}
TREE_hat <- predict(TREE, newdata = test_data)
mean((TREE_hat - test_data$mpg)^2)
```
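One optional extension beyond the assignment: cross-validate the tree to see whether pruning helps. The `cv.tree()` and `prune.tree()` functions below come from the `tree` package already loaded above; using deviance as the error measure is simply the default, not something the assignment specifies.
```{r}
# Cross-validate the fitted tree over candidate sizes (sketch, not required)
cv_TREE <- cv.tree(TREE)
plot(cv_TREE$size, cv_TREE$dev, type = "b",
     xlab = "Tree size (terminal nodes)", ylab = "Deviance")
# Prune to the size with the lowest CV deviance (keep at least 2 nodes)
best_size <- max(2, cv_TREE$size[which.min(cv_TREE$dev)])
pruned_TREE <- prune.tree(TREE, best = best_size)
summary(pruned_TREE)
```
If the cross-validated deviance is lowest at the full size, pruning will not change the tree.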
```{r}
str(train_data)
```
Let's try a random forest with m = 4 and ntree = 5: complete the code (remember we are predicting cylinders)
```{r}
library(randomForest)  # needed for randomForest(); not loaded above
# `name` is a factor with more than 53 levels, which randomForest() cannot
# handle, so drop it before fitting
forest_train <- train_data[, names(train_data) != "name"]
# Fit a random forest predicting cylinders with mtry = 4 and ntree = 5
ForestAuto <- randomForest(cylinders ~ ., data = forest_train,
                           mtry = 4, importance = TRUE, ntree = 5)
# Print the random forest model
ForestAuto
```
Let's check it! Complete the code:
```{r}
Forest_hat <- predict(ForestAuto, newdata = test_data)
mean((Forest_hat - test_data$cylinders)^2)
```
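Since the forest was fit with `importance = TRUE`, we can also look at which predictors it relied on. This is a sketch using the standard `randomForest` helpers, assuming `ForestAuto` was fit as above:
```{r}
# Variable importance from the fitted forest (requires importance = TRUE)
importance(ForestAuto)
varImpPlot(ForestAuto)
```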
Which one was better, the simple regression tree or the random forest? (Note that the tree above predicts `mpg` while the forest predicts `cylinders`, so their test MSEs are on different scales and not directly comparable.)
