---
title: "Finish the Incomplete Code"
output: html_notebook
---
Libraries:
```{r}
library(ISLR2)
library(tree)
library(tidyverse)
library(caret)
library(dplyr)
library(factoextra)
```
First, set your seed for reproducibility:
```{r}
# set your seed to 0412
set.seed(0412)
head(Auto)
```
We will be using the Auto dataset from the ISLR2 package.
Read in the data and check 10 random observations:
```{r}
data(Auto)
sample_n(Auto, 10)
```
Are there any missing values?
```{r}
# Check for missing values: count them instead of printing the full logical matrix
sum(is.na(Auto))
```
What are the dimensions of your data?
```{r}
#Check and report dimensions
dim(Auto)
```
# 1: Decision Trees
Regression tree: our goal will be to predict the number of cylinders in a car.
First, we drop the character features:
```{r}
# Check if each column is character type
char_columns <- sapply(Auto, is.character)
# Print the results
print(char_columns)
```
```{r}
# Note: in ISLR2, `name` is stored as a factor, so it is not removed here
New_Auto <- select_if(Auto, function(x) !is.character(x))
```
```{r}
summary(New_Auto)
```
Create training and testing data: complete the code
```{r}
# Create a 60/40 training and testing split
intrain <- sample(1:nrow(New_Auto), 0.6 * nrow(New_Auto))
train_data <- New_Auto[intrain, ]
test_data <- New_Auto[-intrain, ]
```
```{r}
dim(train_data)
```
```{r}
summary(train_data)
```
```{r}
dim(test_data)
```
```{r}
summary(test_data)
```
Create a regression tree using only the training data:
```{r}
# Identify factor predictors with more than 32 levels (tree() cannot handle them)
factor_predictors <- sapply(train_data, is.factor)
levels_count <- sapply(train_data[factor_predictors], function(x) length(levels(x)))
high_levels <- names(levels_count[levels_count > 32])
# Convert factor predictors with more than 32 levels to numeric
train_data_numeric <- train_data
train_data_numeric[high_levels] <- lapply(train_data_numeric[high_levels], as.numeric)
# Create a tree using the `tree()` function
TREE <- tree(mpg ~ ., data = train_data_numeric)
# Look at a summary of your tree
summary(TREE)
```
How many nodes does it have?
# It has 8 terminal nodes.
Which variables did it find important?
# Weight, horsepower, and year.
Now plot your tree:
```{r}
plot(TREE)
text(TREE, pretty = 0)
```
Let's check it: refit the tree without the `name` column, then compute the test MSE.
```{r}
# Identify factor predictors with more than 32 levels
factor_predictors <- sapply(train_data, is.factor)
levels_count <- sapply(train_data[factor_predictors], function(x) length(levels(x)))
high_levels <- names(levels_count[levels_count > 32])
# Convert factor predictors with more than 32 levels to numeric
train_data_numeric <- train_data
train_data_numeric[high_levels] <- lapply(train_data_numeric[high_levels], as.numeric)
# Remove the `name` variable from the dataset
train_data_numeric <- train_data_numeric[, !names(train_data_numeric) %in% "name"]
# Create a tree using the `tree()` function
TREE <- tree(mpg ~ ., data = train_data_numeric)
# Look at a summary of your tree
summary(TREE)
```
```{r}
TREE_hat <- predict(TREE, newdata = test_data)
mean((TREE_hat - test_data$mpg)^2)
```
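One optional extension beyond the assignment: cross-validate the tree to see whether pruning helps. The `cv.tree()` and `prune.tree()` functions below come from the `tree` package already loaded above; using deviance as the error measure is simply the default, not something the assignment specifies.
```{r}
# Cross-validate the fitted tree over candidate sizes (sketch, not required)
cv_TREE <- cv.tree(TREE)
plot(cv_TREE$size, cv_TREE$dev, type = "b",
     xlab = "Tree size (terminal nodes)", ylab = "Deviance")
# Prune to the size with the lowest CV deviance (keep at least 2 nodes)
best_size <- max(2, cv_TREE$size[which.min(cv_TREE$dev)])
pruned_TREE <- prune.tree(TREE, best = best_size)
summary(pruned_TREE)
```
If the cross-validated deviance is lowest at the full size, pruning will not change the tree.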
```{r}
str(train_data)
```
Let's try a random forest with m = 4 and ntree = 5: complete the code (remember we are predicting cylinders)
```{r}
library(randomForest)  # needed for randomForest(); not loaded above
# `name` is a factor with more than 53 levels, which randomForest() cannot
# handle, so drop it before fitting
forest_train <- train_data[, names(train_data) != "name"]
# Fit a random forest predicting cylinders with mtry = 4 and ntree = 5
ForestAuto <- randomForest(cylinders ~ ., data = forest_train,
                           mtry = 4, importance = TRUE, ntree = 5)
# Print the random forest model
ForestAuto
```
Let's check it! Complete the code:
```{r}
Forest_hat <- predict(ForestAuto, newdata = test_data)
mean((Forest_hat - test_data$cylinders)^2)
```
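Since the forest was fit with `importance = TRUE`, we can also look at which predictors it relied on. This is a sketch using the standard `randomForest` helpers, assuming `ForestAuto` was fit as above:
```{r}
# Variable importance from the fitted forest (requires importance = TRUE)
importance(ForestAuto)
varImpPlot(ForestAuto)
```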
Which one was better, the simple regression tree or the random forest? (Note that the tree above predicts `mpg` while the forest predicts `cylinders`, so their test MSEs are on different scales and not directly comparable.)
