Question: The previous code fits a model using the entire golf data set. The next block of code splits the data into a training set and a validation set (50/50) and reports the MSE for both. What does the test MSE suggest the appropriate number of predictors should be? Are there any signs of overfitting based on the graphic? What happens if you change the split to 80/20? Explain the graph shown in the image.
```{r}
library(caret)   # createDataPartition()
library(leaps)   # regsubsets()
set.seed(1234)
# p: proportion of data in train; .5 gives the 50/50 split described above (use .8 for 80/20)
trainIndex <- createDataPartition(golf2$AvgWinnings, p = .5, list = FALSE)
training <- golf2[trainIndex, ]
validate <- golf2[-trainIndex, ]
fwd.train <- regsubsets(log(AvgWinnings) ~ ., data = training, method = "forward", nvmax = 20)
# Prediction method for regsubsets objects (leaps does not supply one)
predict.regsubsets <- function(object, newdata, id, ...) {
  form <- as.formula(object$call[[2]])   # recover the formula used in the fit
  mat <- model.matrix(form, newdata)     # build the design matrix for the new data
  coefi <- coef(object, id = id)         # coefficients of the id-variable model
  xvars <- names(coefi)
  mat[, xvars] %*% coefi                 # linear predictor: X %*% beta
}
valMSE <- c()
# The index i runs to 20 since that is how many predictors forward selection went up to
for (i in 1:20) {
  predictions <- predict.regsubsets(object = fwd.train, newdata = validate, id = i)
  valMSE[i] <- mean((log(validate$AvgWinnings) - predictions)^2)
}
par(mfrow = c(1, 1))
plot(1:20, sqrt(valMSE), type = "l", xlab = "# of predictors",
     ylab = "test vs train RMSE", ylim = c(0.3, .9))
index <- which(valMSE == min(valMSE))
points(index, sqrt(valMSE[index]), col = "red", pch = 10)  # mark the minimum validation RMSE
trainMSE <- summary(fwd.train)$rss / nrow(training)
lines(1:20, sqrt(trainMSE), lty = 3, col = "blue")         # training RMSE for comparison
```
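To answer the 80/20 part of the question, the same workflow can be rerun at a different split and the two validation-MSE curves compared. The sketch below is one way to do that; the helper name `run_split` is hypothetical, and it assumes `golf2`, the loaded packages, and `predict.regsubsets` from the chunk above:

```{r}
# Hypothetical helper: repeat the train/validate workflow for a given train proportion p
run_split <- function(p) {
  set.seed(1234)                          # same seed so only the split size changes
  idx <- createDataPartition(golf2$AvgWinnings, p = p, list = FALSE)
  train <- golf2[idx, ]
  valid <- golf2[-idx, ]
  fit <- regsubsets(log(AvgWinnings) ~ ., data = train,
                    method = "forward", nvmax = 20)
  # validation MSE for each model size 1..20
  sapply(1:20, function(i) {
    preds <- predict.regsubsets(fit, newdata = valid, id = i)
    mean((log(valid$AvgWinnings) - preds)^2)
  })
}
mse50 <- run_split(.5)              # 50/50 split
mse80 <- run_split(.8)              # 80/20 split
which.min(mse50); which.min(mse80)  # model size minimizing validation MSE under each split
```

Because the helper varies only `p`, any difference between `which.min(mse50)` and `which.min(mse80)` reflects the split alone; note that with the smaller validation set under 80/20, the validation-MSE curve is typically noisier.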