Question: The previous code fits a model using the entire golf data set. The next block of code splits the data into a training set and a validation set (50/50) and reports the MSE for both. What does the test MSE suggest the appropriate number of predictors should be? Are there any signs of overfitting based on the graphic? What happens if you change the split to 80/20? Explain the graph shown in the image.
```{r}
library(caret)   # createDataPartition()
library(leaps)   # regsubsets()
set.seed(1234)
# p: proportion of data in train; .5 gives the 50/50 split described above (use .8 for 80/20)
trainIndex <- createDataPartition(golf2$AvgWinnings, p = .5, list = FALSE)
training <- golf2[trainIndex, ]
validate <- golf2[-trainIndex, ]
fwd.train <- regsubsets(log(AvgWinnings) ~ ., data = training, method = "forward", nvmax = 20)
# Prediction method for regsubsets objects (leaps does not supply one)
predict.regsubsets <- function(object, newdata, id, ...) {
  form <- as.formula(object$call[[2]])   # recover the formula used in the fit
  mat <- model.matrix(form, newdata)     # build the design matrix for the new data
  coefi <- coef(object, id = id)         # coefficients of the id-variable model
  xvars <- names(coefi)
  mat[, xvars] %*% coefi                 # linear predictor: X %*% beta
}
valMSE <- c()
# The index i runs to 20 since that is how many predictors forward selection went up to
for (i in 1:20) {
  predictions <- predict.regsubsets(object = fwd.train, newdata = validate, id = i)
  valMSE[i] <- mean((log(validate$AvgWinnings) - predictions)^2)
}
par(mfrow = c(1, 1))
plot(1:20, sqrt(valMSE), type = "l", xlab = "# of predictors",
     ylab = "test vs train RMSE", ylim = c(0.3, .9))
index <- which(valMSE == min(valMSE))
points(index, sqrt(valMSE[index]), col = "red", pch = 10)  # mark the minimum validation RMSE
trainMSE <- summary(fwd.train)$rss / nrow(training)
lines(1:20, sqrt(trainMSE), lty = 3, col = "blue")         # training RMSE for comparison
```
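To answer the 80/20 part of the question, the same workflow can be rerun at a different split and the two validation-MSE curves compared. The sketch below is one way to do that; the helper name `run_split` is hypothetical, and it assumes `golf2`, the loaded packages, and `predict.regsubsets` from the chunk above:

```{r}
# Hypothetical helper: repeat the train/validate workflow for a given train proportion p
run_split <- function(p) {
  set.seed(1234)                          # same seed so only the split size changes
  idx <- createDataPartition(golf2$AvgWinnings, p = p, list = FALSE)
  train <- golf2[idx, ]
  valid <- golf2[-idx, ]
  fit <- regsubsets(log(AvgWinnings) ~ ., data = train,
                    method = "forward", nvmax = 20)
  # validation MSE for each model size 1..20
  sapply(1:20, function(i) {
    preds <- predict.regsubsets(fit, newdata = valid, id = i)
    mean((log(valid$AvgWinnings) - preds)^2)
  })
}
mse50 <- run_split(.5)              # 50/50 split
mse80 <- run_split(.8)              # 80/20 split
which.min(mse50); which.min(mse80)  # model size minimizing validation MSE under each split
```

Because the helper varies only `p`, any difference between `which.min(mse50)` and `which.min(mse80)` reflects the split alone; note that with the smaller validation set under 80/20, the validation-MSE curve is typically noisier.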