Problem 1: effect of sample size
Generate training datasets with nObs = 25, 100 and 500 observations such that two variables are associated with the outcome, as parameterized in the example code below, and three are not, with the same average difference between the two classes as in that example (i.e., in the notation of the example code, nClassVars=2, nNoiseVars=3 and deltaClass=1). Obtain random forest, LDA and KNN test error rates on a much larger test dataset (say, 10K observations, for greater stability of the results) simulated from the same model. Describe the differences between the methods and across the sample sizes used here.
The example below illustrates the main ideas on a 3D dataset in which two of the three attributes are associated with the outcome:
# How many observations:
nObs <- 1000
# How many predictors are associated with outcome:
nClassVars <- 2
# How many predictors are not:
nNoiseVars <- 1
# To modulate average difference between two classes' predictor values:
deltaClass <- 1
# Simulate training and test datasets with an interaction
# between attribute levels associated with the outcome:
xyzTrain <- matrix(rnorm(nObs*(nClassVars+nNoiseVars)),nrow=nObs,ncol=nClassVars+nNoiseVars)
xyzTest <- matrix(rnorm(10*nObs*(nClassVars+nNoiseVars)),nrow=10*nObs,ncol=nClassVars+nNoiseVars)
classTrain <- 1
classTest <- 1
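for ( iTmp in 1:nClassVars ) {
  # shift each outcome-associated predictor by +/- deltaClass at random:
  deltaTrain <- sample(deltaClass*c(-1,1),nObs,replace=TRUE)
  xyzTrain[,iTmp] <- xyzTrain[,iTmp] + deltaTrain
  # class is determined by the sign of the product of the shifts:
  classTrain <- classTrain * deltaTrain
  deltaTest <- sample(deltaClass*c(-1,1),10*nObs,replace=TRUE)
  xyzTest[,iTmp] <- xyzTest[,iTmp] + deltaTest
  classTest <- classTest * deltaTest
}
classTrain <- factor(classTrain > 0)
# convert the test outcome to a factor as well, so it can be compared to predictions:
classTest <- factor(classTest > 0)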
table(classTrain)
# plot resulting attribute levels colored by outcome:
pairs(xyzTrain,col=as.numeric(classTrain))
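One possible way to set up the comparison asked for in Problem 1 is sketched below. It is only a sketch under stated assumptions, not a reference solution: the helper names simulateData and testErrors are hypothetical, the choice k = 5 for KNN and the seed are arbitrary (the problem does not specify them), and the randomForest package is assumed to be installed alongside MASS and class.

```r
# Sketch only: simulateData() and testErrors() are hypothetical helper names.
library(randomForest)  # randomForest()
library(MASS)          # lda()
library(class)         # knn()

# Simulate predictors and a two-level outcome in the same way as the example
# above, wrapped in a function so that the sample size can be varied.
simulateData <- function(nObs, nClassVars = 2, nNoiseVars = 3, deltaClass = 1) {
  nVars <- nClassVars + nNoiseVars
  x <- matrix(rnorm(nObs * nVars), nrow = nObs, ncol = nVars)
  cls <- rep(1, nObs)
  for (iTmp in seq_len(nClassVars)) {
    delta <- sample(deltaClass * c(-1, 1), nObs, replace = TRUE)
    x[, iTmp] <- x[, iTmp] + delta
    cls <- cls * delta
  }
  colnames(x) <- paste0("X", seq_len(nVars))
  # Fix the level set so predictions and truth are always comparable.
  list(x = x, y = factor(cls > 0, levels = c("FALSE", "TRUE")))
}

# Fit the three classifiers on the training data and return their
# misclassification rates on the test data.
# Assumption: k = 5 neighbours for KNN; the problem does not fix k.
testErrors <- function(train, test, k = 5) {
  rfFit <- randomForest(train$x, train$y)
  rfErr <- mean(predict(rfFit, newdata = test$x) != test$y)
  ldaFit <- lda(train$x, grouping = train$y)
  ldaErr <- mean(predict(ldaFit, newdata = test$x)$class != test$y)
  knnErr <- mean(knn(train$x, test$x, cl = train$y, k = k) != test$y)
  c(RF = rfErr, LDA = ldaErr, KNN = knnErr)
}

set.seed(1)  # arbitrary seed, only to make this sketch reproducible
sampleSizes <- c(25, 100, 500)
test <- simulateData(10000)  # one large test set shared by all sample sizes
errs <- sapply(sampleSizes, function(n) testErrors(simulateData(n), test))
colnames(errs) <- paste0("nObs=", sampleSizes)
print(round(errs, 3))
```

The result is a matrix with one row per method and one column per training sample size. Since the outcome here is the sign of a product of shifts, the two class means coincide, so LDA would not be expected to do much better than chance at any sample size, while random forest and KNN can pick up the interaction as nObs grows. A single simulated training set per sample size is noisy, so repeating the simulation several times and averaging the errors may give a more stable picture.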