Question:

Consider a simulated dataset with a binary response `y` and $5000$ predictors, stored as a matrix `x`, measured on $n = 50$ cases.
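```{r}
set.seed(1)
# Binary response: 25 cases in each of the two classes
y <- c(rep("c1", 25), rep("c2", 25))
# Predictors: 5000 columns of standard normal noise, generated independently of y
x <- matrix(NA, nrow = 50, ncol = 5000)
for (i in 1:5000) {
  x[, i] <- rnorm(50)
}
```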
A simple classifier is applied to the simulated dataset, where the classification is performed in two steps:
* Step 1. Feature selection: Select the $20$ predictors with the smallest $p$-values from two-sample $t$-tests.
* Step 2. Model fitting: Fit a linear discriminant analysis (LDA) model, using only these $20$ selected predictors.
We would like to compute the $5$-fold cross-validation (CV) estimate of the test accuracy rate for this classifier.
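
One way to set up this computation is sketched below. It is a minimal illustration, not the expert solution. The key assumption is that Step 1 must be repeated inside every fold, using only that fold's training cases, because selecting predictors on all $50$ cases first would let the held-out labels influence the classifier. The helper `select_top`, the fold vector `fold_id`, and the use of simple random (unstratified) folds are choices made for this sketch and are not part of the question; `lda()` comes from the `MASS` package.

```{r}
library(MASS)  # provides lda()

# Regenerate the data exactly as in the question
set.seed(1)
y <- c(rep("c1", 25), rep("c2", 25))
x <- matrix(NA, nrow = 50, ncol = 5000)
for (i in 1:5000) x[, i] <- rnorm(50)
y <- factor(y)

# Hypothetical helper: column indices of the k predictors with the smallest
# two-sample t-test p-values, computed on the supplied (training) rows only
select_top <- function(x_train, y_train, k = 20) {
  lv <- levels(y_train)
  pvals <- apply(x_train, 2, function(v) {
    t.test(v[y_train == lv[1]], v[y_train == lv[2]])$p.value
  })
  order(pvals)[seq_len(k)]
}

K <- 5
# Assumption: random, unstratified folds of 10 cases each
fold_id <- sample(rep(seq_len(K), length.out = length(y)))
fold_acc <- numeric(K)

for (k in seq_len(K)) {
  test <- fold_id == k
  train <- !test
  # Step 1: feature selection on the training folds only
  keep <- select_top(x[train, ], y[train])
  # Step 2: LDA on the 20 selected predictors
  fit <- lda(x[train, keep, drop = FALSE], grouping = y[train])
  pred <- predict(fit, x[test, keep, drop = FALSE])$class
  fold_acc[k] <- mean(pred == y[test])
}

# 5-fold CV estimate of the test accuracy rate
cv_accuracy <- mean(fold_acc)
cv_accuracy
```

Because every predictor is generated independently of `y`, an honest estimate should sit near chance level (about $0.5$), give or take sampling noise from the small sample. A much higher value would usually indicate that feature selection leaked information from the held-out cases.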
