CV "by hand"

We'll start with an example. The code chunk below loads the `lidar` data. These data are non-linear and show increasing variance as the predictor (`range`) increases. I like to use this setting because "model complexity" is easiest for me to understand when I can see it. However, "model complexity" is also an issue when you're dealing with lots of predictors. You can't "see" overfitting as easily in that case, but it definitely happens.

```r
library(tidyverse)
library(SemiPar)  # provides the lidar dataset

data("lidar")

lidar_df =
  lidar |>
  as_tibble() |>
  mutate(id = row_number())  # row id, used below to split train/test

lidar_df |>
  ggplot(aes(x = range, y = logratio)) +
  geom_point()
```

I'll split this data into training and test sets (using `anti_join`!) and replot to show the split. Our goal will be to use the training data (in black) to build candidate models. We'll then see how those models predict in the testing data (in red).

```r
set.seed(1)  # make the random split reproducible

train_df = sample_frac(lidar_df, size = .8)         # 80% of rows for training
test_df = anti_join(lidar_df, train_df, by = "id")  # remaining rows for testing

ggplot(train_df, aes(x = range, y = logratio)) +
  geom_point() +
  geom_point(data = test_df, color = "red")
```
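The section stops before any models are fit. Below is a minimal sketch of the step it describes: building candidate models on the training data, then checking their predictions on the test data. It is an illustration under stated assumptions, not the original author's code. It assumes the `modelr` and `mgcv` packages are installed. To stay self-contained, it replaces `lidar` with hypothetical simulated data (`sim_df`, with a sine-shaped mean and noise that grows with `x`). The three models, `k = 30`, and the very small smoothing parameter `sp = 10e-6` are illustrative choices. They are meant to give an underfit candidate, a reasonable one, and an overfit one.

```r
# Sketch only: candidate models fit on training data, scored on test data.
# Assumption: tidyverse, modelr, and mgcv are installed.
library(tidyverse)
library(modelr)  # rmse()
library(mgcv)    # gam() with smooth terms s()

set.seed(1)

# Hypothetical simulated stand-in for lidar: non-linear mean, and a noise
# standard deviation that grows with x (illustrative, not the real data).
sim_df =
  tibble(x = runif(200, 0, 1)) |>
  mutate(
    y = sin(2 * pi * x) + rnorm(200, sd = 0.1 + 0.5 * x),
    id = row_number()
  )

# Same splitting approach as above: 80% training, the rest (by id) testing.
train_df = sample_frac(sim_df, size = .8)
test_df = anti_join(sim_df, train_df, by = "id")

# Three candidate models of increasing flexibility.
linear_mod = lm(y ~ x, data = train_df)                          # too rigid
smooth_mod = gam(y ~ s(x), data = train_df)                      # smoothness chosen by gam
wiggly_mod = gam(y ~ s(x, k = 30), sp = 10e-6, data = train_df)  # forced to be wiggly

# Root mean squared prediction error on the held-out test rows.
test_rmse = c(
  linear = rmse(linear_mod, test_df),
  smooth = rmse(smooth_mod, test_df),
  wiggly = rmse(wiggly_mod, test_df)
)
test_rmse
```

A lower test RMSE means better out-of-sample prediction. With data shaped like this, the linear model would usually be expected to do worst because it misses the curvature. The exact numbers depend on the random seed.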
