Question: In this problem, you will develop a model to predict whether a person in the US Census earns more than $50K or not

In this problem, you will develop a model to predict whether a person in the US Census earns more than $50K or not. Consider Income as the target variable and include Age, MaritalStatus, Race, Sex, and WeeklyHours as predictors. We use the Census dataset for this. Use a QDA model. Use the previously created 5 folds for the cross-validation on the training set.
Calculate and show the confusion matrix for both the training and the test set. What is the performance with respect to sensitivity, specificity, and AUC? Create and print the ROC curves. I am using five-fold cross-validation. This is my code so far:

qda_model <-
  # specify that the model is a quadratic discriminant analysis
  discrim_quad() %>%
  # note: there are several potential engines for QDA, here we just use the default one
  set_engine("MASS") %>%
  # select the binary classification mode
  set_mode("classification")

# then, let's put everything into a workflow
qda_workflow <- workflow() %>%
  # add the recipe (data pre-processing)
  add_recipe(model_recipe) %>%
  # add the ML model
  add_model(qda_model)

set.seed(1)
# resampling control (saves the out-of-fold predictions; not yet used below)
control <- control_resamples(save_pred = TRUE,
                             event_level = "second")

# with the fit function, we train the model on the training data
qda_fit <-
  qda_workflow %>%
  fit(data = data_train)

# investigate the result
qda_fit

# to get the evaluation metrics for the test data:
# note that we use the test data here!
qda_final_fit <-
  qda_workflow %>%
  last_fit(data_split)

test_predictions_qda <-
  qda_final_fit %>%
  augment()
test_predictions_qda$Income <- as.factor(test_predictions_qda$Income)

# note: you need to select the truth and estimate variables based on the column names of the test object
classification_metrics(data = test_predictions_qda,
                       truth = Income,
                       estimate = .pred_class,
                       `.pred_>50K`, # predicted probability of the second outcome (>50K)
                       event_level = "second") # note: "second" indicates that we use the second class (>50K) as the level of interest

# finally, let's create the confusion matrix and ROC curve
confusionMatrix(data = test_predictions_qda$.pred_class,
                reference = test_predictions_qda[[target_var]],
                positive = positive_class)

two_class_curve_test_qda <- roc_curve(data = test_predictions_qda,
                                      truth = Income,
                                      `.pred_>50K`,
                                      event_level = "second")
autoplot(two_class_curve_test_qda)
How would I create the confusion matrix for the training set?
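
One possible approach, sketched under assumptions: predict on the training data with the fitted workflow via `augment(qda_fit, new_data = data_train)` and pass the result to yardstick's `conf_mat()`. Because that matrix is computed on the same rows the model was trained on, it tends to look optimistic, so the sketch also builds a confusion matrix from the out-of-fold predictions of the five-fold cross-validation. The Census data and `model_recipe` are not shown in the question, so the data below is simulated, only two of the predictors are used, and the names `toy_census` and `folds` are hypothetical stand-ins.

```r
library(tidymodels)
library(discrim)  # supplies the "MASS" engine for discrim_quad()

set.seed(1)

# Simulated stand-in for the Census data (assumption: the real data is not shown)
n_obs  <- 400
age    <- round(runif(n_obs, 18, 70))
hours  <- round(runif(n_obs, 10, 60))
income <- ifelse(age + hours + rnorm(n_obs, sd = 15) > 85, ">50K", "<=50K")
toy_census <- tibble(
  Age         = age,
  WeeklyHours = hours,
  Income      = factor(income, levels = c("<=50K", ">50K"))
)

# Train/test split and five folds on the training set
data_split <- initial_split(toy_census, prop = 0.75, strata = Income)
data_train <- training(data_split)
folds      <- vfold_cv(data_train, v = 5, strata = Income)

# Minimal recipe and QDA workflow, mirroring the structure in the question
model_recipe <- recipe(Income ~ Age + WeeklyHours, data = data_train)
qda_model <- discrim_quad() %>%
  set_engine("MASS") %>%
  set_mode("classification")
qda_workflow <- workflow() %>%
  add_recipe(model_recipe) %>%
  add_model(qda_model)

# (a) Training-set confusion matrix: fit on the training data, predict on the same rows
qda_fit <- fit(qda_workflow, data = data_train)
train_predictions_qda <- augment(qda_fit, new_data = data_train)
train_conf_mat <- conf_mat(train_predictions_qda,
                           truth = Income,
                           estimate = .pred_class)
train_conf_mat

# (b) Cross-validated confusion matrix: pool the out-of-fold predictions
control <- control_resamples(save_pred = TRUE, event_level = "second")
qda_cv  <- fit_resamples(qda_workflow, resamples = folds, control = control)
cv_predictions_qda <- collect_predictions(qda_cv)
cv_conf_mat <- conf_mat(cv_predictions_qda,
                        truth = Income,
                        estimate = .pred_class)
cv_conf_mat
```

With the real objects from the question, only the two lines under (a) that call `augment()` and `conf_mat()` should be needed. The `train_predictions_qda` tibble can also be passed to `confusionMatrix()`, `classification_metrics()`, and `roc_curve()` in the same way as `test_predictions_qda`.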
