Question: Help with this would be appreciated (this should also be done using R code).

1. (60%) The data set penguins.csv contains data on three different species of penguins (Adelie, Gentoo, Chinstrap). Based on bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g, we want to predict the species using logistic regression. Use an 80/20 split for your training/testing data.

(a) Split the data into training/testing data (call them train_pen and test_pen respectively). Use an 80/20 split.
(b) Train your model on the training data set.
(c) Using your model, find predictions for the categories of the testing data.
(d) Construct a confusion matrix (table) to see how many were correctly/incorrectly classified.
(e) Definitions (read): For the confusion matrix

                    Positive Prediction     Negative Prediction
    Positive Class  True Positive (TP)      False Negative (FN)
    Negative Class  False Positive (FP)     True Negative (TN)

    Table 1: Confusion Matrix

    i. Accuracy = Correct Predictions / Total Predictions = (TP + TN) / (TP + TN + FP + FN)
    ii. Error Rate = 1 - Accuracy
    iii. "What proportion of positive identifications was actually correct?" Precision = TP / (TP + FP)
    iv. "What proportion of actual positives was identified correctly?" Recall = TP / (TP + FN)
(f) Compute the Accuracy, Error Rate, Precision, and Recall for the classification just performed.
Step by Step Solution
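One possible way to work through parts (a) to (f) in R is sketched below. It is a sketch under stated assumptions, not a verified solution: the response column is assumed to be named species (the question names only the four predictors), the file is assumed to be penguins.csv in the working directory, and rows with missing values are dropped. Because there are three species, the sketch swaps in multinomial logistic regression via nnet::multinom in place of a binary glm. The helper names split_data, overall_accuracy, and class_metrics are made up for illustration. Precision and recall are computed per species in a one-vs-rest fashion, since the TP/FP/FN definitions in (e) are stated for two classes.

```r
library(nnet)  # multinom(): multinomial logistic regression (recommended package shipped with R)

# (a) Hypothetical helper: random train/test split, 80/20 by default.
split_data <- function(df, train_frac = 0.8, seed = 1) {
  set.seed(seed)  # fixed seed so the split is reproducible
  idx <- sample(nrow(df), size = floor(train_frac * nrow(df)))
  list(train = df[idx, , drop = FALSE], test = df[-idx, , drop = FALSE])
}

# (f) Accuracy = correct predictions / total predictions.
overall_accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# (f) Per-class precision and recall, treating each class in turn as "positive".
# Assumes rows of cm are actual classes and columns are predicted classes.
class_metrics <- function(cm) {
  tp <- diag(cm)            # correctly predicted as this class
  fp <- colSums(cm) - tp    # predicted as this class, actually another
  fn <- rowSums(cm) - tp    # actually this class, predicted as another
  data.frame(class = rownames(cm),
             precision = tp / (tp + fp),
             recall = tp / (tp + fn),
             row.names = NULL)
}

# The workflow only runs if the data file is present.
if (file.exists("penguins.csv")) {
  pen <- read.csv("penguins.csv")
  vars <- c("species", "bill_length_mm", "bill_depth_mm",
            "flipper_length_mm", "body_mass_g")  # "species" is an assumed column name
  pen <- na.omit(pen[, vars])
  pen$species <- factor(pen$species)

  # (a) 80/20 split
  parts <- split_data(pen)
  train_pen <- parts$train
  test_pen <- parts$test

  # (b) Train on the training data
  fit <- multinom(species ~ bill_length_mm + bill_depth_mm +
                    flipper_length_mm + body_mass_g,
                  data = train_pen, maxit = 500, trace = FALSE)

  # (c) Predicted species for the testing data
  pred <- predict(fit, newdata = test_pen, type = "class")

  # (d) Confusion matrix: rows = actual, columns = predicted
  cm <- table(Actual = test_pen$species, Predicted = pred)
  print(cm)

  # (f) Accuracy, error rate, precision, recall
  acc <- overall_accuracy(cm)
  cat("Accuracy:", acc, "\n")
  cat("Error rate:", 1 - acc, "\n")
  print(class_metrics(cm))
}
```

The exact numbers depend on the random split, so changing the seed argument of split_data will change the confusion matrix slightly. If a single precision and recall figure is wanted, one option is to average the per-class values.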
