Question:

(22 points) In this next question, we'll use KNN to try to classify players' "preferred foot."
(a)(1 point) First, let's get a better sense of the balance of classes in our data (e.g., how
many observations of each class we have). Display the count of each value present in the
"preferred foot" column.
(b)(2 points) If we were to build a classifier which always guessed a player preferred their
right foot, what percentage of the time would we make a correct classification? In other
words, what percent of players actually do prefer their right foot?
(c)(1 point) Let's build a classifier using 10 available dimensions: shooting, passing,
dribbling, defending, attacking, skill, movement, power, mentality, and goalkeeping. Create
an x dataframe with just these 10 columns and display the first 5 rows.
(d)(2 points) Now, rescale (or normalize) this x data so that each independent variable (IV)
has a mean of 0 and a standard deviation of 1. Display (at least) the first three rows of
normalized data.
(e)(2 points) We'll want to be able to see how well our classifier performs out of sample,
so now create a train-validation split, setting Y to be the "preferred foot" column of the
dataframe. Here, use 30% of the data for validation and set the random state to 456.
Display (at least) the first 3 rows of x training data.
(f)(4 points) Next, we'll want to determine the number of neighbors k to consider for our
KNN classifier. For values of k from 1 to 30 (inclusive), calculate either the error or the
accuracy of a KNN classifier. Display your results by creating a plot with the considered
k values along the horizontal axis and the corresponding error (or accuracy) displayed
along the vertical axis.
(g)(4 points) Based on your analysis, choose a reasonable value of k. Fit a KNN classifier
that considers this number of neighbors and predict Y values (preferred foot) for your
out-of-sample validation data. Display (at least) the first 3 predictions for "preferred
foot."
(h)(2 points) Use actual and predicted Y values to calculate and display the confusion
matrix for your model. This will display without labels, but will show the classes in
alphabetical order (Left, Right; upper left corner is "Left-Left"). As with the examples
in lecture, the rows will indicate the actual values and the columns will indicate the
predicted values. Approximately how many players who actually prefer their left foot
("True Lefts") were predicted to prefer their right foot?
(i)(2 points) Use the actual and predicted Y values to display the full classification report.
What does the recall for the classification "Left" suggest about our model?
(j)(2 points) Reflecting on the analysis above, do you feel like this model does a good job
or a bad job of predicting a player's preferred foot? Briefly explain your answer.
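
The workflow in parts (a) through (i) can be sketched roughly as follows. This is a minimal illustration, not the expected solution: the real player dataset is not shown here, so the code builds a hypothetical random dataframe, and the column name `preferred_foot`, the 500-row size, and the 25/75 class balance are all assumptions. It follows the question's order (rescale first, then split), and it picks k by simply taking the most accurate value in the sweep, which is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real player data: random attribute scores
# and an imbalanced "preferred_foot" label (column names are assumptions).
rng = np.random.default_rng(0)
features = ["shooting", "passing", "dribbling", "defending", "attacking",
            "skill", "movement", "power", "mentality", "goalkeeping"]
df = pd.DataFrame(rng.integers(20, 100, size=(500, len(features))),
                  columns=features)
df["preferred_foot"] = rng.choice(["Left", "Right"], size=500, p=[0.25, 0.75])

# (a), (b): class counts and the accuracy of always guessing "Right".
counts = df["preferred_foot"].value_counts()
baseline_accuracy = counts["Right"] / counts.sum()
print(counts)
print(f"Always-Right baseline: {baseline_accuracy:.1%}")

# (c), (d): select the 10 columns, then rescale to mean 0 and std 1.
X = df[features]
X_scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=features)
print(X_scaled.head(3))

# (e): hold out 30% for validation with the random state from the question.
y = df["preferred_foot"]
X_train, X_val, y_train, y_val = train_test_split(
    X_scaled, y, test_size=0.3, random_state=456)
print(X_train.head(3))

# (f): validation accuracy for each k from 1 to 30 inclusive.
k_values = list(range(1, 31))
accuracies = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracies.append(knn.score(X_val, y_val))

# (g): refit with a chosen k (here simply the most accurate k in the sweep).
best_k = k_values[int(np.argmax(accuracies))]
model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
y_pred = model.predict(X_val)
print(y_pred[:3])

# (h), (i): rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_val, y_pred, labels=["Left", "Right"])
print(cm)
print(classification_report(y_val, y_pred, zero_division=0))
```

For the plot in part (f), `k_values` and `accuracies` can be passed to matplotlib, for example `plt.plot(k_values, accuracies)`. In the confusion matrix, `cm[0, 1]` counts the true Lefts predicted as Right, and the recall for "Left" equals `cm[0, 0] / cm[0].sum()`, which is worth comparing against the always-Right baseline when answering part (j).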