Question: Using the Auto data set and the scikit-learn library:

2. Create and add a binary variable column called mpg_high_low to the dataset that is set to High if mpg is a value above 30, and to Low if mpg is a value less than or equal to 30. Make sure the mpg_high_low column is of type category.
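
A minimal pandas sketch of Qn 2, assuming the data already sits in a DataFrame named auto. The six rows below are hypothetical stand-in data, not the real Auto set, and the commented-out read_csv line (file name and "?" missing-value marker) is an assumption about how the file is stored. Note that mpg = 30 falls into Low because the rule for High is strictly mpg > 30.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; with the real file you might start from
# auto = pd.read_csv("Auto.csv", na_values="?")  (path and "?" marker assumed)
auto = pd.DataFrame({
    "mpg": [18.0, 31.5, 30.0, 44.0, 24.0, 32.1],
    "name": ["car a", "car b", "car c", "car d", "car e", "car f"],
})

# High if mpg > 30, Low if mpg <= 30; stored as a pandas categorical
auto["mpg_high_low"] = pd.Categorical(
    np.where(auto["mpg"] > 30, "High", "Low"), categories=["Low", "High"])

print(auto["mpg_high_low"].dtype)      # category
print(auto["mpg_high_low"].tolist())   # ['Low', 'High', 'Low', 'High', 'Low', 'High']
```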

3. Check if the Auto data is imbalanced with respect to mpg_high_low. Report the percentage of the data that belongs to each of the two classes (High and Low).
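
The class percentages for Qn 3 can be read off value_counts. The labels below are hypothetical; with the real data you would apply the same two calls to auto["mpg_high_low"]. How skewed a split must be before it counts as "imbalanced" is a judgment call the question leaves to you.

```python
import pandas as pd

# Hypothetical labels standing in for auto["mpg_high_low"]
mpg_high_low = pd.Series(pd.Categorical(
    ["Low", "Low", "High", "Low", "Low", "High", "Low", "Low"],
    categories=["Low", "High"]))

counts = mpg_high_low.value_counts()
percent = mpg_high_low.value_counts(normalize=True).mul(100).round(1)
print(pd.DataFrame({"count": counts, "percent": percent}))
```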

4. Split the dataset into 75% training and 25% test, and use 10-fold cross-validation for the models below.
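
One way to set up Qn 4, sketched on synthetic data: make_classification stands in for the five predictors left after dropping mpg, year, origin and name (cylinders, displacement, horsepower, weight, acceleration), and the string labels mimic mpg_high_low. Stratifying the split and shuffling the folds are assumptions, not requirements of the question; they keep the High/Low ratio similar in every subset when the classes are imbalanced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in: 392 rows, 5 predictors, roughly 75% of rows "Low"
X, y_num = make_classification(n_samples=392, n_features=5, n_informative=3,
                               n_redundant=0, weights=[0.75], random_state=0)
y = np.where(y_num == 1, "High", "Low")

# 75% training / 25% test, keeping the High/Low ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# 10 folds drawn from the training set only; the test set stays untouched
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=cv)

print(len(X_train), len(X_test))   # 294 98
print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```

The same cv object can be passed as the cv= argument to cross_val_score or GridSearchCV for every model that follows.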

5. Fit a logistic regression model to the training set to predict mpg_high_low using all the other features/variables except mpg, year, origin, and name. Predict mpg_high_low on the test dataset and report the Accuracy, Precision, Recall, Specificity, and F1 measure.
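
A sketch of Qn 5 on the same kind of synthetic stand-in data. Treating High as the positive class, scaling the features before the logistic model, and the helper name report are all assumptions for illustration. scikit-learn's metrics module has no function named for specificity, so it is computed here from the confusion matrix as TN / (TN + FP), which equals the recall of the Low class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the five retained predictors and mpg_high_low
X, y_num = make_classification(n_samples=392, n_features=5, n_informative=3,
                               n_redundant=0, weights=[0.75], random_state=0)
y = np.where(y_num == 1, "High", "Low")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

def report(y_true, y_pred, pos="High", neg="Low"):
    """Test-set metrics, treating `pos` as the positive class (an assumption)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[neg, pos]).ravel()
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, pos_label=pos, zero_division=0),
        "Recall": recall_score(y_true, y_pred, pos_label=pos, zero_division=0),
        "Specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # TN / (TN + FP)
        "F1": f1_score(y_true, y_pred, pos_label=pos, zero_division=0),
    }

logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_accuracy = cross_val_score(logit, X_train, y_train, cv=cv).mean()

logit.fit(X_train, y_train)
y_pred = logit.predict(X_test)
metrics = report(y_test, y_pred)
print(f"10-fold CV accuracy: {cv_accuracy:.3f}")
print({k: round(float(v), 3) for k, v in metrics.items()})
```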

6. Alter the threshold for classifying a Low to 0.6 and report the changes in the test performance metrics from those reported in Qn 5.
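
Qn 6 does not say which probability the 0.6 applies to. The sketch below assumes it means: predict Low only when the predicted probability of Low is at least 0.6 (the default predict rule is effectively 0.5). The probabilities are hypothetical; with a fitted model they would come from predict_proba(X_test), whose column order follows the model's classes_ attribute (alphabetical for string labels, so High then Low). The function name is illustrative only.

```python
import numpy as np

def predict_with_low_threshold(proba, classes, threshold=0.6):
    """Label a row Low only if P(Low) >= threshold, otherwise High."""
    low_col = list(classes).index("Low")
    return np.where(proba[:, low_col] >= threshold, "Low", "High")

# Hypothetical predict_proba output, columns ordered as in classes
classes = np.array(["High", "Low"])
proba = np.array([[0.30, 0.70],
                  [0.45, 0.55],
                  [0.80, 0.20],
                  [0.40, 0.60]])

print(predict_with_low_threshold(proba, classes))                 # ['Low' 'High' 'High' 'Low']
print(predict_with_low_threshold(proba, classes, threshold=0.5))  # ['Low' 'Low' 'High' 'Low']
```

The second row shows the effect: at 0.5 it is labelled Low, at 0.6 it becomes High, so raising the Low threshold trades Low predictions for High ones. The adjusted labels are then scored with the same metrics as in Qn 5. If the threshold is instead meant to apply to P(High), index the High column instead.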

7. Find the optimal threshold by drawing the ROC curve. Change the threshold to the optimal value you found from the ROC curve and report the changes in the test performance metrics from those reported in Qn 5.
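
The question does not define "optimal". One common choice, used in this sketch, is Youden's J statistic: pick the ROC point that maximizes TPR - FPR. The labels and scores are hypothetical (1 = High, assumed positive; scores play the role of P(High) from predict_proba). roc_curve labels a sample positive when its score is greater than or equal to the threshold, so the same rule is used to re-predict.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical test labels (1 = High) and predicted P(High)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70])

fpr, tpr, thresholds = roc_curve(y_true, scores)
best = np.argmax(tpr - fpr)            # Youden's J = TPR - FPR
best_threshold = thresholds[best]
print(best_threshold)                  # 0.7

# Re-predict with the chosen threshold, then recompute the Qn 5 metrics
y_pred_opt = (scores >= best_threshold).astype(int)
print(y_pred_opt)                      # [0 0 0 1 0 1 0 1]
```

Another frequently used rule is the point closest to the top-left corner (0, 1); the two can disagree, so state which one you used.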

8. Fit a Naïve Bayes model to the training data to predict mpg_high_low using all the other features/variables except mpg, year, origin, and name. Predict mpg_high_low on the test dataset. Plot the ROC curve and report the best threshold on the ROC curve plot. Report the AUC on the plot as well. Report the accuracy, precision, recall, specificity and F1 score.
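
A sketch of Qn 8 on synthetic stand-in data. Choosing GaussianNB (a reasonable default for continuous predictors) is an assumption, as are Youden's J for the "best threshold" and High as the positive class. The best threshold and the AUC are written into the legend so they appear on the plot; the Agg backend only renders off-screen and can be dropped in a notebook.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove in an interactive session
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for the five retained predictors and mpg_high_low
X, y_num = make_classification(n_samples=392, n_features=5, n_informative=3,
                               n_redundant=0, weights=[0.75], random_state=0)
y = np.where(y_num == 1, "High", "Low")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

nb = GaussianNB().fit(X_train, y_train)
y_pred = nb.predict(X_test)                        # default labels for the metrics
high_col = list(nb.classes_).index("High")
p_high = nb.predict_proba(X_test)[:, high_col]     # P(High) per test row

fpr, tpr, thr = roc_curve(y_test, p_high, pos_label="High")
auc_value = roc_auc_score(y_test == "High", p_high)
best = np.argmax(tpr - fpr)                        # Youden's J

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"Gaussian NB (AUC = {auc_value:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")
ax.scatter(fpr[best], tpr[best], color="red", zorder=3,
           label=f"best threshold = {thr[best]:.3f}")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate (recall)")
ax.legend(loc="lower right")
fig.savefig(io.BytesIO(), format="png")            # or plt.show() interactively
```

Accuracy, precision, recall, specificity and F1 come from y_pred and the confusion matrix, in the same way as for any other classifier.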

9. Fit a KNN model to the training data to predict mpg_high_low using all the other features/variables except mpg, year, origin, and name. Use a grid search between 3 and 10 to find the best value of k. Report the accuracy, precision, recall, specificity, F1 score and AUC.
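
A sketch of the Qn 9 grid search, assuming "between 3 and 10" includes both ends (k = 3, 4, ..., 10) and that accuracy is the selection criterion. Standardizing first is also an assumption: KNN is distance-based, and on the real data weight (thousands) would otherwise swamp cylinders (single digits). Putting the scaler in the pipeline means it is refit on each training fold. The data are synthetic stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the five retained predictors and mpg_high_low
X, y_num = make_classification(n_samples=392, n_features=5, n_informative=3,
                               n_redundant=0, weights=[0.75], random_state=0)
y = np.where(y_num == 1, "High", "Low")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
grid = GridSearchCV(
    pipe,
    param_grid={"knn__n_neighbors": list(range(3, 11))},  # k = 3, 4, ..., 10
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
    scoring="accuracy",
)
grid.fit(X_train, y_train)  # refits the best k on the whole training set

best_k = grid.best_params_["knn__n_neighbors"]
y_pred = grid.predict(X_test)                     # for the threshold-based metrics
high_col = list(grid.classes_).index("High")
p_high = grid.predict_proba(X_test)[:, high_col]
auc_value = roc_auc_score(y_test == "High", p_high)
print(f"best k = {best_k}, test AUC = {auc_value:.3f}")
```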

10. Fit an LDA model to the training data to predict mpg_high_low using all the other features/variables except mpg, year, origin, and name. Report the accuracy, precision, recall, specificity and F1 score.
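
A sketch of Qn 10 that spells out each metric from the confusion-matrix counts, with High assumed to be the positive class and synthetic stand-in data. The 10-fold CV accuracy is computed on the training set only; the remaining metrics use the held-out test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Hypothetical stand-in for the five retained predictors and mpg_high_low
X, y_num = make_classification(n_samples=392, n_features=5, n_informative=3,
                               n_redundant=0, weights=[0.75], random_state=0)
y = np.where(y_num == 1, "High", "Low")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

lda = LinearDiscriminantAnalysis()
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_accuracy = cross_val_score(lda, X_train, y_train, cv=cv).mean()
y_pred = lda.fit(X_train, y_train).predict(X_test)

# labels=[negative, positive] makes ravel() return tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=["Low", "High"]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
specificity = tn / (tn + fp) if (tn + fp) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
print(f"CV acc {cv_accuracy:.3f} | acc {accuracy:.3f} prec {precision:.3f} "
      f"rec {recall:.3f} spec {specificity:.3f} F1 {f1:.3f}")
```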

11. Summarize the performance of all the above models by creating a dataframe with 6 columns: Model_Name, Accuracy, Precision, Recall, Specificity, F1 Score. The data frame should contain one row for each model you built above, with each column filled in with the appropriate metric. Print out the dataframe. Which model performed best from an accuracy point of view, and which model performed best from a recall point of view, without adjusting the threshold?
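
A sketch of the Qn 11 summary table. To avoid made-up numbers, it fits all four models on synthetic stand-in data and builds one row per model from their default (unadjusted-threshold) predictions. GaussianNB, the feature scaling, High as the positive class and k = 5 for KNN are assumptions; in your own solution, substitute the best k from the grid search and your real train/test split.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the five retained predictors and mpg_high_low
X, y_num = make_classification(n_samples=392, n_features=5, n_informative=3,
                               n_redundant=0, weights=[0.75], random_state=0)
y = np.where(y_num == 1, "High", "Low")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Naive Bayes": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),  # k assumed
    "LDA": LinearDiscriminantAnalysis(),
}

rows = []
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)  # default threshold
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=["Low", "High"]).ravel()
    rows.append({
        "Model_Name": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred, pos_label="High", zero_division=0),
        "Recall": recall_score(y_test, y_pred, pos_label="High", zero_division=0),
        "Specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "F1 Score": f1_score(y_test, y_pred, pos_label="High", zero_division=0),
    })

summary = pd.DataFrame(rows)
print(summary.round(3))
print("Best accuracy:", summary.loc[summary["Accuracy"].idxmax(), "Model_Name"])
print("Best recall:", summary.loc[summary["Recall"].idxmax(), "Model_Name"])
```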
