

Question: Supervised Learning Algorithms (SVM)

You need to write Python code. Please read the instructions carefully, and please do not submit an answer if it's incomplete.

First, you need to import these libraries:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

Then you need to read the dataset using:

df = pd.read_csv("Dataset#4.csv")
df.drop_duplicates(inplace=True)
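As a quick check of what the duplicate-removal step does, here is a minimal sketch. The three-column DataFrame is a hypothetical miniature built in memory, not the real Dataset#4.csv; the names rows_before and rows_after are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the survey data (assumption: the real file is Dataset#4.csv).
df = pd.DataFrame({
    "Diabetes_012": [0, 0, 2, 1, 2],
    "HighBP":       [0, 0, 1, 1, 1],
    "BMI":          [24, 24, 31, 28, 31],
})

rows_before = len(df)
# Remove rows that are exact copies of an earlier row, modifying df in place.
df.drop_duplicates(inplace=True)
rows_after = len(df)

print(rows_before, rows_after)
# Class counts are worth checking here, since phase 2 depends on class imbalance.
print(df["Diabetes_012"].value_counts())
```

Printing the class counts right after loading makes it easier to see later how much the oversampling step changes the label distribution.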

You need to take a look at the first table and make two copies (we need two tables, one for each phase: one before oversampling and one after oversampling).

For the first phase, take this code and just add logistic regression to it:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv('Train1.csv')
mean = df["X_12"].mean()
df.fillna({"X_12": mean}, inplace=True)
drop_cols = ['INCIDENT_ID', 'DATE']
df.drop(columns=drop_cols, inplace=True)
X_train, X_test, y_train, y_test = \
    train_test_split(df.drop('MALICIOUS_OFFENSE', axis='columns'),
                     df['MALICIOUS_OFFENSE'],
                     train_size=0.80,
                     random_state=2)

############################# KNN #########################################
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)
pred_test_knn = model.predict(X_test)
pred_train_knn = model.predict(X_train)
print("K-Nearest Neighbors accuracy score (test)", accuracy_score(y_test, pred_test_knn))
print("K-Nearest Neighbors accuracy score (train)", accuracy_score(y_train, pred_train_knn))
print()

######################### Decision Tree ################################
tree_clf = DecisionTreeClassifier(max_depth=16)
tree_clf.fit(X_train, y_train)
pred_test_tree = tree_clf.predict(X_test)
pred_train_tree = tree_clf.predict(X_train)
print("Decision Tree accuracy score (test)", accuracy_score(y_test, pred_test_tree))
print("Decision Tree accuracy score (train)", accuracy_score(y_train, pred_train_tree))
print()

######################### Random Forest ################################
rnd_clf = RandomForestClassifier(n_estimators=120, max_leaf_nodes=200, n_jobs=-1)
rnd_clf.fit(X_train, y_train)
pred_test_rf = rnd_clf.predict(X_test)
pred_train_rf = rnd_clf.predict(X_train)
print("Random Forest accuracy score (test)", accuracy_score(y_test, pred_test_rf))
print("Random Forest accuracy score (train)", accuracy_score(y_train, pred_train_rf))
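The request to add logistic regression to the phase-one code could be met along the lines of the sketch below. Because the CSV file is not available here, it assumes a synthetic three-class dataset from make_classification as a stand-in; the names log_clf, pred_test_lr, and pred_train_lr and the max_iter=1000 setting are illustrative choices, not part of the original instructions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: 3 classes, numeric features).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, random_state=2)

# Logistic regression, fitted and scored the same way as the other three models.
# max_iter is raised above the default so the solver has room to converge.
log_clf = LogisticRegression(max_iter=1000)
log_clf.fit(X_train, y_train)
pred_test_lr = log_clf.predict(X_test)
pred_train_lr = log_clf.predict(X_train)
print("Logistic Regression accuracy score (test)", accuracy_score(y_test, pred_test_lr))
print("Logistic Regression accuracy score (train)", accuracy_score(y_train, pred_train_lr))
```

With the real data, the synthetic block would be replaced by the DataFrame split already shown, and the model would be fitted on the scaled features.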

Note: take the numbers of training and testing from the first screenshot.

We also need to do standard scaling on X_train and X_test (use fit_transform on X_train, and transform only on X_test).
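The scaling rule above (fit on the training split, reuse those statistics on the test split) can be sketched as follows. The two small arrays are toy stand-ins for the real X_train and X_test, and the names scaler, X_train_scaled, and X_test_scaled are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrices (assumption: stand-ins for the real X_train / X_test).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[2.5, 25.0], [5.0, 50.0]])

scaler = StandardScaler()
# fit_transform learns each column's mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# transform reuses the training statistics, so nothing about the test set leaks into the fit.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_test_scaled)
```

Calling fit_transform on X_test as well would standardize it with its own mean and variance, which puts the two splits on different scales and makes the test scores unreliable.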

And then we need to print the results:

1. Just print the accuracy of the training and testing, and print the classification report. The classification report contains precision, recall, and F-score for each class in the label.

This is before oversampling (phase 1).

* * *

For after oversampling, you can use this code:

from imblearn.over_sampling import SMOTE
import pandas as pd
train_df = pd.read_csv("train.csv")
train_Y = train_df['attack_category']
train_x = train_df.drop(['attack_category', 'attack_type', 'protocol_type', 'service', 'flag'], axis=1)
print(train_Y.value_counts())
sm = SMOTE(sampling_strategy='auto', random_state=0)
train_x_sm, train_Y_sm = sm.fit_resample(train_x, train_Y)
print(train_Y_sm.value_counts())

You just need to do scaling and print the results.

For the AUC in the tables, we use this print statement to print it:

y_pred = DT.predict_proba(x_test)
# Score against y_test, since the probabilities were computed on x_test.
print(roc_auc_score(y_test, y_pred, multi_class='ovr'))
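A self-contained sketch of the multiclass AUC computation is given below. It assumes a synthetic three-class dataset in place of the real one, and the variable names (y_proba_test, auc_test, and so on) are illustrative. The key point is that the labels passed to roc_auc_score must come from the same split as the features passed to predict_proba.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption: 3 classes, numeric features).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
# stratify=y keeps all three classes present in both splits.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, random_state=2, stratify=y)

DT = DecisionTreeClassifier(max_depth=16, random_state=0)
DT.fit(x_train, y_train)

# predict_proba returns one probability column per class, as multi_class='ovr' requires.
y_proba_test = DT.predict_proba(x_test)
y_proba_train = DT.predict_proba(x_train)

# Pair each probability matrix with the labels of the same split.
auc_test = roc_auc_score(y_test, y_proba_test, multi_class='ovr')
auc_train = roc_auc_score(y_train, y_proba_train, multi_class='ovr')
print("Decision Tree AUC (test)", auc_test)
print("Decision Tree AUC (train)", auc_train)
```

The same pattern should apply to the other classifiers in the assignment, since KNeighborsClassifier, RandomForestClassifier, and LogisticRegression all provide predict_proba.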

Here is the statement for the classification report; use it:

print(classification_report(y_test, pred_test_tree))
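To show what the report contains, here is a small sketch with hand-written labels. The ten labels and predictions are hypothetical values for the three diabetes classes, chosen only so that the per-class numbers are easy to verify by hand; the output_dict form is an optional extra for filling in a results table.

```python
from sklearn.metrics import classification_report

# Hypothetical true labels and predictions for classes 0, 1, and 2.
y_test = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
pred_test_tree = [0, 0, 0, 1, 1, 0, 2, 2, 2, 0]

# Text report: precision, recall, F1-score, and support for each class.
print(classification_report(y_test, pred_test_tree, zero_division=0))

# Dictionary form of the same report, convenient for copying numbers into a table.
report = classification_report(y_test, pred_test_tree,
                               output_dict=True, zero_division=0)
recall_class_1 = report["1"]["recall"]        # 1 of the 2 true class-1 rows is found
precision_class_2 = report["2"]["precision"]  # all 3 predicted class-2 rows are correct
print(recall_class_1, precision_class_2, report["accuracy"])
```

On an imbalanced dataset, the per-class recall is usually more informative than overall accuracy, which is why the report is requested both before and after oversampling.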

You should submit the following:

1 - Python code file
2 - Documentation with the description of the steps you followed and an explanation of the results you got.

The dataset description

The Dataset.csv file is a clean dataset of 253,680 survey responses to the CDC's BRFSS 2015. The target variable Diabetes_012 has 3 classes: 0 is for no diabetes or only during pregnancy, 1 is for prediabetes, and 2 is for diabetes. This dataset has 21 features and 253,680 records. The following is the description of the features:

| Variable Name | Type | Description |
| Diabetes_binary | Binary | 0 = no diabetes; 1 = prediabetes or diabetes |
| HighBP | Binary | 0 = no high BP; 1 = high BP |
| HighChol | Binary | 0 = no high cholesterol; 1 = high cholesterol |
| CholCheck | Binary | 0 = no cholesterol check in 5 years; 1 = yes cholesterol check in 5 years |
| BMI | Integer | Body Mass Index |
| Smoker | Binary | Have you smoked at |
