Question: Supervised Learning Algorithms ( SVM ) , YOU NEED TO WRITE A PYTHON CODE PLEASE READ THE INSTRUCTIONS CAREFULLY, AND PLEASE DO NOT SUBMIT AN
Supervised Learning Algorithms SVM YOU NEED TO WRITE A PYTHON CODE PLEASE READ THE INSTRUCTIONS CAREFULLY, AND PLEASE DO NOT SUBMIT AN ANSWER IF IT'S INCOMPLETE FIRST YOU NEED TO IMPORT THESE LIBRARIES
import pandas as pd
from sklearn.modelselection import traintestsplit from sklearn.metrics import accuracyscore, classification report From sklearn.metrics import rocaucscore from sklearn.preprocessing import StandardScaler from sklearn.neighbors import KNeighborsClassifier from sklearn.tree import DecisionTreeClassifier from sklearn.ensemble import RandomForestClassifier from sklearn. linearmodel import LogisticRegression
THEN YOU NEED TO READ THE DATASET USING : df pdreadcsvDataset# csv
dfdrop duplicatesinplace True
YOU NEED TO TAKE A LOOK ON THE FIRST TABLE AND MAKE TWO COPIES WE NEED TWO TABLES ONE FOR EACH PHASE BEFORE OVERSAMPLING AND ONE AFTER OVERSAMPLING
FOR THE FIRST PHASE TAKE THIS CODE AND JUST ADD LOGISTIC REGRESSION TO IT
import pandas as pd
from sklearn.modelselection import traintestsplit
from sklearn.metrics import accuracyscore
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
df pdreadcsvTraincsv
mean dfXmean
dffillnaX:meaninplaceTrue
dropcols INCIDENTID'DATE'
dfdropcolumnsdropcols, inplaceTrue
Xtrain, Xtest, ytrain, ytest
traintestsplitdfdropMALICIOUSOFFENSE', axis'columns' dfMALICIOUSOFFENSE' trainsize randomstate
############################# KNN #########################################
model KNeighborsClassifiernneighbors
model.fitXtrain, ytrain
predtestknn model.predictXtest
predtrainknn model.predictXtrain
printKNearest Neighbors accuracy scoretest : accuracyscoreytest, predtestknn
printKNearest Neighbors accuracy scoretrain : accuracyscoreytrain, predtrainknn
print
######################### Decision Tree ################################
treeclf DecisionTreeClassifiermaxdepth
treeclffitXtrain, ytrain
predtesttree treeclfpredictXtest
predtraintree treeclfpredictXtrain
printDecision Tree accuracy scoretest : accuracyscoreytest, predtesttree
printDecision Tree accuracy scoretrain : accuracyscoreytrain, predtraintree
print
######################### Random Forest ################################
rndclf RandomForestClassifiernestimators maxleafnodes njobs
rndclffitXtrain, ytrain
predtestrf rndclfpredictXtest
predtrainrf rndclfpredictXtrain
printRandom Forest accuracy scoretest : accuracyscoreytest, predtestrf
printRandom Forest accuracy scoretrain : accuracyscoreytrain, predtrainrf
NOTE TAKE THE NUMBERS OF TRAINING AND TESTING FROM THE FIRST SCREENSHOT
AND WE NEED TO DO STANDARD SCALING ON X TRAIN AND X TEST X TRAIN USE FIT.TRANSFORM, X TEST USE TRANSFORM ONLY
AND THEN WE NEED TO PRINT THE RESULTS JUST PRINT THE ACCURACY OF THE TRAINING AND TESTING AND PRINT THE CLASSIFICATION REPORT THE CLASSIFICATION REPORT CONTAINS PrecisionRecall,FScore FOR EACH CLASS IN THE LABEL this is before oversampling phase
For after oversampling
U can use this code
from imblearn.oversampling import SMOTE
import pandas as pd
traindf pdreadcsvtraincsv
trainY traindfattackcategory'
trainx traindfdropattackcategory','attacktype', 'protocoltype','service','flag' axis
printtrainYvaluecounts
sm SMOTEsamplingstrategy'auto', randomstate
trainxsm trainYsm smfitresampletrainx trainY
printtrainYsmvaluecounts
You just need to do scaling and print the results
For the AUC in the tables we use this print statement to print it
ypredDT predictprobaxtest
print rocaucscoreytrain, ypred, multiclass'ovr'
Here is the statement for the classification report use it : classificationreport ytest, predtesttree You should submit the following:
Python code file
Documentation with the description of the steps you followed and an explanation of the results you got.
The dataset description
The Dataset.csv file is a clean dataset of survey responses to the CDCs BRFSS The target variable Diabetes has classes. is for no diabetes or only during pregnancy, is for prediabetes, and is for diabetes.
This dataset has features and records. The following is the description of the features:
tableVariable Name,Type,DescriptionDiabetesbinary, Binary, no diabetes prediabetes or diabetesHighBPBinary, no high BP high BPHighCholBinary, no high cholesterol high cholesterolCholCheck Binary table no cholesterol check in years yes cholesterol check in yearsBMIIntegerBody Mass IndexSmoker Binary tableHave you smoked at
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
