Question: Supervised Learning Algorithms (SVM)

Supervised Learning Algorithms (SVM): you need to write Python code. Please read the instructions carefully, and please do not submit an answer if it is incomplete. First, you need to import these libraries:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
Then read the dataset using:
df = pd.read_csv("Dataset#4.csv")
df.drop_duplicates(inplace=True)
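A minimal sketch of the loading step, assuming the target column is `Diabetes_012` (as in the dataset description) and reusing the 80/20 split and `random_state=2` from the template code in the question. The function name `prepare_data` is hypothetical, `stratify=y` is an added assumption, and the tiny DataFrame only stands in for `Dataset#4.csv`, which is not available here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_data(df, target="Diabetes_012", train_size=0.80, random_state=2):
    """Drop duplicate rows, separate the features from the target, and split."""
    df = df.drop_duplicates()        # returns a copy; the caller's df is unchanged
    X = df.drop(columns=[target])    # every other column is a feature
    y = df[target]
    # stratify=y keeps the class proportions the same in both splits
    # (an assumption: the template code in the question does not stratify)
    return train_test_split(X, y, train_size=train_size,
                            random_state=random_state, stratify=y)


# Hypothetical stand-in for pd.read_csv("Dataset#4.csv"); rows 0 and 6 are duplicates
demo = pd.DataFrame({
    "HighBP":       [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "BMI":          [22, 31, 25, 28, 24, 35, 22, 30, 27, 33, 21, 29],
    "Diabetes_012": [0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2],
})
X_train, X_test, y_train, y_test = prepare_data(demo)
print(len(X_train), "training rows,", len(X_test), "test rows")
```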
Take a look at the first table and make two copies of it: we need two tables, one for each phase (one before oversampling and one after oversampling).
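One way to fill the two tables is to collect each model's scores into a pandas DataFrame, once per phase. This is only a sketch: the layout of the first table (from the screenshot) is not shown here, so the column names and the helper `results_table` are hypothetical, and synthetic 3-class data stands in for the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def results_table(models, X_train, X_test, y_train, y_test):
    """Fit every model and return one row of accuracy scores per model."""
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        rows.append({
            "Model": name,
            "Train accuracy": accuracy_score(y_train, model.predict(X_train)),
            "Test accuracy": accuracy_score(y_test, model.predict(X_test)),
        })
    return pd.DataFrame(rows)


# Synthetic 3-class data stands in for the real features and labels
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=2)

models = {"KNN": KNeighborsClassifier(n_neighbors=1),
          "Decision Tree": DecisionTreeClassifier(max_depth=16, random_state=0)}
before_table = results_table(models, X_train, X_test, y_train, y_test)  # phase 1 table
print(before_table)
```

The phase 2 table would be built with the same call, passing the oversampled training data instead.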
For the first phase, take this code and just add logistic regression to it:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
df = pd.read_csv('Train1.csv')
mean= df["X_12"].mean()
df.fillna({"X_12":mean},inplace=True)
drop_cols =['INCIDENT_ID','DATE']
df.drop(columns=drop_cols, inplace=True)
X_train, X_test, y_train, y_test =\
train_test_split(df.drop('MALICIOUS_OFFENSE', axis='columns'), df['MALICIOUS_OFFENSE'], train_size=0.80, random_state=2)
############################# KNN #########################################
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)
pred_test_knn= model.predict(X_test)
pred_train_knn= model.predict(X_train)
print("K-Nearest Neighbors accuracy score(test) : ",accuracy_score(y_test, pred_test_knn))
print("K-Nearest Neighbors accuracy score(train) : ",accuracy_score(y_train, pred_train_knn))
print()
######################### Decision Tree ################################
tree_clf = DecisionTreeClassifier(max_depth=16)
tree_clf.fit(X_train, y_train)
pred_test_tree= tree_clf.predict(X_test)
pred_train_tree= tree_clf.predict(X_train)
print("Decision Tree accuracy score(test) : ",accuracy_score(y_test, pred_test_tree))
print("Decision Tree accuracy score(train) : ",accuracy_score(y_train, pred_train_tree))
print()
######################### Random Forest ################################
rnd_clf= RandomForestClassifier(n_estimators=120, max_leaf_nodes=200, n_jobs=-1)
rnd_clf.fit(X_train, y_train)
pred_test_rf = rnd_clf.predict(X_test)
pred_train_rf = rnd_clf.predict(X_train)
print("Random Forest accuracy score(test) : ",accuracy_score(y_test, pred_test_rf))
print("Random Forest accuracy score(train) : ",accuracy_score(y_train, pred_train_rf))
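The logistic regression block that the first phase asks for can follow the same pattern as the three models above. A minimal sketch: `max_iter=1000` is an assumption (the default of 100 iterations can stop before convergence, especially on unscaled data), and the synthetic data only stands in for the real `X_train`/`X_test`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=2)

######################### Logistic Regression ################################
log_clf = LogisticRegression(max_iter=1000)
log_clf.fit(X_train, y_train)
pred_test_lr = log_clf.predict(X_test)
pred_train_lr = log_clf.predict(X_train)
print("Logistic Regression accuracy score(test) : ", accuracy_score(y_test, pred_test_lr))
print("Logistic Regression accuracy score(train) : ", accuracy_score(y_train, pred_train_lr))
```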
Note: take the numbers for training and testing from the first screenshot.
We also need to apply standard scaling to X_train and X_test (use fit_transform on X_train and only transform on X_test).
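A short sketch of the scaling rule: the scaler learns the mean and standard deviation from the training split only, and the test split is transformed with those same statistics, so no information from the test set leaks into training. The synthetic data is a stand-in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=2)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training split
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std; no refitting

# Training features now have mean ~0 and std ~1; test features are close but not exact
print(np.round(X_train_scaled.mean(axis=0), 3), np.round(X_train_scaled.std(axis=0), 3))
```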
Then print the results: just print the training and testing accuracy, and print the classification report, which contains the precision, recall, and F-score for each class in the label. This is the phase before oversampling (phase 1).
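A sketch of the printing step, wrapped in a hypothetical helper `print_results` so the same code can be reused for every model and both phases. It prints the train/test accuracy and the per-class test report, and also returns the report as a dict (via `output_dict=True`) in case the numbers need to go into a table. Synthetic data stands in for the real split.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def print_results(name, model, X_train, X_test, y_train, y_test):
    """Print train/test accuracy and the per-class classification report on the test set."""
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    print(f"{name} accuracy score(train) : {accuracy_score(y_train, pred_train):.4f}")
    print(f"{name} accuracy score(test)  : {accuracy_score(y_test, pred_test):.4f}")
    # precision, recall, f1-score and support for every class in the label
    print(classification_report(y_test, pred_test))
    return classification_report(y_test, pred_test, output_dict=True)


X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=2)
tree_clf = DecisionTreeClassifier(max_depth=16, random_state=0).fit(X_train, y_train)
report = print_results("Decision Tree", tree_clf, X_train, X_test, y_train, y_test)
```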
For the phase after oversampling (phase 2), you can use this code:
from imblearn.over_sampling import SMOTE
import pandas as pd
train_df = pd.read_csv("train.csv")
train_Y = train_df['attack_category']
train_x = train_df.drop(['attack_category','attack_type', 'protocol_type','service','flag'], axis=1)
print(train_Y.value_counts())
sm = SMOTE(sampling_strategy='auto', random_state=0)
train_x_sm, train_Y_sm = sm.fit_resample(train_x, train_Y)
print(train_Y_sm.value_counts())
You just need to do the scaling and print the results.
For the AUC in the tables, use these statements to compute and print it:
y_pred = DT.predict_proba(X_test)
print(roc_auc_score(y_test, y_pred, multi_class='ovr'))
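A self-contained sketch of the AUC step. `roc_auc_score` needs the true labels and the predicted probabilities for the same rows, so the probabilities from `predict_proba(X_test)` are paired with `y_test`; `multi_class="ovr"` computes one-vs-rest AUCs and averages them (macro average by default). `DT` here is a decision tree fitted on synthetic stand-in data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=2)

DT = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
y_pred = DT.predict_proba(X_test)          # shape (n_test, 3): one column per class
auc = roc_auc_score(y_test, y_pred, multi_class="ovr")
print("Decision Tree AUC (test, one-vs-rest):", auc)
```

For a training-set AUC, the same call works with `DT.predict_proba(X_train)` and `y_train`; mixing training labels with test probabilities fails because the lengths differ.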
Here is the statement for the classification report; use it: print(classification_report(y_test, pred_test_tree))

You should submit the following:
1- Python code file
2- Documentation with the description of the steps you followed and an explanation of the results you got.
The dataset description
The Dataset.csv file is a clean dataset of 253,680 survey responses to the CDC's BRFSS2015. The target variable Diabetes_012 has 3 classes. 0 is for no diabetes or only during pregnancy, 1 is for prediabetes, and 2 is for diabetes.
This dataset has 21 features and 253680 records. The following is the description of the features:
Variable Name | Type | Description
Diabetes_binary | Binary | 0 = no diabetes, 1 = prediabetes or diabetes
HighBP | Binary | 0 = no high BP, 1 = high BP
HighChol | Binary | 0 = no high cholesterol, 1 = high cholesterol
CholCheck | Binary | 0 = no cholesterol check in 5 years, 1 = yes, cholesterol check in 5 years
BMI | Integer | Body Mass Index
Smoker | Binary | Have you smoked at …
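Before the two phases, it can help to check how the three classes of `Diabetes_012` are distributed, since class imbalance is the usual reason for adding an oversampling phase. A sketch with a tiny hypothetical DataFrame in place of the real file:

```python
import pandas as pd

# Hypothetical rows; with the real data use df = pd.read_csv("Dataset#4.csv")
df = pd.DataFrame({
    "HighBP":       [0, 1, 1, 0, 1, 0, 0, 1],
    "BMI":          [24, 30, 33, 22, 28, 26, 21, 35],
    "Diabetes_012": [0, 2, 0, 0, 1, 0, 0, 2],
})

counts = df["Diabetes_012"].value_counts().sort_index()                 # rows per class
shares = df["Diabetes_012"].value_counts(normalize=True).sort_index()   # fraction per class
print(pd.DataFrame({"count": counts, "share": shares.round(3)}))
print("features:", df.shape[1] - 1, "records:", df.shape[0])
```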