Q4. Support Vector Machines (SVM) is a supervised learning algorithm that can be applied to both classification and regression. The data set provided contains normal and fraudulent transactions in the Excel file Week6-Fraud_data, in sheets FraudTrain and FraudTest. Using a Support Vector Machine (SVM) model, predict whether the transactions are Normal or Fraudulent based on the features of the transactions in the dataset.
Use is_fraud as the target variable and the features as independent variables. Convert all textual categorical variables to numeric, and clean the data if necessary. For the SVM model, add the Vaimal Machine Learning Add-in.
(Note: Install the Vaimal Add-in attached to the assignment, using the installation instructions below and the detailed instructions in the Manual.)
Please follow the process below for model development:
1. Import or load Data: Place data in an Excel worksheet.
2. Perform Data Preprocessing: Deal with missing data, data normalization, and encoding categorical inputs.
(Vaimal has several utilities for preprocessing data such as Data Manager.)
3. Select a Model: Choose which model to use and set its design parameters.
4. Train the Model: Using training data with known outputs, train the model.
5. Test the Model: Using data different from the training data, test the model's ability to predict against known outputs.
6. Prediction: Use the model to make predictions of data with unknown output.
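Steps 3 to 6 can be illustrated outside Excel with a minimal sketch of a linear SVM, trained by full-batch subgradient descent on the regularised hinge loss. This is only an assumption-laden illustration: it is not a description of how the Vaimal add-in trains its model, the function names and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical, and the two-feature data is made up and already scaled to the range 0 to 1.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Train a linear SVM by subgradient descent on the hinge loss.

    X: list of feature lists; y: labels where 1 = fraud and 0 = normal.
    lam, lr and epochs are illustrative defaults, not tuned values.
    """
    signs = [1 if label == 1 else -1 for label in y]  # SVM uses -1/+1 labels
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, si in zip(X, signs):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if si * score < 1:  # point is inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= si * xi[j] / n
                grad_b -= si / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 (fraud) if the point lies on the positive side, else 0."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else 0


# Made-up, pre-scaled training data: two features per transaction.
X_train = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.30, 0.20],
           [0.80, 0.90], [0.90, 0.70], [0.70, 0.80], [0.85, 0.85]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_linear_svm(X_train, y_train)

# Test on points that were not used for training.
print(predict(w, b, [0.1, 0.1]), predict(w, b, [0.9, 0.9]))
```

A linear kernel keeps the sketch short; a real fraud data set is heavily imbalanced and may need class weighting or a non-linear kernel, which this example does not cover.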
Input Variables:
trans_date_trans_time, cc_num, first, last, merchant, category, amt, gender, street, city, state, zip, lat, long, city_pop, job, unix_time, merch_lat, and merch_long. Please convert categorical variables to Numbers such as Gender, and category etc. You can drop unwanted columns and use below columns as INPUT columns:
Output Variables:
Please use column is_fraud as a target variable.
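As a rough illustration of converting categorical inputs to numbers while keeping is_fraud as the target, the sketch below label-encodes category and gender and min-max scales amt. The helper names are hypothetical, only the first row uses values from the record in part d, and the other two rows are invented for the example.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def min_max_scale(values):
    """Scale numbers to the range 0..1; constant columns become all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rows = [
    {"category": "shopping_pos", "gender": "M", "amt": 1318.89, "is_fraud": 1},
    {"category": "grocery_pos", "gender": "F", "amt": 42.10, "is_fraud": 0},   # invented row
    {"category": "shopping_pos", "gender": "F", "amt": 7.25, "is_fraud": 0},   # invented row
]

# One encoder per categorical column, fitted on the available rows.
encoders = {
    col: fit_label_encoder(r[col] for r in rows)
    for col in ("category", "gender")
}
scaled_amt = min_max_scale([r["amt"] for r in rows])

# Numeric feature matrix X and target vector y.
X = [
    [encoders["category"][r["category"]], encoders["gender"][r["gender"]], a]
    for r, a in zip(rows, scaled_amt)
]
y = [r["is_fraud"] for r in rows]
print(X, y)
```

Integer codes impose an arbitrary order on the categories; one-hot encoding is a common alternative when that order could mislead the model.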
Data Flag for Training or Testing:
The column Data_FLAG marks each row as Training or Testing data. Please supply the training and testing inputs to the model according to this flag.
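Splitting on the flag column can be sketched as follows. The flag values "Train" and "Test" are assumptions, since the actual values stored in Data_FLAG are not stated here, and the rows are invented; adjust both to match the worksheet.

```python
def split_by_flag(rows, flag_col="Data_FLAG", train_value="Train", test_value="Test"):
    """Return (train_rows, test_rows) based on the flag column.

    train_value and test_value are assumed labels, not confirmed ones.
    """
    train = [r for r in rows if r[flag_col] == train_value]
    test = [r for r in rows if r[flag_col] == test_value]
    return train, test


# Invented rows for illustration only.
rows = [
    {"amt": 12.50, "is_fraud": 0, "Data_FLAG": "Train"},
    {"amt": 980.00, "is_fraud": 1, "Data_FLAG": "Train"},
    {"amt": 33.10, "is_fraud": 0, "Data_FLAG": "Test"},
]

train_rows, test_rows = split_by_flag(rows)
print(len(train_rows), len(test_rows))
```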
a. Provide descriptive measures of column amt: count, Min, Max, Mean, and Std. deviation. Write the count of fraudulent and normal transactions.
b. Show a graph of normalized frequency for columns category and is_fraud, showing the frequency for normal and fraudulent transactions.
c. Perform error analysis using a confusion matrix, which is built from four categories: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
Precision (P)= TP/(TP + FP)
Recall (R)= TP/(TP + FN)
Accuracy =(TP + TN)/(TP + FP + TN + FN)
F1 score = (2 * P * R)/(P + R)
d. Based on the features below, predict whether the transaction is fraudulent:
trans_date_trans_time: 12/2/2020 22:27
cc_num: 3.588E+15
merchant: fraud_Torphy-Goyette
category: shopping_pos
amt: 1318.89
first: Jason, last: Johnson, gender: M
street: 5942 Thomas Park, city: Craig
state: AK, zip: 99921
lat: 55.4732, long: -133.1171, city_pop: 1920
job: Commissioning editor, dob: 6/17/1997
trans_num: 2682f81f3f9e070b7abc721ca4bd5862
unix_time: 1386023256
merch_lat: 54.801713, merch_long: -133.669108
is_fraud: 1
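The calculations asked for in parts a to c can be cross-checked with a short Python sketch. All data below is invented for illustration, the function names are hypothetical, and the standard deviation is assumed to be the sample version (the same convention as Excel's STDEV.S); use the population version if the assignment expects it.

```python
import statistics
from collections import Counter


def describe(values):
    """Part a: count, min, max, mean and sample standard deviation."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std. deviation (assumption)
    }


def normalized_frequency(rows, group_col, class_col):
    """Part b: share of each group within each class (each class sums to 1)."""
    counts = Counter((r[class_col], r[group_col]) for r in rows)
    totals = Counter(r[class_col] for r in rows)
    return {(cls, grp): n / totals[cls] for (cls, grp), n in counts.items()}


def confusion_metrics(y_true, y_pred):
    """Part c: confusion-matrix counts and the four metrics (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn, "precision": precision,
            "recall": recall, "accuracy": accuracy, "f1": f1}


# Invented data for illustration only.
stats = describe([10.0, 20.0, 30.0, 40.0])

rows = (
    [{"category": "shopping_pos", "is_fraud": 1}] * 2
    + [{"category": "grocery_pos", "is_fraud": 1}]
    + [{"category": "shopping_pos", "is_fraud": 0}]
    + [{"category": "grocery_pos", "is_fraud": 0}] * 3
)
freq = normalized_frequency(rows, "category", "is_fraud")
class_counts = Counter(r["is_fraud"] for r in rows)  # fraudulent vs normal

metrics = confusion_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(stats, class_counts, freq, metrics)
```

Because fraudulent transactions are usually a small minority, accuracy alone can look high even for a model that never flags fraud, which is why precision, recall, and the F1 score are reported alongside it.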