Question: Task You are to import and clean the same HealthCareData_2024.csv that was used in the previous assignment.

Task

You are to import and clean the same HealthCareData_2024.csv that was used in the previous assignment. Then run, tune and evaluate two supervised ML algorithms (each with two types of training data) to identify the most accurate way of classifying malicious events.

Part 1: General data preparation and cleaning

a) Import the HealthCareData_2024.csv into R Studio. This version is the same as Assignment 1.

b) Write the appropriate code in R Studio to prepare and clean the HealthCareData_2024 dataset as follows:

i. Clean the whole dataset based on the feedback received for Assignment 1.

ii. For the feature NetworkInteractionType, merge the Regular and Unknown categories together to form the category Others. Hint: use the forcats::fct_collapse(.) function.

iii. Select only the complete cases using the na.omit(.) function, and name the dataset dat.cleaned.

Briefly outline the preparation and cleaning process in your report and why you believe the above steps were necessary.
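Steps ii and iii can be tried on a toy data frame before touching the real file. The sketch below is only an illustration: the data frame toy, the category "Critical" and the column ResponseTime are made up for the example and are not taken from HealthCareData_2024.csv. It assumes the forcats package is installed.

```r
# Toy data only: "Critical" and ResponseTime are hypothetical stand-ins.
toy <- data.frame(
  NetworkInteractionType = factor(c("Regular", "Unknown", "Critical", "Regular", NA)),
  ResponseTime = c(1.2, NA, 3.4, 2.2, 0.9)
)

# Step ii: merge the Regular and Unknown levels into a single level, Others.
toy$NetworkInteractionType <- forcats::fct_collapse(
  toy$NetworkInteractionType,
  Others = c("Regular", "Unknown")
)

# Step iii: keep only the rows with no missing value in any column.
toy.cleaned <- na.omit(toy)

levels(toy.cleaned$NetworkInteractionType)
nrow(toy.cleaned)
```

On the real data the same two calls would be applied to the imported data frame, with the result stored as dat.cleaned.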

c) Use the code below to generate two training datasets (one unbalanced, mydata.ub.train, and one balanced, mydata.b.train) along with the testing set (mydata.test). Make sure you enter your student ID into the command set.seed(.).

library(dplyr)  # provides %>% and filter()

# Separate samples of normal and malicious events
dat.class0 <- dat.cleaned %>% filter(Classification == "Normal")     # normal
dat.class1 <- dat.cleaned %>% filter(Classification == "Malicious")  # malicious

# Randomly select 9600 non-malicious and 400 malicious samples using your student ID,
# then combine them to form a working data set
set.seed(Enter your Student ID)
rows.train0 <- sample(1:nrow(dat.class0), size = 9600, replace = FALSE)
rows.train1 <- sample(1:nrow(dat.class1), size = 400, replace = FALSE)

# Your 10000 unbalanced training samples
train.class0 <- dat.class0[rows.train0, ]  # Non-malicious samples
train.class1 <- dat.class1[rows.train1, ]  # Malicious samples
mydata.ub.train <- rbind(train.class0, train.class1)

# Your 19200 balanced training samples, i.e. 9600 normal and malicious samples each.
set.seed(Enter your Student ID)
train.class1_2 <- train.class1[sample(1:nrow(train.class1), size = 9600, replace = TRUE), ]
mydata.b.train <- rbind(train.class0, train.class1_2)

# Your testing samples
test.class0 <- dat.class0[-rows.train0, ]
test.class1 <- dat.class1[-rows.train1, ]
mydata.test <- rbind(test.class0, test.class1)

Note that in the master data set, the percentage of malicious events is approximately 4%. This distribution is roughly represented by the unbalanced data. The balanced data is generated by up-sampling the minority class using bootstrapping. The idea here is to ensure the trained model is not biased towards the majority class, i.e. normal events.
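The effect of bootstrap up-sampling on the class proportions can be seen on a small made-up example. This is a sketch in base R only: the data frame toy, its id column, the 960/40 split and the seed value 1 are assumptions chosen to mimic the roughly 4% malicious rate, not values from the assignment data.

```r
set.seed(1)  # arbitrary seed for this toy example, not a student ID

# Toy data with a roughly 4% minority class (hypothetical, 960 vs 40 rows).
toy <- data.frame(
  id = 1:1000,
  Classification = rep(c("Normal", "Malicious"), times = c(960, 40))
)
normal    <- toy[toy$Classification == "Normal", ]
malicious <- toy[toy$Classification == "Malicious", ]

# Bootstrap: draw minority row indices with replacement until the
# minority class is as large as the majority class.
boot.rows <- sample(seq_len(nrow(malicious)), size = nrow(normal), replace = TRUE)
balanced  <- rbind(normal, malicious[boot.rows, ])

# Class proportions before and after up-sampling.
prop.table(table(toy$Classification))
prop.table(table(balanced$Classification))
```

Because the draw is with replacement, the balanced set contains repeated copies of the same malicious rows. It adds no new information about the minority class; it only changes how much weight those rows carry during training.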

Part 2: Compare the performances of different ML algorithms

a) Randomly select two supervised learning modelling algorithms to test against one another by running the following code. Make sure you enter your student ID into the command set.seed(.). Your 2 ML approaches are given by myModels.

set.seed(Enter your student ID)
models.list1 <- c("Logistic Ridge Regression",
                  "Logistic LASSO Regression",
                  "Logistic Elastic-Net Regression")
models.list2 <- c("Classification Tree",
                  "Bagging Tree",
                  "Random Forest")
myModels <- c(sample(models.list1, size = 1),
              sample(models.list2, size = 1))
myModels %>% data.frame
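The reason for fixing the seed is that the same seed always yields the same pair of models, so the selection can be reproduced by a marker. A minimal sketch in base R follows; the helper pick_models and the seed 12345 are hypothetical and used only for illustration, with 12345 standing in for a real student ID.

```r
models.list1 <- c("Logistic Ridge Regression",
                  "Logistic LASSO Regression",
                  "Logistic Elastic-Net Regression")
models.list2 <- c("Classification Tree",
                  "Bagging Tree",
                  "Random Forest")

# Hypothetical helper: reset the seed, then draw one model from each list.
pick_models <- function(seed) {
  set.seed(seed)
  c(sample(models.list1, size = 1), sample(models.list2, size = 1))
}

first  <- pick_models(12345)  # 12345 is a placeholder seed
second <- pick_models(12345)

identical(first, second)  # same seed, same selection
```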

