Question: This assignment is a step-by-step guide on how to detect domains that were generated using Domain Generation Algorithm (DGA)

This assignment is a step-by-step guide on how to detect domains that were generated using "Domain Generation Algorithm" (DGA).

Overview

2 main steps:

Feature Engineering - from raw domain strings to numeric Machine Learning features using DataFrame manipulations

Machine Learning Classification - predict whether a domain is legit or not using a Decision Tree Classifier
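As a rough illustration of the feature engineering step, the sketch below turns a raw domain string into a few numeric features. The choice of features (length, character entropy, digit ratio, vowel ratio) and the names `shannon_entropy` and `domain_features` are assumptions made for this example; the assignment does not prescribe them. It also assumes that only the label before the first dot is of interest.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text):
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain):
    """Map one raw domain string to a dict of numeric features."""
    # Assumption: the label before the first dot carries the signal.
    name = domain.lower().split(".")[0]
    length = len(name)
    letters = [c for c in name if c.isalpha()]
    return {
        "length": length,
        "entropy": shannon_entropy(name),
        "digit_ratio": sum(c.isdigit() for c in name) / length if length else 0.0,
        "vowel_ratio": sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0,
    }


if __name__ == "__main__":
    # Made-up example domains, not taken from the assignment dataset.
    for d in ["google.com", "xjw3k9qpz7vb2.net"]:
        print(d, domain_features(d))
```

With pandas, a function like this could be applied to the column of domain strings via `Series.apply` to build the feature columns; the column name depends on the dataset and is not given here.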

DGA - Background

"Various families of malware use domain generation algorithms (DGAs) to generate a large number of pseudo-random domain names to connect to a command and control (C2) server. In order to block DGA C2 traffic, security organizations must first discover the algorithm by reverse engineering malware samples, then generate a list of domains for a given seed. The domains are then either preregistered, sink-holed or published in a DNS blacklist. This process is not only tedious, but can be readily circumvented by malware authors. An alternative approach to stop malware from using DGAs is to intercept DNS queries on a network and predict whether domains are DGA generated. Much of the previous work in DGA detection is based on finding groupings of like domains and using their statistical properties to determine if they are DGA generated. However, these techniques are run over large time windows and cannot be used for real-time detection and prevention. In addition, many of these techniques also use contextual information such as passive DNS and aggregations of all NXDomains throughout a network. Such requirements are not only costly to integrate, they may not be possible due to real-world constraints of many systems (such as endpoint detection). An alternative to these systems is a much harder problem: detect DGA generation on a per domain basis with no information except for the domain name. Previous work to solve this harder problem exhibits poor performance and many of these systems rely heavily on manual creation of features; a time consuming process that can easily be circumvented by malware authors..."

[Citation: Woodbridge et al. 2016: "Predicting Domain Generation Algorithms with Long Short-Term Memory Networks"]

For this exercise, you will use the attached dataset: "dga_data_small.csv".

The goals are:

Develop features to be used in the model.

Develop a supervised model and evaluate its performance.
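To make the second goal concrete, here is a minimal, dependency-free sketch of the core idea behind a decision tree: a single split (a "decision stump") chosen by lowest weighted Gini impurity, followed by a basic evaluation on held-out rows. This is a simplified stand-in, not the full Decision Tree Classifier the overview mentions; a full tree applies the same split search recursively, and in practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally be used. The function names and the toy feature rows (length, entropy) are invented for illustration and do not come from the assignment dataset.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common 0/1 label; ties resolve to 0."""
    return int(2 * sum(labels) > len(labels))


def fit_stump(rows, labels):
    """Return (feature_index, threshold, left_label, right_label) for the
    single split with the lowest weighted Gini impurity."""
    best_score, best_split = None, None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best_score is None or score < best_score:
                best_score = score
                best_split = (f, t, majority(left), majority(right))
    if best_split is None:
        raise ValueError("no split possible: all rows are identical")
    return best_split


def predict(stump, row):
    """Classify one feature row with a fitted stump."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def evaluate(y_true, y_pred):
    """Accuracy, precision and recall, treating label 1 (DGA) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy rows of [length, entropy]; label 1 = DGA, 0 = legit. Values are made up.
train_X = [[6, 1.9], [8, 2.5], [7, 2.2], [13, 3.7], [15, 3.6], [12, 3.4]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[5, 2.0], [14, 3.5]]
test_y = [0, 1]

stump = fit_stump(train_X, train_y)
predictions = [predict(stump, row) for row in test_X]
print(stump, evaluate(test_y, predictions))
```

Evaluating on rows that were not used for fitting, as done here with the separate test rows, is what keeps the reported performance from being overly optimistic.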

Deliverables:

1. A brief explanation of what algorithm you used and how it performed.

2. An explanation of what features you extracted and how you arrived at them.

3. The source code for your final model.
