Question:

Overview

Machine learning methods can be used effectively to detect malicious websites. In this assignment, you are required to classify malicious websites using the provided dataset (malicious_and_benign_websites1.csv). The features have been extracted and clearly structured in CSV format, as summarized in Table 1.

Table 1: Feature descriptions for the malicious and benign websites dataset

COLUMN NAME                  DESCRIPTION
URL                          the anonymous identification of the URL analyzed in the study
URL_LENGTH                   the number of characters in the URL
NUMBER_SPECIAL_CHARACTERS    the number of special characters identified in the URL, such as "/", "%", "." and "&"
CHARSET                      the character encoding standard (also known as the character set)
SERVER                       the operating system of the server obtained from the packet response
CONTENT_LENGTH               the content size of the HTTP header
WHOIS_COUNTRY                the country of the server
WHOIS_STATEPRO               the state of the country of the server (if known)
WHOIS_REGDATE                the server date and time
WHOIS_UPDATED_DATE           the last update of the server
TCP_CONVERSATION_EXCHANGE    the number of TCP packets exchanged between the server and our honeypot client
DIST_REMOTE_TCP_PORT         the number of the ports detected and different to TCP
REMOTE_IPS                   the total number of IPs connected to the honeypot
APP_BYTES                    the number of bytes transferred
SOURCE_APP_PACKETS           packets sent from the honeypot to the server
REMOTE_APP_PACKETS           packets received from the server
APP_PACKETS                  the total number of IP packets generated during the communication between the honeypot and the server
DNS_QUERY_TIMES              the number of DNS packets generated during the communication between the honeypot and the server
TYPE                         the class label that distinguishes malicious websites from benign websites

Problem Statement

This is an individual assessment task. Each student is required to submit a report of approximately 1000 words, along with exhibits to support the findings, on the provided malicious and benign websites dataset. The report should consist of:

- Literature review of malicious website detection
- Construction of datasets, data pre-processing and features
- Workflow of malicious website detection, describing the process of conducting malicious website detection
- Technical findings of classification results
- Justified discussion of the performance evaluation outcomes for different classifiers
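
The last report item asks for a comparison of different classifiers. One way to make that comparison concrete is to score each classifier's predictions on the same held-out labels and rank them by F1 rather than by accuracy alone, since a model that labels every site benign can still look accurate when malicious sites are rare. The following is a minimal sketch, not a prescribed method. It assumes TYPE is encoded as 1 for malicious and 0 for benign (the question text does not state the values), and the names binary_metrics, rank_classifiers, always_benign and model_a, as well as the toy labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Score predictions, treating `positive` as the malicious class.

    Assumption: labels are 1 = malicious, 0 = benign.
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = n - tp - fp - fn
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


def rank_classifiers(y_true, predictions, positive=1):
    """Return (name, Metrics) pairs sorted by F1, best first."""
    scored = [(name, binary_metrics(y_true, y_pred, positive))
              for name, y_pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1].f1, reverse=True)


# Hypothetical toy labels: 3 malicious sites out of 10.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
predictions = {
    "always_benign": [0] * 10,                     # never flags a site
    "model_a": [1, 0, 0, 1, 1, 0, 0, 0, 0, 0],     # finds 2 of the 3
}

for name, m in rank_classifiers(y_true, predictions):
    print(f"{name}: acc={m.accuracy:.2f} prec={m.precision:.2f} "
          f"rec={m.recall:.2f} f1={m.f1:.2f}")
```

On this toy data the always-benign baseline reaches 0.70 accuracy with zero recall, while model_a reaches 0.80 accuracy with precision and recall of about 0.67, which is the kind of contrast the justified discussion can draw on when real classifier outputs are substituted.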
