Question: When a user accesses a webpage in their web browser, the webpage typically contains multiple web components, such as images, JavaScript codes, Flash content, CSS

When a user accesses a webpage in their web browser, the webpage typically contains multiple web components, such as images, JavaScript codes, Flash content, CSS, etc. These components are often down loaded through additional HTTP(S) connections from either the first-party domain (the website the user is
visiting) or from third-party domains. This article will focus specifically on JavaScript codes, which are com monly used by ad networks, content distribution networks (CDNs), tracking services, analytics platforms, and online social networks, including Facebooks use of them for plugins.
The scenario of web tracking via JavaScript codes is depicted in Figure 1. When a user accesses a webpage from a first-party domain (steps 12), the web browser interprets the HTML tags and executes any JavaScript programs within the HTML script tags. These programs may trigger the browser to send additional requests
to retrieve content from third-party domains (step 3). Depending on their intended functionality, JavaScript programs can be considered either useful (functional), such as fetching content from a CDN, or used for tracking purposes. In the latter case, once the webpage has fully loaded (step 4), the JavaScript codes track the users activities on the webpage, read from or write to the cookie database (steps 56), and potentially reconstruct user identifiers. Tracking JavaScript programs may also be employed to "fingerprint" the users browser and system, and transfer sensitive information to third-party domains (step 7)
Suppose you are tasked with developing a machine learning model based on a single class (e.g., One-Class SVM (OCSVM) or Positive Unlabeled (PU) Learning) to distinguish between functional and tracking JavaScript codes. I have the database which contain multiple Javascript codes (TrackingJS and functionalJS)
Use Term Frequency- Inverse Document Frequency (TF-IDF) to extract features from functional and tracking JavaScript codes.
Develop either One-Class SVM or PU Learning, and a baseline SVM for comparison, to classify the JavaScript codes.
Design and conduct experiments to validate and test the efficacy of your developed model:
To report any over- or under-fitting of the models, you may use 60% of the data for testing, 20% for validation, and 20% for the testing.
Report and discuss the parameters of OCSVM or PU Learning model which give your improved results.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!