Question:
In this assignment, you will write a Python-based socket program to implement a simple Web Proxy. The following diagram can help you understand how a Web Proxy works:
(1) The client sends an HTTP request to retrieve an object (e.g., an HTML page).
(2) The proxy receives this request and creates a fresh HTTP request for the same object, addressed to the origin server.
(3) The proxy sends the new HTTP request to the web server.
(4) The web server sends the response back to the proxy server.
(5) The proxy server creates a new HTTP response containing the object and sends it back to the client.
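Step (2) is where the proxy has to do real work: with the URL format required later (http://localhost:8888/the.web.page/to/visit/), the browser puts the destination host inside the request path, so the bytes cannot be forwarded unchanged. Below is a minimal sketch, assuming that URL format and a plain GET. The function name `build_forward_request` is hypothetical, and the choices to drop the browser's other headers (User-Agent, Accept, cookies) and to send HTTP/1.0 with `Connection: close` are simplifying assumptions, not requirements from the assignment.

```python
def build_forward_request(raw_request):
    """Rewrite a request aimed at the proxy into a fresh request for the origin server.

    Assumes the browser URL looks like http://localhost:8888/<host>[:port]/<path>,
    so the request line arrives as 'GET /<host>[:port]/<path> HTTP/1.1'.
    Returns (host, port, request_bytes). Raises ValueError on malformed input.
    """
    text = raw_request.decode("iso-8859-1")
    request_line = text.split("\r\n", 1)[0]
    method, target, _version = request_line.split(" ", 2)  # ValueError if malformed
    if method != "GET":
        raise ValueError("only GET is supported, got " + method)

    # '/www.example.com/a/b.html' -> host_part 'www.example.com', path 'a/b.html'
    host_part, _slash, path = target.lstrip("/").partition("/")
    if not host_part:
        raise ValueError("no destination host in " + target)
    host, _colon, port_text = host_part.partition(":")
    port = int(port_text) if port_text else 80  # ValueError if the port is not a number

    # HTTP/1.0 plus 'Connection: close' asks the server to close the connection
    # once the response is sent, so end-of-file marks the end of the response.
    fresh = ("GET /" + path + " HTTP/1.0\r\n"
             "Host: " + host_part + "\r\n"
             "Connection: close\r\n"
             "\r\n")
    return host, port, fresh.encode("iso-8859-1")
```

For example, a browser request line `GET /www.example.com/index.html HTTP/1.1` becomes `GET /index.html HTTP/1.0` with `Host: www.example.com`, to be sent to port 80. Requesting HTTP/1.0 also means the server will not use chunked transfer encoding, which keeps the saved responses simple to replay later.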
Task Details:
1. Forward HTTP requests and responses (without caching)
In this task, you need to implement a TCP server (the proxy) on localhost:8888 that accepts HTTP GET requests from browsers. When the client browser visits a particular website (e.g., yahoo.com) through the proxy, it sends its requests directly to the proxy server. Your proxy server needs to create a new client socket that connects to the destination web server and forward the HTTP request there.
- Also, print out the HTTP request and examine it closely: does it need to be modified before being forwarded to the destination?
Once the proxy server sends the HTTP request to the destination web server, it will receive an HTTP response, which it must forward to the browser client so that the requested web page can be displayed in the browser.
- Things to observe and think about: How many HTTP requests are issued to retrieve one web page? How can you tell that you have received the complete response from the destination?
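One common answer to the last question: if the proxy's request asks the server to close the connection after responding (HTTP/1.0 or `Connection: close`), then `recv()` returning an empty bytes object means the response is complete. With persistent connections you would instead have to parse the `Content-Length` header or the chunked-encoding terminator. The sketch below assumes the close-after-response approach; the function names, the 10-second timeout and the 4096-byte buffer are illustrative choices, not part of the assignment.

```python
import socket


def recv_until_closed(sock, bufsize=4096):
    """Collect everything the peer sends until it closes its side of the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # b"" means the peer closed: the response is complete
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch_from_origin(host, port, request, timeout=10.0):
    """Open a new client socket to the origin server, send the request, return the raw response.

    Assumes the request asked for 'Connection: close' (or used HTTP/1.0), so the
    server closes the connection after the last byte of the response.
    Raises OSError (including socket timeouts) if the server is unreachable or stalls.
    """
    with socket.create_connection((host, port), timeout=timeout) as upstream:
        upstream.sendall(request)
        return recv_until_closed(upstream)
```

The raw bytes returned by `fetch_from_origin` (status line, headers and body) can be passed to the browser's socket with `sendall()` unchanged, and the same bytes are what a cache would store later.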
2. Handle multiple HTTP requests in the TCP proxy server
Here, you need to make your proxy server capable of handling multiple incoming connections at the same time. The way to achieve this is to use the select.select() method. Below are links to a tutorial and the Python documentation. You'll figure it out!
- How to Work with TCP Sockets in Python (with Select Example)
- select: Waiting for I/O completion
Note: Make sure to maintain the select list properly by removing sockets from the list whenever they become inactive.
3. Enable caching
In this task, for each requested URL, your proxy server will save the response from the destination in a file on disk (so the cache persists when the proxy server is terminated and restarted). The next time the same URL is requested, the server will load the response from the corresponding file on disk rather than opening a connection to the destination server. You'll see that this greatly improves page loading speed on the browser side.
Some items to think about: How do you name the cache files, i.e., how do you convert URLs into proper filenames?
Things not to worry about: Assume unlimited disk space, so there is no need to decide which pages to remove or evict.
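For the file-naming question, one option is to percent-escape every byte that is not a letter, digit, `.`, `_` or `-`. Because the import restriction in the Important Notes rules out `hashlib`, a reversible escape like this is a convenient alternative, and since `%` itself is escaped, two different URLs can never map to the same file. This is a minimal sketch under assumptions: the folder name `proxy_cache` and all function names are hypothetical, the folder is created on demand (so no manual setup is needed before starting the proxy), and very long URLs may still exceed a filesystem's per-name length limit (often 255 bytes).

```python
import os

CACHE_DIR = "proxy_cache"  # hypothetical folder name; created automatically when first needed
SAFE_CHARS = set("abcdefghijklmnopqrstuvwxyz"
                 "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-")


def url_to_filename(url):
    """Escape every byte outside SAFE_CHARS as %XX, e.g. '/' -> '%2F'.

    '%' is itself escaped, so the mapping is reversible and collision-free.
    """
    parts = []
    for byte in url.encode("utf-8"):
        ch = chr(byte)
        parts.append(ch if ch in SAFE_CHARS else "%%%02X" % byte)
    return "".join(parts)


def cache_path(url, cache_dir=CACHE_DIR):
    """Full path of the cache file for url, creating the cache folder if needed."""
    os.makedirs(cache_dir, exist_ok=True)
    return os.path.join(cache_dir, url_to_filename(url))


def save_to_cache(url, response, cache_dir=CACHE_DIR):
    """Write the raw response bytes; write-then-rename avoids half-written cache files."""
    path = cache_path(url, cache_dir)
    tmp = path + "~"  # '~' is always escaped by url_to_filename, so this name cannot clash
    with open(tmp, "wb") as f:
        f.write(response)
    os.replace(tmp, path)


def load_from_cache(url, cache_dir=CACHE_DIR):
    """Return the cached response bytes, or None if the URL was never cached."""
    path = cache_path(url, cache_dir)
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        return f.read()
```

As an illustration, the key `www.example.com/a b?x=1` becomes the filename `www.example.com%2Fa%20b%3Fx%3D1`. A natural cache key is the host plus path taken from the browser's request; whether to cache only `200 OK` responses is a design choice left to the implementer.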
4. Add an expiry time to the cache
Here you need to add a parameter that specifies how long a cached item stays valid. This parameter is passed to the program as a command-line argument, e.g., python proxyServer.py 120, which means a cached item expires 120 seconds (2 minutes) after it is created. To implement this, check the last-modified time of the cache file (using os.path.getmtime()) and compare it with the current time (time.time()). If the item has expired, fetch it from the destination server again and update the cache accordingly.
Important Notes:
1. You should handle all possible errors and exceptions.
2. Your code must be written in Python 3.
3. You are only allowed to have the following import statement in your code: import sys, os, time, socket, select
No other library modules are allowed.
4. Your proxy server must be started by a command like the following: python proxyServer.py 120, where 120 is the maximum age (in seconds) of an item in the cache. No other action (e.g., creating a folder with a certain name) should be required to start the program.
5. The URL entered in the browser to visit a web page via the proxy server must look like the following: http://localhost:8888/the.web.page/to/visit/ i.e., the host name must be localhost and the port number must be 8888.
6. Your proxy server only needs to handle GET requests.
7. Your proxy server does not need to handle HTTPS connections.
8. Starter code is provided, which you must modify and complete to implement the proxy server.
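The notes above, together with Task 2 (select.select()) and Task 4 (expiry via os.path.getmtime() and time.time()), suggest the overall shape of the program. The sketch below is one possible arrangement, not the starter code: it uses only the permitted imports, reads the maximum age from the command line, listens on localhost:8888, and removes each client socket from the select list as soon as it is answered or closes. All names (`PROXY_ADDR`, `BAD_GATEWAY`, `parse_max_age`, `is_fresh`, `close_client`, `serve`, and the `handler` callable) are hypothetical. One known limitation: `handler` runs synchronously, so a slow origin server delays other clients; a fuller version would also add the origin-server sockets to the select lists.

```python
import os, sys, time, socket, select

PROXY_ADDR = ("localhost", 8888)  # host and port fixed by note 5
BAD_GATEWAY = b"HTTP/1.0 502 Bad Gateway\r\nContent-Length: 0\r\n\r\n"


def parse_max_age(argv=None):
    """Read the cache lifetime from a command like 'python proxyServer.py 120'."""
    argv = sys.argv if argv is None else argv
    if len(argv) != 2:
        raise SystemExit("usage: python proxyServer.py <max_age_seconds>")
    try:
        max_age = int(argv[1])
    except ValueError:
        raise SystemExit("max_age must be a whole number of seconds") from None
    if max_age < 0:
        raise SystemExit("max_age must not be negative")
    return max_age


def is_fresh(cache_file, max_age):
    """True if cache_file exists and was last written less than max_age seconds ago."""
    try:
        age = time.time() - os.path.getmtime(cache_file)
    except OSError:  # never cached (or unreadable): treat as expired
        return False
    return age < max_age


def close_client(sock, inputs, pending):
    """Stop watching an inactive client socket and release it."""
    if sock in inputs:
        inputs.remove(sock)
    pending.pop(sock, None)
    sock.close()


def serve(handler):
    """select()-based accept/read loop; handler(raw_request) must return response bytes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(PROXY_ADDR)
    listener.listen(50)
    inputs = [listener]  # every socket select() watches for readability
    pending = {}         # client socket -> request bytes received so far
    try:
        while True:
            readable, _, errored = select.select(inputs, [], inputs)
            for sock in readable:
                if sock is listener:  # a browser opened a new connection
                    try:
                        client, _addr = listener.accept()
                    except OSError:
                        continue
                    inputs.append(client)
                    pending[client] = b""
                    continue
                try:
                    data = sock.recv(4096)
                except OSError:
                    data = b""
                if data:
                    pending[sock] += data
                    if b"\r\n\r\n" not in pending[sock]:
                        continue  # request headers incomplete: wait for more data
                    print(pending[sock].decode("iso-8859-1"))  # inspect the request (Task 1)
                    try:
                        response = handler(pending[sock])
                    except (OSError, ValueError) as exc:
                        print("request failed:", exc)
                        response = BAD_GATEWAY
                    try:
                        sock.sendall(response)
                    except OSError:
                        pass  # the browser already hung up
                # Answered, or recv() returned b"" (client closed): drop the socket.
                close_client(sock, inputs, pending)
            for sock in errored:
                if sock is not listener and sock in inputs:
                    close_client(sock, inputs, pending)
    finally:
        for sock in inputs:
            sock.close()

# proxyServer.py might finish with something like:
#     MAX_AGE = parse_max_age()
#     serve(lambda raw: my_handler(raw, MAX_AGE))   # my_handler is hypothetical
```

A handler built on this skeleton would check `is_fresh()` on the URL's cache file, return the file's contents on a hit, and otherwise fetch from the origin and rewrite the file. Rewriting the file updates its modification time, which automatically restarts the item's lifetime.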
What to submit:
1. A PDF report with all evidence, code, execution samples, multiple test scenarios, instructions for running the submitted programs, and references used. (10% of the grade)
2. Readability of the program is highly valued. Program comments are worth 10% of the total points.
