
In this project, you will implement a web proxy that passes requests and data between multiple web clients and web servers. This assignment will give you a chance to get to know one of the most popular application protocols on the Internet -- the Hypertext Transfer Protocol (HTTP).

HTTP Proxy

Getting Started

Download the starter package (included at the end of this post).

You will find the following starter code files: http_proxy.py, test_scripts, and README.pdf.

Task Specification

Your task is to build a web proxy capable of accepting HTTP requests, forwarding requests to remote (origin) servers, and returning response data to a client.

The proxy can be implemented in your preferred language. If you're not using Python, you must provide a Makefile that generates the executable, along with brief instructions.

Your proxy program should run without errors. It should take as its first argument the port to listen on; don't use a hard-coded port number. If you use any programming language other than Python, you need to supply a Makefile for compilation, and it should produce an executable file called http_proxy.

You shouldn't assume that your proxy will be running on a particular IP address, or that clients will be coming from a pre-determined IP address.
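A minimal Python 3 sketch of the start-up step, assuming the proxy reads its port from sys.argv[1]. The helper names parse_port and make_listener are hypothetical. Binding to the empty string listens on every local interface, so nothing depends on a particular IP address for the proxy or its clients.

```python
import socket


def parse_port(argv):
    """Return the listening port given as argv[1] (hypothetical helper)."""
    if len(argv) < 2:
        raise SystemExit("usage: http_proxy.py <port>")
    port = int(argv[1])  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return port


def make_listener(port, backlog=16):
    """Create a TCP socket listening on all local interfaces."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Let a restarted proxy rebind the port without waiting for TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # "" means "any address": no hard-coded IP for the proxy or its clients.
    sock.bind(("", port))
    sock.listen(backlog)
    return sock
```

In http_proxy.py this might be used as make_listener(parse_port(sys.argv)), followed by an accept() loop.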

Dummy Proxy

Your proxy should listen on the port specified from the command line and wait for incoming client connections. Once a client has connected, the proxy should read data from the client and then check for a properly-formatted HTTP request.
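One way to "read data from the client" is to keep calling recv() until the empty line (CRLF CRLF) that terminates the request headers arrives. In this Python 3 sketch, read_request_head and the 64 KiB cap are hypothetical choices. It returns only the head and ignores any body, which is assumed to be empty for GET.

```python
import socket

MAX_HEAD = 64 * 1024  # assumed cap so a misbehaving client can't exhaust memory


def read_request_head(conn, max_bytes=MAX_HEAD):
    """Read from conn until the blank line (CRLF CRLF) that ends the headers.

    Returns the request line plus header lines as bytes, without the
    terminator. Raises ValueError if the client closes early or sends too much.
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:
            raise ValueError("connection closed before end of headers")
        buf += chunk
        if len(buf) > max_bytes:
            raise ValueError("request head too large")
    head, _, _ = buf.partition(b"\r\n\r\n")
    return head
```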

The client issues a request by sending a line of text to the proxy server. This request line consists of an HTTP method (most often "GET", but "POST", "PUT", and others are possible), a request URI (like a URL), and the protocol version that the client wants to use ("HTTP/1.1"). The request line is followed by one or more header lines, and the header section is terminated by an empty line. The message body of the initial request is typically empty. The URI in an HTTP request to a proxy server is required to be an absolute URL, so you can obtain both the host name and the path to the file from the request line.
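A sketch of splitting the head into the request line and header lines, assuming Python 3 and CRLF line endings. parse_request is a hypothetical name; any malformed line raises ValueError so the caller can turn it into an error response.

```python
def parse_request(head):
    """Split a request head (bytes) into (method, url, version, headers).

    headers is a list of (name, value) string pairs in arrival order.
    Raises ValueError on a malformed request line or header line.
    """
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split()
    if len(parts) != 3:
        raise ValueError("bad request line: %r" % lines[0])
    method, url, version = parts
    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError("bad header line: %r" % line)
        headers.append((name.strip(), value.strip()))
    return method, url, version, headers
```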

Your dummy proxy is only responsible for accepting the HTTP request. All requests should elicit a well-formed HTTP response with status code 501 "Not Implemented". In the body of this HTTP response, you should supply a dummy HTML page that prints the URL requested by the client and the message "Your request will be forwarded." (However, it won't... unless you finish the next part.)

Note: you need to parse the HTTP message; don't dump the entire message directly onto the returned page.
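A possible shape for the dummy 501 reply, as a Python 3 sketch. dummy_response and the exact HTML are illustrative, not prescribed by the assignment. The URL is HTML-escaped before it goes into the page, and Content-Length is computed from the encoded body.

```python
import html


def dummy_response(url):
    """Build the 501 reply the dummy proxy sends for every request."""
    body = (
        "<html><head><title>501 Not Implemented</title></head><body>"
        "<p>Requested URL: %s</p>"
        "<p>Your request will be forwarded.</p>"
        "</body></html>"
    ) % html.escape(url)
    body_bytes = body.encode("utf-8")
    head = (
        "HTTP/1.1 501 Not Implemented\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        "Content-Length: %d\r\n"
        "Connection: close\r\n"
        "\r\n"
    ) % len(body_bytes)
    return head.encode("ascii") + body_bytes
```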

Complete Proxy Servers

Before you start this part, make a copy of the code for the dummy proxy and name it http_proxy_dummy.py. Then continue to work on the file http_proxy.py.

Sending Requests to Servers

Once the proxy has parsed the URL in the client request, it can make a connection to the requested host (using the appropriate remote port, or the default of 80 if none is specified) and send the HTTP request for the appropriate resource. The proxy should always send the request in the relative URL + Host header format, regardless of how the request was received from the client.
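Recovering the host, port, and relative path from the absolute URL can be sketched with the standard urllib.parse.urlsplit (Python 3). split_target is a hypothetical helper, and keeping the query string on the path is an assumption about what "the appropriate resource" includes.

```python
from urllib.parse import urlsplit


def split_target(url):
    """Turn an absolute request URL into (host, port, relative_path)."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("not an absolute URL: %r" % url)
    port = parts.port or 80  # .port raises ValueError for a bad port number
    path = parts.path or "/"  # "http://example.com" still requests "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path
```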

For example, if the proxy accepts the following request from a client:

GET http://www.example.com/ HTTP/1.1

It should send the following request to the remote server:

GET / HTTP/1.1
Host: www.example.com
Connection: close
(Additional client-specified headers, if any...)

Note that we always send the HTTP/1.1 version and a Connection: close header to the server, so that the server closes the connection once its response is fully transmitted instead of keeping a persistent connection open. So while you should pass the client headers you receive on to the server, make sure you replace any Connection header received from the client with one specifying close, as shown.
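A sketch of the rewritten request (Python 3; build_origin_request is a hypothetical name). It assumes the client's headers arrive as (name, value) pairs and that the Host derived from the URL should win over any Host header the client sent, so both Host and Connection are dropped from the client's list before the remaining headers are copied.

```python
def build_origin_request(method, host, port, path, client_headers):
    """Rebuild a request in relative-URL + Host form for the origin server.

    client_headers: list of (name, value) pairs from the client. Any Host or
    Connection header among them is replaced by the proxy's own.
    """
    host_value = host if port == 80 else "%s:%d" % (host, port)
    lines = [
        "%s %s HTTP/1.1" % (method, path),
        "Host: %s" % host_value,
        "Connection: close",
    ]
    for name, value in client_headers:
        if name.lower() in ("host", "connection"):
            continue  # already supplied above
        lines.append("%s: %s" % (name, value))
    # Each line ends in CRLF, and an empty line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1")
```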

Returning Response to Clients

After the response from the remote server is received, the proxy should send the response message as-is to the client via the appropriate socket.
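Because the proxy always sends Connection: close, the origin marks the end of its response by closing the socket, so forwarding "as-is" can be a plain copy loop that never parses the reply. The Python 3 sketch below uses the hypothetical helpers relay and forward; the 10-second timeout is an arbitrary assumption.

```python
import socket


def relay(src, dst, bufsize=4096):
    """Copy bytes from src to dst unchanged until src reaches EOF.

    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:  # origin closed the connection: response complete
            return total
        dst.sendall(chunk)
        total += len(chunk)


def forward(host, port, request, client, timeout=10.0):
    """Send request bytes to the origin server and stream its reply to client."""
    with socket.create_connection((host, port), timeout=timeout) as origin:
        origin.sendall(request)
        return relay(origin, client)
```

A connection failure or socket timeout raised by forward() is one of the errors the proxy is expected to catch.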

For any error caught by the proxy, the proxy should return status 500 'Internal Error'. In particular, any request method other than GET should cause your proxy to return status 500 'Internal Error' rather than 501 'Not Implemented'. Likewise, for any invalid or incorrectly formed headers or requests, your proxy should return status 500 'Internal Error' to the client rather than 400 'Bad Request'.
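The error rules above can be sketched as a validation step plus a fixed reply (Python 3). check_request and internal_error are hypothetical names, and accepting only HTTP/1.0 and HTTP/1.1 is an assumption. The URL scheme is deliberately not checked, since the supplied test list includes an https:// URL.

```python
from urllib.parse import urlsplit


def check_request(method, url, version):
    """Raise ValueError for any request the proxy must answer with a 500."""
    if method != "GET":
        raise ValueError("unsupported method: %s" % method)
    if version not in ("HTTP/1.0", "HTTP/1.1"):
        raise ValueError("unsupported version: %s" % version)
    if not urlsplit(url).hostname:
        raise ValueError("request URI is not an absolute URL: %s" % url)


def internal_error():
    """The single error reply the assignment asks for: 500 'Internal Error'."""
    body = b"<html><body><h1>500 Internal Error</h1></body></html>"
    head = (b"HTTP/1.1 500 Internal Error\r\n"
            b"Content-Type: text/html\r\n"
            b"Content-Length: %d\r\n"
            b"Connection: close\r\n\r\n") % len(body)
    return head + body
```

A connection handler can then wrap its whole body in try/except Exception and send internal_error() on any failure, which also covers the catch-all "any error caught by the proxy" rule.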

Otherwise, your proxy should simply forward status replies from the remote server to the client. This means most 1xx, 2xx, 3xx, 4xx, and 5xx status replies should go directly from the remote server to the client through your proxy. (While you are debugging, make sure that 404 status replies from the remote server are not the result of poorly forwarded requests from your proxy.)

Concurrent Requests (Optional)

A practical web proxy should be able to support multiple clients at the same time. You may choose an appropriate library to add multithreading support to your program.
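One option among several is the standard library's socketserver.ThreadingTCPServer, which runs each accepted connection in its own thread. In this Python 3 sketch the handler only echoes its input as a stand-in for the real proxy logic; ProxyHandler, ThreadedProxyServer, and start_server are hypothetical names.

```python
import socket
import socketserver
import threading


class ProxyHandler(socketserver.BaseRequestHandler):
    """One instance per client connection, each run on its own thread."""

    def handle(self):
        # Placeholder: a real proxy would read the request head here,
        # forward it to the origin, and relay the reply. This one echoes.
        data = self.request.recv(4096)
        self.request.sendall(b"echo:" + data)


class ThreadedProxyServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True  # allow quick rebinding after a restart
    daemon_threads = True       # don't block interpreter exit on open clients


def start_server(port):
    """Start the server on a background thread and return it."""
    server = ThreadedProxyServer(("", port), ProxyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With a single-threaded server, a client that connects but sends nothing would block every client behind it; here each connection waits independently.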

Testing Your Proxy

Run your proxy with the following command:

python http_proxy.py <port> &

where <port> is the port number that the proxy should listen on. As a basic test of functionality, try requesting a page using telnet:

telnet localhost <port>
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
GET http://www.example.com/ HTTP/1.1

After typing the request line, press Enter once more to send the empty line that ends the request. If your proxy is working correctly, the headers and HTML of example.com should be displayed on your terminal screen. Notice that we request the absolute URL (http://www.example.com/) instead of just the relative URL (/). A good sanity check of proxy behavior is to compare the HTTP response (headers and body) obtained via your proxy with the response from a direct telnet connection to the remote server. Additionally, try requesting a page using telnet concurrently from two different shells.
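The telnet session can also be scripted, which is handy when comparing proxy output against a direct fetch. This Python 3 sketch (fetch_via_proxy is a hypothetical name) sends the same absolute-URL GET, adds Connection: close as the supplied test script does, and reads until the proxy closes the connection.

```python
import socket


def fetch_via_proxy(proxy_port, url, proxy_host="localhost", timeout=10.0):
    """Send an absolute-URL GET to the proxy and return the raw reply bytes
    (status line, headers, and body)."""
    request = ("GET %s HTTP/1.1\r\nConnection: close\r\n\r\n" % url).encode("ascii")
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as s:
        s.sendall(request)
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:  # proxy closed the connection: reply complete
                break
            chunks.append(chunk)
    return b"".join(chunks)
```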

Then try testing your proxy with the supplied test_proxy.py script. This will compare the result of fetching 4 pre-determined websites directly versus through your proxy:

python testing_scripts/test_proxy.py http_proxy.py [port (optional, will be random if omitted)]

(This script requires http_proxy.py to be an executable file; chmod +x http_proxy.py can be used on Linux/Mac OS X. On Windows, this should already have been set up by the Python installer.)

Things to submit: make a zip file including all the files in the starter package.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

proxy.py test_scripts

#!/usr/bin/env python
# Note: this grader is written for Python 2 (print statements, urlparse, telnetlib).
import os
import random
import sys
import signal
import socket
import telnetlib
import time
import threading
import urlparse

try:
    import proxy_grade_private
    use_private = True
except ImportError:
    use_private = False

# pub_urls - The list of URLs to compare between the proxy
# and a direct connection.
#
# You can create additional automated tests for your proxy by
# adding URLs to this array. This will have no effect on your
# grade, but may be helpful in testing and debugging your proxy.
#
# When you are testing against real web servers on the Internet,
# you may see minor differences between the proxy-fetched page and
# the regular page- possibly due to load balancing or dynamically
# generated content. If there is only a single line that doesn't
# match between the two, it is likely a product of this sort of
# variation.
#
# Note that since this test script compares transaction output from
# the proxy and the direct connection, using invalid URLs may
# produce unexpected results, including the abnormal termination
# of the testing script.
#
pub_urls = ['http://example.com/',
            'http://johnjay.jjay.cuny.edu/',
            'http://chujie.github.io/csci379/',
            'https://chujie.github.io/csci379/test.html',
            ]

# timeout_secs - Individual tests will be killed if they do not
# complete within this span of time.
timeout_secs = 30.0

def main():
    global pub_urls
    try:
        proxy_bin = sys.argv[1]
    except IndexError:
        usage()
        sys.exit(2)
    try:
        port = sys.argv[2]
    except IndexError:
        port = str(random.randint(1025, 49151))

    print 'Binary: %s' % proxy_bin
    print 'Running on port %s ' % port

    # Start the proxy running in the background
    cid = os.spawnl(os.P_NOWAIT, proxy_bin, proxy_bin, port)
    # Give the proxy time to start up and start listening on the port
    time.sleep(2)

    passcount = 0
    for url in pub_urls:
        print '### Testing: ' + url
        passed = run_test(compare_url, (url, port), cid)
        if not live_process(cid):
            print '!!!Proxy process experienced abnormal termination during test- restarting proxy!'
            (cid, port) = restart_proxy(proxy_bin, port)
            passed = False

        if passed:
            print '%s: [PASSED] ' % url
            passcount += 1
        else:
            print '%s: [FAILED] ' % url

    if (use_private):
        (priv_passed, test_count, cid) = proxy_grade_private.runall(port, cid, proxy_bin)

    # Cleanup
    terminate(cid)

    print 'Summary: '
    print '\t%d of %d tests passed.' % (passcount, len(pub_urls))
    if (use_private):
        print '%d of %d extended tests passed' % (priv_passed, test_count)

def usage():
    print "Usage: proxy_grader.py path/to/proxy/binary port"
    print "Omit the port argument for a randomly generated port."

def run_test(test, args, childid):
    '''
    Run a single test function, monitoring its execution with a timer thread.

    * test: A function to execute. Should take a tuple as its sole
      argument and return True for a passed test, and False otherwise.
    * args: Tuple that contains arguments to the test function
    * childid: Process ID of the running proxy

    The amount of time that the monitor waits before killing
    the proxy process can be set by changing timeout_secs at the top of this
    file.

    Returns True for a passed test, False otherwise.
    '''
    monitor = threading.Timer(timeout_secs, do_timeout, [childid])
    monitor.start()
    if not test(args):
        passed = False
    else:
        passed = True
    monitor.cancel()
    return passed

def compare_url(argtuple):
    '''
    Compare proxy output to the output from a direct server transaction.

    A simple sample test: download a web page via the proxy, and then fetch the
    same page directly from the server. Compare the two pages for any
    differences, ignoring the Date header field if it is set.

    The argument tuple is in the form (url, port), where url is the URL to open,
    and port is the port the proxy is running on.
    '''
    (url, port) = argtuple
    urldata = urlparse.urlparse(url)
    try:
        (host, hostport) = urldata[1].split(':')
    except ValueError:
        host = urldata[1]
        hostport = 80

    # Retrieve via proxy
    try:
        proxy_data = get_data('localhost', port, url)
    except socket.error:
        print '!!!! Socket error while attempting to talk to proxy!'
        return False

    # Retrieve directly (an empty path, as in 'http://example.com', becomes '/')
    direct_data = get_direct(host, hostport, urldata[2] or '/')

    # Compare responses
    return compare_responses(proxy_data, direct_data)

def compare_responses(proxy_data, direct_data, lenient_header=True):
    # Split each response at the first empty line (CRLF CRLF) into its
    # header block and body, then break the header block into lines.
    (proxy_head, _, proxy_rest) = proxy_data.partition('\r\n\r\n')
    (direct_head, _, direct_rest) = direct_data.partition('\r\n\r\n')
    proxy_header = proxy_head.split('\r\n')
    direct_header = direct_head.split('\r\n')
    # The first header line is the status line, e.g. "HTTP/1.1 200 OK"
    proxy_response_line = proxy_header[0]
    direct_response_line = direct_header[0]

    # Bodies are only compared for successful (200) responses
    if "200" in proxy_response_line:
        proxy_body = proxy_rest
    else:
        proxy_body = ""

    if "200" in direct_response_line:
        direct_body = direct_rest
    else:
        direct_body = ""

    if proxy_response_line != direct_response_line:
        print "Response lines don't match: Direct: {} Proxy: {} ".format(direct_response_line, proxy_response_line)
        return False

    # Strict mode: every header except Date and Expires must appear in both
    if not lenient_header:
        for proxy_h in proxy_header:
            if not proxy_h.startswith("Date") and not proxy_h.startswith("Expires") and not (proxy_h in direct_header):
                print "Headers don't match: Direct: {} Proxy: {} ".format(direct_header, proxy_header)
                return False
        for direct_h in direct_header:
            if not direct_h.startswith("Date") and not direct_h.startswith("Expires") and not (direct_h in proxy_header):
                print "Headers don't match: Direct: {} Proxy: {} ".format(direct_header, proxy_header)
                return False

    if proxy_body != direct_body:
        print "HTML content doesn't match: Direct: {} Proxy: {} ".format(direct_body, proxy_body)
        return False

    return True

def get_direct(host, port, url):
    '''Retrieve a URL using direct HTTP/1.1 GET.'''
    # Request line and headers end in CRLF; an empty line ends the request
    getstring = 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n'
    data = http_exchange(host, port, getstring % (url, host))
    return data

def get_data(host, port, url):
    '''Retrieve a URL using proxy HTTP/1.1 GET.'''
    # Absolute URL in the request line, no Host header: the proxy must parse it
    getstring = 'GET %s HTTP/1.1\r\nConnection: close\r\n\r\n'
    data = http_exchange(host, port, getstring % url)
    return data

def http_exchange(host, port, data):
    conn = telnetlib.Telnet()
    conn.open(host, port)
    conn.write(data)
    ret_data = conn.read_all()
    conn.close()
    return ret_data

def live_process(pid):
    '''Check that a process is still running.'''
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False

def do_timeout(id):
    '''Callback function run by the monitor threads to kill a long-running operation.'''
    print '!!!! Proxy transaction timed out after %d seconds' % timeout_secs
    terminate(id)

def terminate(id):
    '''Stops and cleans up a running child process.'''
    assert(live_process(id))
    os.kill(id, signal.SIGINT)
    os.kill(id, signal.SIGKILL)
    try:
        os.waitpid(id, 0)
    except OSError:
        pass

def restart_proxy(binary, oldport):
    '''Restart the proxy on a new port number.'''
    newport = str(int(oldport) + 1)
    cid = os.spawnl(os.P_NOWAIT, binary, binary, newport)
    time.sleep(3)
    return (cid, newport)

if __name__ == '__main__':
    main()
