In this project, you will implement a web proxy that passes requests and data between multiple web clients and web servers. This assignment will give you a chance to get to know one of the most popular application protocols on the Internet -- the Hypertext Transfer Protocol (HTTP).

HTTP Proxy

Getting Started

Download the starter package (included at the end of this post).

You will find the following starter code files: http_proxy.py test_scripts README.pdf

Task Specification

Your task is to build a web proxy capable of accepting HTTP requests, forwarding requests to remote (origin) servers, and returning response data to a client.

You may implement the proxy in your preferred language. If you are not using Python, you must provide a Makefile that builds the executable, along with brief instructions.

Your proxy program should run without errors. It should take as its first argument the port to listen on. Don't use a hard-coded port number. If you use any other programming language, you need to supply a Makefile for compilation, and it should produce an executable file called http_proxy.

You shouldn't assume that your proxy will be running on a particular IP address, or that clients will be coming from a pre-determined IP address.
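The port and address requirements above could be met along the following lines. This is a minimal sketch in Python 3, not the starter code: the helper names `parse_port` and `open_listener` and the backlog value are assumptions made for illustration.

```python
import socket


def parse_port(argv):
    """Return the listening port taken from the first command-line argument."""
    if len(argv) < 2:
        raise SystemExit("usage: http_proxy.py <port>")
    try:
        port = int(argv[1])
    except ValueError:
        raise SystemExit("port must be an integer")
    if not 0 < port < 65536:
        raise SystemExit("port must be between 1 and 65535")
    return port


def open_listener(port, backlog=16):
    """Create a TCP socket that listens on every local interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts while testing.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # An empty host string binds to all interfaces, so no IP address is assumed.
    listener.bind(("", port))
    listener.listen(backlog)
    return listener
```

A program entry point would typically call `open_listener(parse_port(sys.argv))` and then loop on `accept()`.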

Dummy Proxy

Your proxy should listen on the port specified from the command line and wait for incoming client connections. Once a client has connected, the proxy should read data from the client and then check for a properly-formatted HTTP request.
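Because TCP delivers a byte stream, one `recv` call may return only part of a request. A sketch of reading until the blank line that ends the headers, written for Python 3, is shown here; the function name `read_request_head` and the 64 KiB limit are assumptions, not part of the assignment.

```python
import socket


def read_request_head(sock: socket.socket, max_bytes: int = 65536) -> bytes:
    """Read from sock until the header terminator (CRLF CRLF) has arrived."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(4096)
        if not chunk:  # the client closed the connection early
            break
        data += chunk
        if len(data) > max_bytes:
            raise ValueError("request head too large")
    return data
```

The caller still has to check that the returned bytes actually contain a complete, well-formed request.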

The client issues a request by sending a line of text to the proxy server. This request line consists of an HTTP method (most often "GET", but "POST", "PUT", and others are possible), a request URI (like a URL), and the protocol version that the client wants to use ("HTTP/1.1"). The request line is followed by one or more header lines. The message body of the initial request is typically empty. The URI in an HTTP request to a proxy server is required to be an absolute URL, so you can obtain both the host name and the path to the file from the request line.
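One way to pull the method, host, port, and path out of such a request line is sketched below in Python 3 using the standard `urllib.parse.urlsplit`. The function name `parse_request_line` and the dictionary layout are illustrative assumptions.

```python
from urllib.parse import urlsplit


def parse_request_line(line):
    """Split 'GET http://host[:port]/path HTTP/1.1' into its components."""
    parts = line.strip().split()
    if len(parts) != 3:
        raise ValueError("malformed request line: %r" % line)
    method, uri, version = parts
    url = urlsplit(uri)
    if url.scheme != "http" or not url.hostname:
        raise ValueError("proxy requests need an absolute http:// URL")
    # Rebuild the relative URL (path plus optional query string).
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    return {
        "method": method,
        "version": version,
        "host": url.hostname,
        "port": url.port or 80,  # default HTTP port when none is given
        "path": path,
    }
```

For example, `parse_request_line("GET http://www.example.com/ HTTP/1.1")` yields host `www.example.com`, port `80`, and path `/`.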

Your dummy proxy is only responsible for accepting the HTTP request. All requests should elicit a well-formed HTTP response with status code 501 "Not Implemented". In the body of this HTTP response, you should supply a dummy HTML page that prints the URL requested by the client and the message "You request will be forwarded." (It won't be, however, until you finish the next part.)

Note: You need to parse the HTTP message; don't dump the entire message directly on the returned page.
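A possible shape for the dummy 501 response is sketched below in Python 3. The function name `build_dummy_response`, the HTML layout, and the choice of headers are assumptions; only the status line and the quoted message come from the task description.

```python
import html


def build_dummy_response(url):
    """Return a complete 501 response whose HTML body shows the requested URL."""
    body = (
        "<html><body>"
        "<p>Requested URL: %s</p>"
        "<p>You request will be forwarded.</p>"
        "</body></html>" % html.escape(url)  # escape so the URL cannot inject markup
    ).encode("utf-8")
    head = (
        "HTTP/1.1 501 Not Implemented\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        "Content-Length: %d\r\n"
        "Connection: close\r\n"
        "\r\n" % len(body)
    ).encode("ascii")
    return head + body
```

Only the parsed URL is placed on the page, which keeps the raw request text out of the response.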

Complete Proxy Servers

Before you start this part, make a copy of the dummy proxy code and name it http_proxy_dummy.py. Then continue to work on the file http_proxy.py.

Sending Requests to Servers

Once the proxy has parsed the URL in the client request, it can make a connection to the requested host (using the appropriate remote port, or the default of 80 if none is specified) and send the HTTP request for the appropriate resource. The proxy should always send the request in the relative URL + Host header format regardless of how the request was received from the client.

For example, if the proxy accepts the following request from a client:

GET http://www.example.com/ HTTP/1.1

It should send the following request to the remote server:

GET / HTTP/1.1
Host: www.example.com
Connection: close
(Additional client specified headers, if any...)

Note that we always send HTTP/1.1 flags and a Connection: close header to the server, so that it will close the connection after its response is fully transmitted, as opposed to keeping open a persistent connection. So while you should pass the client headers you receive on to the server, you should make sure you replace any Connection header received from the client with one specifying close, as shown.
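The rewriting rule described above can be sketched as follows in Python 3. The function name `build_origin_request` and the representation of client headers as `(name, value)` pairs are assumptions made for illustration.

```python
def build_origin_request(path, host, port, client_headers):
    """Build the request sent to the origin server.

    client_headers is a list of (name, value) pairs parsed from the client.
    """
    # Include the port in the Host header only when it is not the default.
    host_value = host if port == 80 else "%s:%d" % (host, port)
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host_value,
        "Connection: close",
    ]
    # The proxy supplies its own Host and Connection headers, so any
    # client-supplied versions are dropped; everything else is passed on.
    replaced = {"host", "connection"}
    for name, value in client_headers:
        if name.lower() not in replaced:
            lines.append("%s: %s" % (name, value))
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
```

Header names are compared case-insensitively because HTTP header field names are not case-sensitive.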

Returning Response to Clients

After the response from the remote server is received, the proxy should send the response message as-is to the client via the appropriate socket.
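Since the request carries Connection: close, the end of the response is signalled by the server closing its side of the connection. A sketch of relaying the bytes unchanged, in Python 3, is given here; the function name `relay_response` and the chunk size are assumptions.

```python
import socket


def relay_response(server_sock: socket.socket,
                   client_sock: socket.socket,
                   chunk_size: int = 4096) -> int:
    """Copy bytes from the origin server to the client until the server closes.

    Returns the number of bytes relayed.
    """
    total = 0
    while True:
        chunk = server_sock.recv(chunk_size)
        if not chunk:  # empty read: the server closed the connection
            break
        client_sock.sendall(chunk)  # forward as-is, without modification
        total += len(chunk)
    return total
```

Working on raw bytes avoids corrupting binary content such as images.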

For any error caught by the proxy, the proxy should return the status 500 'Internal Error'. Unlike the dummy proxy, which answers every request with 501, the complete proxy should return status 500 'Internal Error' rather than 501 'Not Implemented' for any request method other than GET. Likewise, for any invalid or incorrectly formed headers or requests, your proxy should return status 500 'Internal Error' rather than 400 'Bad Request' to the client.
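These error rules might be checked before any connection to the origin server is attempted. The sketch below is written for Python 3; the names `internal_error_response` and `validate_request`, the HTML body, and the exact validity checks are assumptions rather than the graded behavior.

```python
def internal_error_response():
    """Return a complete 500 response."""
    body = b"<html><body><h1>500 Internal Error</h1></body></html>"
    head = (
        "HTTP/1.1 500 Internal Error\r\n"
        "Content-Type: text/html\r\n"
        "Content-Length: %d\r\n"
        "Connection: close\r\n\r\n" % len(body)
    ).encode("ascii")
    return head + body


def validate_request(raw):
    """Return None when the request can be forwarded, else a 500 response."""
    try:
        head = raw.decode("latin-1").split("\r\n\r\n", 1)[0]
        request_line, *header_lines = head.split("\r\n")
        # Unpacking raises ValueError unless there are exactly three fields.
        method, uri, version = request_line.split(" ")
        if method != "GET":
            raise ValueError("unsupported method")
        if not uri.startswith("http://") or not version.startswith("HTTP/"):
            raise ValueError("bad URI or version")
        for line in header_lines:
            name, sep, _value = line.partition(":")
            if not sep or not name or name != name.strip():
                raise ValueError("bad header line")
    except ValueError:
        return internal_error_response()
    return None
```

Funnelling every failure through one exception type keeps the mapping to status 500 in a single place.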

Otherwise, your proxy should simply forward status replies from the remote server to the client. This means most 1xx, 2xx, 3xx, 4xx, and 5xx status replies should go directly from the remote server to the client through your proxy. (While you are debugging, make sure that 404 status replies from the remote server are not the result of poorly forwarded requests from your proxy.)

Concurrent Requests (Optional)

A practical web proxy should be able to support multiple clients at the same time. You may choose an appropriate library to add multi-threading support to your program.
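One common approach, shown here as a sketch in Python 3 with the standard `threading` module, is to start a thread per accepted connection. The names `serve_forever` and `_run_handler`, and the `handle_client(sock, addr)` callback signature, are assumptions for illustration.

```python
import socket
import threading


def _run_handler(handle_client, client_sock, client_addr):
    """Run the per-client handler and always close the client socket."""
    try:
        handle_client(client_sock, client_addr)
    finally:
        client_sock.close()


def serve_forever(listener: socket.socket, handle_client) -> None:
    """Accept connections and hand each one to its own daemon thread."""
    while True:
        try:
            client_sock, client_addr = listener.accept()
        except OSError:
            break  # the listening socket was closed; stop accepting
        worker = threading.Thread(
            target=_run_handler,
            args=(handle_client, client_sock, client_addr),
            daemon=True,  # do not keep the process alive for stuck clients
        )
        worker.start()
```

Because each client runs in its own thread, a slow origin server delays only the client that requested it.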

Testing Your Proxy

Run your proxy with the following command:

python http_proxy.py <port> &

Here <port> is the port number that the proxy should listen on. As a basic test of functionality, try requesting a page using telnet:

telnet localhost <port>

Trying 127.0.0.1...

Connected to localhost.localdomain (127.0.0.1).

Escape character is '^]'.

GET http://www.example.com/ HTTP/1.1

If your proxy is working correctly, the headers and HTML of example.com should be displayed on your terminal screen. Notice here that we request the absolute URL (http://www.example.com/) instead of just the relative URL (/). A good sanity check of proxy behavior would be to compare the HTTP response (headers and body) obtained via your proxy with the response from a direct telnet connection to the remote server. Additionally, try requesting a page using telnet concurrently from two different shells.

Then try testing your proxy with the supplied test_proxy.py script. This will compare the result of fetching 4 pre-determined websites directly versus through your proxy:

python testing_scripts/test_proxy.py http_proxy.py [port (optional, will be random if omitted)]

(This script requires http_proxy.py to be an executable file. On Linux or Mac OS X, you can use chmod +x http_proxy.py. On Windows, the Python installer should already have set this up.)

Things to submit: make a zip file including all the files in the starter package.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

proxy.py test_scripts

#!/usr/bin/env python
import os
import random
import sys
import signal
import socket
import telnetlib
import time
import threading
import urlparse

try:
    import proxy_grade_private
    use_private = True
except ImportError:
    use_private = False

# pub_urls - The list of URLs to compare between the proxy
# and a direct connection.
#
# You can create additional automated tests for your proxy by
# adding URLs to this array. This will have no effect on your
# grade, but may be helpful in testing and debugging your proxy.
#
# When you are testing against real web servers on the Internet,
# you may see minor differences between the proxy-fetched page and
# the regular page, possibly due to load balancing or dynamically
# generated content. If there is only a single line that doesn't
# match between the two, it is likely a product of this sort of
# variation.
#
# Note that since this test script compares transaction output from
# the proxy and the direct connection, using invalid URLs may
# produce unexpected results, including the abnormal termination
# of the testing script.
pub_urls = ['http://example.com/',
            'http://johnjay.jjay.cuny.edu/',
            'http://chujie.github.io/csci379/',
            'https://chujie.github.io/csci379/test.html',
            ]

# timeout_secs - Individual tests will be killed if they do not
# complete within this span of time.
timeout_secs = 30.0

def main():
    global pub_urls
    try:
        proxy_bin = sys.argv[1]
    except IndexError:
        usage()
        sys.exit(2)
    try:
        port = sys.argv[2]
    except IndexError:
        port = str(random.randint(1025, 49151))
    print 'Binary: %s' % proxy_bin
    print 'Running on port %s ' % port
    # Start the proxy running in the background
    cid = os.spawnl(os.P_NOWAIT, proxy_bin, proxy_bin, port)
    # Give the proxy time to start up and start listening on the port
    time.sleep(2)
    passcount = 0
    for url in pub_urls:
        print '### Testing: ' + url
        passed = run_test(compare_url, (url, port), cid)
        if not live_process(cid):
            print '!!!Proxy process experienced abnormal termination during test- restarting proxy!'
            (cid, port) = restart_proxy(proxy_bin, port)
            passed = False
        if passed:
            print '%s: [PASSED] ' % url
            passcount += 1
        else:
            print '%s: [FAILED] ' % url
    if (use_private):
        (priv_passed, test_count, cid) = proxy_grade_private.runall(port, cid, proxy_bin)
    # Cleanup
    terminate(cid)
    print 'Summary: '
    print '\t%d of %d tests passed.' % (passcount, len(pub_urls))
    if (use_private):
        print '%d of %d extended tests passed' % (priv_passed, test_count)

def usage():
    print "Usage: proxy_grader.py path/to/proxy/binary port"
    print "Omit the port argument for a randomly generated port."

def run_test(test, args, childid):
    '''
    Run a single test function, monitoring its execution with a timer thread.

    * test: A function to execute. Should take a tuple as its sole
      argument and return True for a passed test, and False otherwise.
    * args: Tuple that contains arguments to the test function
    * childid: Process ID of the running proxy

    The amount of time that the monitor waits before killing
    the proxy process can be set by changing timeout_secs at the top of this
    file.

    Returns True for a passed test, False otherwise.
    '''
    monitor = threading.Timer(timeout_secs, do_timeout, [childid])
    monitor.start()
    if not test(args):
        passed = False
    else:
        passed = True
    monitor.cancel()
    return passed

def compare_url(argtuple):
    '''
    Compare proxy output to the output from a direct server transaction.

    A simple sample test: download a web page via the proxy, and then fetch the
    same page directly from the server. Compare the two pages for any
    differences, ignoring the Date header field if it is set.

    Argument tuple is in the form (url, port), where url is the URL to open, and
    port is the port the proxy is running on.
    '''
    (url, port) = argtuple
    urldata = urlparse.urlparse(url)
    try:
        (host, hostport) = urldata[1].split(':')
    except ValueError:
        host = urldata[1]
        hostport = 80
    # Retrieve via proxy
    try:
        proxy_data = get_data('localhost', port, url)
    except socket.error:
        print '!!!! Socket error while attempting to talk to proxy!'
        return False
    # Retrieve directly
    direct_data = get_direct(host, hostport, urldata[2])
    # Compare responses
    return compare_responses(proxy_data, direct_data)

def compare_responses(proxy_data, direct_data, lenient_header=True):
    # Assumption: the split separators in this function appear as single
    # spaces in the posted listing; they are reconstructed here as the CRLF
    # sequences that delimit HTTP header lines and the header/body boundary.
    proxy_header = proxy_data.split("\r\n\r\n")[0].split("\r\n")
    direct_header = direct_data.split("\r\n\r\n")[0].split("\r\n")
    proxy_response_line = proxy_data.split("\r\n")[0]
    direct_response_line = direct_data.split("\r\n")[0]
    # Assumption: the whole body is compared; the posted listing further
    # splits the body on a separator that is missing from the post.
    if "200" in proxy_response_line:
        proxy_body = proxy_data.split("\r\n\r\n", 1)[1]
    else:
        proxy_body = ""
    if "200" in direct_response_line:
        direct_body = direct_data.split("\r\n\r\n", 1)[1]
    else:
        direct_body = ""
    if proxy_response_line != direct_response_line:
        print "Response lines don't match: Direct: {} Proxy: {} ".format(direct_response_line, proxy_response_line)
        return False
    if not lenient_header:
        # Each header line must appear on both sides, except Date and Expires.
        for proxy_h in proxy_header:
            if not proxy_h.startswith("Date") and not proxy_h.startswith("Expires") and not (proxy_h in direct_header):
                print "Headers don't match: Direct: {} Proxy: {} ".format(direct_header, proxy_header)
                return False
        for direct_h in direct_header:
            if not direct_h.startswith("Date") and not direct_h.startswith("Expires") and not (direct_h in proxy_header):
                print "Headers don't match: Direct: {} Proxy: {} ".format(direct_header, proxy_header)
                return False
    if proxy_body != direct_body:
        print "HTML content doesn't match: Direct: {} Proxy: {} ".format(direct_body, proxy_body)
        return False
    return True

def get_direct(host, port, url):
    '''Retrieve a URL using direct HTTP/1.1 GET.'''
    # Header lines end with CRLF; a blank line terminates the request.
    getstring = 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n'
    data = http_exchange(host, port, getstring % (url, host))
    return data

def get_data(host, port, url):
    '''Retrieve a URL using proxy HTTP/1.1 GET.'''
    getstring = 'GET %s HTTP/1.1\r\nConnection: close\r\n\r\n'
    data = http_exchange(host, port, getstring % url)
    return data

def http_exchange(host, port, data):
    '''Send data to host:port and return everything received until EOF.'''
    conn = telnetlib.Telnet()
    conn.open(host, port)
    conn.write(data)
    ret_data = conn.read_all()
    conn.close()
    return ret_data

def live_process(pid):
    '''Check that a process is still running.'''
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False

def do_timeout(id):
    '''Callback function run by the monitor threads to kill a long-running operation.'''
    print '!!!! Proxy transaction timed out after %d seconds' % timeout_secs
    terminate(id)

def terminate(id):
    '''Stops and cleans up a running child process.'''
    assert(live_process(id))
    os.kill(id, signal.SIGINT)
    os.kill(id, signal.SIGKILL)
    try:
        os.waitpid(id, 0)
    except OSError:
        pass

def restart_proxy(binary, oldport):
    '''Restart the proxy on a new port number.'''
    newport = str(int(oldport) + 1)
    cid = os.spawnl(os.P_NOWAIT, binary, binary, newport)
    time.sleep(3)
    return (cid, newport)

if __name__ == '__main__':
    main()

