Question:
Here is code.
# This program uses urllib to download web pages
import urllib.request as req
import time

hosts = ["http://missouri.edu/", "http://www.missouristate.edu/",
         "http://www.mssu.edu/", "http://www.semo.edu/",
         "http://umkc.edu", "http://mst.edu",
         "https://www.ucmo.edu/", "http://www.umsl.edu/",
         "http://www.truman.edu/", "http://www.nwmissouri.edu/"]

# get web pages from hosts and print first 50 characters
def get_page(hosts):
    for host in hosts:
        url = req.urlopen(host)
        html = url.read()
        print(host, html[:50])

def main():
    get_page(hosts)

start = time.time()
main()
print("Elapsed time: %s" % (time.time() - start))
I would like to know how to modify it to use a separate thread for each download. This will require that the threads coordinate so that different threads don't download the same web page. An easy way to do this is to use a Queue from the queue module: que = queue.Queue(). I suggest you make the queue global, although you could define it in main() and pass it to get_page().
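For instance, the global queue could be set up near the top of the file like this (a minimal sketch; the variable name que matches the call above):

import queue

# Shared queue that coordinates which thread downloads which host.
que = queue.Queue()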
In main():
If you want the queue to be local, create it here. Either way, put the hosts into the queue (use the queue's .put() method).
In a for loop, spawn a thread for each host (use for i in range(len(hosts))). I suggest you use t for each thread variable and give each thread a name such as "gp" + str(i). The target function is get_page(); if your queue is local to main(), pass it to the thread using args.
Start the thread.
After the loop that creates and starts the threads, wait on the queue to be fully processed (use the queue's .join() method). This is probably not necessary, but it's good practice.
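Putting those steps together, main() might look like this. This is a minimal sketch, assuming the global-queue variant; the variable t and the name "gp" + str(i) follow the suggestions above:

import threading

def main():
    # Load every host into the shared queue so each thread
    # can claim exactly one URL.
    for host in hosts:
        que.put(host)
    # Spawn and start one thread per host.
    for i in range(len(hosts)):
        t = threading.Thread(target=get_page, name="gp" + str(i))
        t.start()
    # Block until task_done() has been called once per queued host.
    que.join()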
In get_page():
If the queue is global, get_page() takes no arguments. If you're passing the queue, then make it an argument to get_page().
Get a host from the queue (use the queue's .get() method).
Open the url, as in web_fetch_serial.py, and read the html.
Print the name of the thread and the first 50 characters from the html.
Tell the queue the task is done (use the queue's .task_done() method).
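With the global-queue variant, get_page() could then be written as follows. This is a sketch that assumes the imports and the global que from the sketches above; threading.current_thread().name supplies the thread's name for the printout:

def get_page():
    # Claim one host from the shared queue.
    host = que.get()
    url = req.urlopen(host)
    html = url.read()
    # Print this thread's name and the first 50 characters of the page.
    print(threading.current_thread().name, html[:50])
    # Tell the queue this entry has been fully processed.
    que.task_done()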
Execute web_fetch_threads.py several times and note how long it usually takes. Is it faster than web_fetch_serial.py?
After a process starts a thread, it will go on its merry way, doing any other tasks it has, although it won't exit until all its threads have joined. However, we can force a process to block until a thread joins. Add the statement t.join() immediately after you start the thread (t.start(), assuming you're using t for the thread variable name). Does this make a difference in how long the program takes to execute and, if so, why?
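The modified loop would look like this (a sketch, under the same assumptions as the main() sketch above):

for i in range(len(hosts)):
    t = threading.Thread(target=get_page, name="gp" + str(i))
    t.start()
    # Joining immediately after starting makes the main thread wait
    # for this thread to finish before the next one is created, so
    # the downloads run one at a time again.
    t.join()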
