Question:

Using Python!

1. Copy the file web.py from class (or class notes) into your working folder.

2. Include the following imports at the top of your module (hopefully this is sufficient):

from web import LinkCollector   # make sure you did step 1
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.parse import urljoin
from urllib.error import URLError

3. Implement a class ImageCollector. This will be similar to the LinkCollector: given a string containing the HTML for a web page, it collects and can supply the (absolute) URLs of the images on that page. The URLs should be collected in a set that can be retrieved with the method getImages (the order of images will vary). Sample usage:

>>> ic = ImageCollector('http://www2.warnerbros.com/spacejam/movie/jam.htm')
>>> ic.feed(urlopen('http://www2.warnerbros.com/spacejam/movie/jam.htm').read().decode())
>>> ic.getImages()
{'http://www2.warnerbros.com/spacejam/movie/img/p-sitemap.gif', 'http://www2.warnerbros.com/spacejam/movie/img/p-jamcentral.gif'}

>>> ic = ImageCollector('http://www.kli.org/')
>>> ic.feed(urlopen('http://www.kli.org/').read().decode())
>>> ic.getImages()
{'http://www.kli.org/wp-content/uploads/2014/03/KLIbutton.gif', 'http://www.kli.org/wp-content/uploads/2014/03/KLIlogo.gif'}
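One possible sketch of the ImageCollector, modeled on the way LinkCollector subclasses HTMLParser (the exact LinkCollector interface in web.py may differ, so treat this as an assumption, not the official solution):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageCollector(HTMLParser):
    """Collects the absolute URLs of images found in a page's HTML.

    Sketch assuming a LinkCollector-like design: the constructor takes
    the page's URL so relative src values can be made absolute.
    """
    def __init__(self, url):
        super().__init__()
        self.url = url          # base URL for absolutizing relative paths
        self.images = set()     # collected absolute image URLs

    def handle_starttag(self, tag, attrs):
        # image locations live in the src attribute of <img ...> tags
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value is not None:
                    # urljoin turns a relative src into an absolute URL
                    self.images.add(urljoin(self.url, value))

    def getImages(self):
        return self.images
```

Feeding the collector a page's decoded HTML (as in the sample session above) fills the set returned by getImages.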

4. Implement a class ImageCrawler that inherits from the Crawler developed in web.py and will both crawl links and collect images. This is very easy by inheriting from and extending the Crawler class. You will need to collect images in a set. Hint: what does it mean to extend? Implementation details:

a. You must inherit from Crawler. Make sure that the module web.py is in your working folder and make sure that you import Crawler from the web module.

b. __init__ - extends Crawler's __init__ by adding a set attribute that will be used to store images

c. crawl - extends Crawler's crawl by creating an ImageCollector, opening the url, and then collecting any images from the url into the set of images being stored. I recommend that you collect the images before you call Crawler's crawl method.

d. getImages - returns the set of images collected

>>> c = ImageCrawler()
>>> c.crawl('http://www2.warnerbros.com/spacejam/movie/jam.htm',1,True)
>>> c.getImages()
{'http://www2.warnerbros.com/spacejam/movie/img/p-lunartunes.gif', 'http://www2.warnerbros.com/spacejam/movie/cmp/pressbox/img/r-blue.gif'}

>>> c = ImageCrawler()
>>> c.crawl('http://www.pmichaud.com/toast/',1,True)
>>> c.getImages()
{'http://www.pmichaud.com/toast/toast-6a.gif', 'http://www.pmichaud.com/toast/toast-2c.gif', 'http://www.pmichaud.com/toast/toast-4c.gif', 'http://www.pmichaud.com/toast/toast-6c.gif', 'http://www.pmichaud.com/toast/ptart-1c.gif', 'http://www.pmichaud.com/toast/toast-7b.gif', 'http://www.pmichaud.com/toast/krnbo24.gif', 'http://www.pmichaud.com/toast/toast-1b.gif', 'http://www.pmichaud.com/toast/toast-3c.gif', 'http://www.pmichaud.com/toast/toast-5c.gif', 'http://www.pmichaud.com/toast/toast-8a.gif'}
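Implementation details a-d above could be sketched as follows. Since web.py isn't shown here, the Crawler below is a minimal stand-in so the sketch reads on its own; in the assignment you would `from web import Crawler` instead, and the ImageCollector from step 3 is assumed to be defined in the same module:

```python
from urllib.request import urlopen
from urllib.error import URLError

# Minimal stand-in for web.Crawler so this sketch is self-contained;
# the real Crawler follows links to the given depth. In the assignment,
# replace this class with:  from web import Crawler
class Crawler:
    def __init__(self):
        self.visited = set()

    def crawl(self, url, depth, relativeOnly):
        self.visited.add(url)   # the real version also follows links

class ImageCrawler(Crawler):
    def __init__(self):
        super().__init__()      # b. extend Crawler's __init__ ...
        self.images = set()     # ... with a set that stores image URLs

    def crawl(self, url, depth, relativeOnly):
        # c. collect this page's images first, then let Crawler.crawl
        # follow the links as usual (ImageCollector is from step 3)
        try:
            ic = ImageCollector(url)
            ic.feed(urlopen(url).read().decode())
            self.images |= ic.getImages()
        except URLError:
            pass                # unreachable pages contribute no images
        super().crawl(url, depth, relativeOnly)

    def getImages(self):
        # d. the set of absolute image URLs collected so far
        return self.images
```

The "extend" hint means each overriding method does its extra work and then delegates to the parent's version via super().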

5. Implement a function scrapeImages: given a url, a filename, a depth, and a Boolean (relativeOnly), this function starts at url, crawls to depth, collects images, and then writes an html document containing the images to filename. This is not hard; use the ImageCrawler from the prior step. For example:

>>> scrapeImages('http://www2.warnerbros.com/spacejam/movie/jam.htm','jam.html',1,True)
>>> open('jam.html').read().count('img')
62

>>> scrapeImages('http://www.pmichaud.com/toast/', 'toast.html',1,True)
>>> open('toast.html').read().count('img')
11

Link to web.py: https://www.dropbox.com/s/obiyi7lnwc3rw0d/web.py?dl=0
