Question: Python 3: Develop a crawler that collects the email addresses in the visited web pages. You can use function emails() from Problem 11.22 to find

Python 3: Develop a crawler that collects the email addresses in the visited web pages. You can use function emails() from Problem 11.22 to find email addresses in a web page. To get your program to terminate, you may use the approach from Problem 11.15 or Problem 11.17.

(((For context, problem 11.22 is as follows: Write function emails() that takes a document (as a string) as input and returns the set of email addresses (i.e., strings) appearing in it. You should use a regular expression to find the email addresses in the document. >>> from urllib.request import urlopen >>> url = 'http://www.cdm.depaul.edu' >>> content = urlopen(url).read().decode() >>> emails(content) {'advising@cdm.depaul.edu', 'wwwfeedback@cdm.depaul.edu', 'admission@cdm.depaul.edu', 'webmaster@cdm.depaul.edu'}

Problem 11.15 is as follows: Modify the crawler function crawl1() so that the crawler does not visit web pages that are more than n click (hyperlinks) away. To do this, the function should take an ad- ditional input, a nonnegative integer n. If n is 0, then no recursive calls should be made. Otherwise, the recursive calls should pass n 1 as the argument to the crawl1() function.

Python 3: Develop a crawler that collects the email addresses in the

Problem 11.17 is as follows: Modify the crawler function crawl2() so that the crawler only follows links hosted on the same host as the starting web page.

visited web pages. You can use function emails() from Problem 11.22 to)))

1 def crawl1 (url): "recursive web crawler that calls analyze () on every web page' # analyze () returns a list of hyperlink URLs in web page url links = analyze (url) 10 11 12 # recursively continue crawl from every link in links for link in links: try : # try block because link may not be valid HTML file crawl1 (link) except: # if an exception is thrown, pass # ignore and move on

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!