Question: from urllib.request import urlopen, urljoin from html.parser import HTMLParser from re import findall class Collector(HTMLParser): def _-init-(self): initializes parser, the url, and a list HTMLParser.__init

from urllib.request import urlopen, urljoin from html.parser import HTMLParser from re import findall class Collector(HTMLParser): def _-init-(self): initializes parser, the url, and a list HTMLParser.__init (self) self. linksset() - # Solution to Practice Problem 11.3 self.text def handle_starttag(self, tag, attrs): #print(tag) collects hyperlink URLs in their absolute format if tag = a: for attr in attrs: if att r [0] = href: self. links.add (attr[1]) for a in. def getLinks (content): parserCollector) parser.feed (content) newLink = parser. links #https = [] #for link in newLink: #if link = href: Write a function getWebInfo( ) that takes as input a URL and calls three functions, to print the following information: 1) The set of all absolute links already in the page, that is links that start with 'http://'. Must use HTMLparser class methods. Do not copy code from the book, which is using urljoin to *make* every link absolute. (6 points) 2) A set that contains all e-mail addresses appearing in the page. Must use regular expressions to detect e-mail addresses on the web page. Must remove duplicates. E-mails should be matching general e-mails, not just depaul.edu emails; do not use cdm.depaul.edu in your pattern. Your program should work for any e-mail address on any web page. (6 points) 3) A list of tuples (derived from a dictionary) that contains the 20 most frequent words and their frequencies, in order of frequency. Words must contain 5 or more characters. Discard any words of 4 characters or less. (6 points) There are several steps to follow on this part. a) You need to construct a dictionary first, containing words and their frequencies. b) Then the dictionary has to be reversed. c) The reversed dictionary has to be sorted. Please note that the sorting method returns a list of tuples. d) Print the first 20 tuples of the list of tuples. Write one function for each of the three pieces of information, a total of three functions. Then assemble the three function calls (and headings) inside your main function getWebInfo( ). Include the call to getWebInfo at the bottom of your module (file.) (2 points) Function Calls getWebInfo ('http://www.cdm.depaul.edu/') getWebInfo ('http://www.cs.uiowa.edu/') In python, use from html.parser import HTMLParser and print absolute links

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Write a function getWebInfo( ) that takes as input a URL and prints the following information: 1) A list of all absolute links that are already in the page, that is links that start with 'http://' or...

Write a custom parser named ListParser. It will find any list element and print of the values to the screen. Note. You should remove any blank list items. from html.parser import HTMLParser from...

Python Please use the template below: from html.parser import HTMLParser from urllib.parse import urljoin from urllib.request import urlopen def testImgParser(url): content =...

use python language to complete the Titleparser part and the BizzaroGUI part 1. Finish the BizzaroGUI implementation. The BizzaroGUI does three things: Has a text box you can enter a URL and retrieve...

This is a PYTHON question. I do not know anyother language. Problem: Make a getContent function, that returns the text-content of a webpage. online the text no tags from urllib.request import urlopen...

python language complete the code Write class AnchorParser that is a subclass of the HTML Parser class. It will find and collect the contents of all the link descriptions in an HTML file fed to it....

Python - Please use the template below from html.parser import HTMLParser from urllib.parse import urljoin from urllib.request import urlopen def testLParser(url): content =...

Python 1. Finish the BizzaroGUI implementation. The BizzaroGUI does three things: a. Has a text box you can enter a URL and retrieve the title of the web page. b. Enter a folder name and retrieve the...

Using Python 3+ 1. Include the following imports at the top of your module (hopefully this is sufficient): from web import LinkCollector # make sure you did 1 from html.parser import HTMLParser from...

The degree of company financial risk is measured and reported by independent rating agencies such as Standard & Poors and Moodys. What factors do these rating agencies evaluate when determining a...

Unit operation (b) in Table 5.4 is to be heated by injecting live steam directly into the bottom plate of the column instead of by using a reboiler, for a separation involving ethanol and water....

when is there an arbitrage opportunity with future stock exchange

Una mezcla de partculas slidas de slice (B) y galena (A) que tienen un rango de tamao de 1,27 x 10-6 m a 2,50 x 10-5 m debe separarse por clasificacin hidrulica utilizando sedimentacin libre....

How do Data Types perform data validation?

How does Referential Integrity work?

What are Dimensional Relational Databases designed to hold primarily?