Question: This is a Python question. I do not know any other language.

Problem: Write a getContent function that returns the text content of a web page: only the text, no tags.
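Before walking through the textbook code below, here is a minimal, self-contained sketch of one possible getContent function, built on urllib and html.parser. It simply accumulates every run of character data the parser sees (a fuller version might also filter out text inside script and style elements); the URL in the commented usage line is only a placeholder.

from urllib.request import urlopen
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    'accumulates the text content of a page, ignoring tags'

    def __init__(self):
        HTMLParser.__init__(self)
        self.text = ''

    def handle_data(self, data):
        'called for every run of text between tags'
        self.text += data

def getContent(url):
    'returns the text content (no tags) of the web page at url'
    html = urlopen(url).read().decode()
    parser = TextCollector()
    parser.feed(html)
    return parser.text

# hypothetical usage; any reachable URL would do
# print(getContent('http://example.com'))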
from urllib.request import urlopen

def getSource(url):
    'returns the content of resource specified by url as a string'
    response = urlopen(url)
    html = response.read()
    return html.decode()
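As a quick usage sketch (the URL is only a placeholder), getSource returns the raw markup of the page as one string:

# hypothetical usage of getSource
html = getSource('http://example.com')
print(html[:200])   # first 200 characters of the markup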
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    '''HTML doc parser that prints values of href attributes
       in anchor start tags'''

    def handle_starttag(self, tag, attrs):
        'print value of href attribute, if any'
        if tag == 'a':                      # if anchor tag
            # search for href attribute and print its value
            for attr in attrs:
                if attr[0] == 'href':
                    print(attr[1])
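To see LinkParser in action without hitting the network, it can be fed a small in-memory HTML snippet; the anchor targets below are made up for illustration:

# hypothetical usage of LinkParser on an HTML string
snippet = '''
<html><body>
  <a href="http://example.com/one.html">one</a>
  <a href="two.html">two</a>
</body></html>
'''
parser = LinkParser()
parser.feed(snippet)   # prints http://example.com/one.html and two.html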
from urllib.parse import urljoin
from html.parser import HTMLParser

class Collector(HTMLParser):
    'collects hyperlink URLs into a list'

    def __init__(self, url):
        'initializes parser, the url, and a list'
        HTMLParser.__init__(self)
        self.url = url
        self.links = []
        # Solution to practice problem 11.3
        self.text = ''

    def handle_starttag(self, tag, attrs):
        'collects hyperlink URLs in their absolute format'
        if tag == 'a':
            for attr in attrs:
                if attr[0] == 'href':
                    # construct absolute URL
                    absolute = urljoin(self.url, attr[1])
                    if absolute[:4] == 'http':   # collect HTTP URLs
                        self.links.append(absolute)

    # Solution to practice problem 11.3
    def handle_data(self, data):
        'collects and concatenates text data'
        self.text += data

    def getLinks(self):
        'returns hyperlink URLs in their absolute format'
        return self.links

    # Solution to practice problem 11.3
    def getData(self):
        'returns the concatenation of all text data'
        return self.text
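A rough usage sketch of Collector (the URL and HTML below are placeholders): the constructor takes the page's own URL so that relative hrefs can be resolved to absolute ones, and after feed() both the links and the tag-free text are available. getData() is, in effect, the getContent behavior the question asks for.

# hypothetical usage of Collector
page_url = 'http://example.com/index.html'
html = '<html><body><p>Hello</p> <a href="about.html">About</a></body></html>'

collector = Collector(page_url)
collector.feed(html)
print(collector.getLinks())   # ['http://example.com/about.html']
print(collector.getData())    # roughly 'Hello About'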
from urllib.request import urlopen

def analyze(url):
    '''prints the frequency of every word in web page url and
       prints and returns the list of http links, in absolute
       format, in it'''
    print(' Visiting', url)   # for testing

    # obtain links in the web page
    content = urlopen(url).read().decode()
    collector = Collector(url)
    collector.feed(content)
    urls = collector.getLinks()        # get list of links

    # compute word frequencies
    content = collector.getData()      # get text data as a string
    freq = frequency(content)

    # print the frequency of every text data word in web page
    print(' {:45} {:10} {:5}'.format('URL', 'word', 'count'))
    for word in freq:
        print('{:45} {:10} {:5}'.format(url, word, freq[word]))

    # print the http links found in web page
    print(' {:45} {:10}'.format('URL', 'link'))
    for link in urls:
        print('{:45} {:10}'.format(url, link))

    return urls
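Note that analyze() relies on a frequency() helper that is not shown in this excerpt; it presumably maps each word of the collected text to the number of times it occurs. A plausible stand-in, under that assumption, is:

def frequency(content):
    'returns a dictionary mapping each word in content to its count (assumed behavior)'
    counts = {}
    for word in content.split():
        counts[word] = counts.get(word, 0) + 1
    return counts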
def crawl1(url):
    'recursive web crawler that calls analyze() on every web page'

    # analyze() returns a list of hyperlink URLs in web page url
    links = analyze(url)

    # recursively continue crawl from every link in links
    for link in links:
        try:            # try block because link may not be a valid HTML file
            crawl1(link)
        except:         # if an exception is thrown,
            pass        # ignore and move on.
visited = set()   # initialize visited to an empty set

def crawl2(url):
    '''a recursive web crawler that calls analyze()
       on every visited web page'''

    # add url to set of visited pages
    global visited   # while not necessary, warns the programmer
    visited.add(url)

    # analyze() returns a list of hyperlink URLs in web page url
    links = analyze(url)

    # recursively continue crawl from every link in links
    for link in links:
        # follow link only if not visited
        if link not in visited:
            try:
                crawl2(link)
            except:
                pass
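A minimal driver for crawl2, assuming the definitions above; because visited is a module-level set, it should be emptied before starting a fresh crawl in the same session, and the seed URL below is only a placeholder:

# hypothetical driver for crawl2
visited.clear()                            # start with no visited pages
crawl2('http://example.com/index.html')    # placeholder seed URL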
