For Python 3.6.1: Implement a function getContent() that takes a URL (as a string) as input and prints only the text data content of the associated web page (i.e., no tags). Avoid printing a blank line that follows another blank line, and strip the whitespace from every line printed. Please use only urllib imports and parsers. I have posted the code that I tried below, and I keep getting a very long error message. I am using macOS Sierra, if that matters. I have seen other solutions here that seem to work fine on Windows machines (including those of my friends who use Windows). Thank you for your help!

>>> getContent('http://www.nytimes.com/')

The New York Times - Breaking News, World News & Multimedia

Subscribe to The Times

Log In

Register Now

Home Page

...

--------

I have tried this:

from urllib.request import urlopen
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print(data)

def getContent(url):
    response = urlopen(url)
    content = response.read()
    parser = MyHTMLParser()
    return parser.feed(content)

print(getContent('http://www.nytimes.com/'))
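One likely problem in the attempt above is that response.read() returns bytes, while HTMLParser.feed() expects a str, so the page has to be decoded first. In addition, feed() returns None, so wrapping the call in print() adds a stray "None", and nothing strips whitespace or collapses repeated blank lines. The following is a minimal sketch of one way to address these points, not a definitive solution. The names TextOnlyParser and extract_lines are hypothetical helpers introduced here, and skipping the contents of script and style elements is an assumption about what "text data content" should mean.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextOnlyParser(HTMLParser):
    """Collects text data, ignoring what is inside <script> and <style>."""

    SKIPPED_TAGS = ('script', 'style')  # assumption: not wanted in the output

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []     # raw text pieces in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_lines(html):
    """Return stripped text lines, dropping a blank line that follows a blank line."""
    parser = TextOnlyParser()
    parser.feed(html)   # feed() needs a str, not bytes
    parser.close()      # flush any text still buffered by the parser
    lines = []
    for chunk in parser.chunks:
        for line in chunk.splitlines():
            line = line.strip()
            if line == '' and lines and lines[-1] == '':
                continue  # previous kept line was already blank
            lines.append(line)
    return lines


def getContent(url):
    """Fetch the page at url and print its text content line by line."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header; fall back to UTF-8.
        charset = response.headers.get_content_charset() or 'utf-8'
        html = response.read().decode(charset, errors='replace')
    for line in extract_lines(html):
        print(line)
```

Because getContent() prints as it goes and returns nothing, it should be called as getContent('http://www.nytimes.com/') rather than wrapped in print(). If the traceback ends in an SSL CERTIFICATE_VERIFY_FAILED error instead of a TypeError, the cause may be missing root certificates, which is a known issue with the python.org installers of Python 3.6 on macOS; running the "Install Certificates.command" script that ships with that installer is the usual remedy. That would also explain why the same code runs on Windows machines.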
