For Python 3.6.1: Implement a function getContent() that takes a URL (as a string) as input and prints only the text content of the associated web page (i.e., no tags). Do not print a blank line that follows another blank line, and strip the whitespace from every line printed. Please use only urllib imports and parsers. I have posted the code I tried, and I keep getting a very long error message. I am using macOS Sierra, if that matters. I have seen other solutions here that seem to work fine on Windows machines (including my friends' Windows machines). Thank you for your help!
>>> getContent('http://www.nytimes.com/')
The New York Times - Breaking News, World News & Multimedia
Subscribe to The Times
Log In
Register Now
Home Page
...
--------
I have tried this:
from urllib.request import urlopen
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print(data)

def getContent(url):
    response = urlopen(url)
    content = response.read()
    parser = MyHTMLParser()
    return parser.feed(content)

print(getContent('http://www.nytimes.com/'))
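
Two things in the attempt above are likely to cause trouble regardless of platform: response.read() returns bytes while HTMLParser.feed() expects a str, and feed() returns None, so printing the return value of getContent() prints None. The following is a minimal sketch of one possible fix, not a definitive solution. The names TextCollector and extract_text are hypothetical helpers introduced here for illustration, and skipping the contents of script and style elements is an assumption about what counts as "text". If the long traceback on macOS ends in an SSL certificate verification error, that is a separate setup issue (the python.org installer for 3.6 ships an "Install Certificates.command" script for it) and is not addressed by this code.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIPPED_TAGS = ("script", "style")  # assumption: not wanted as page text

    def __init__(self):
        super().__init__()
        self.chunks = []      # text fragments in document order
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Return the tag-free text of an HTML string, one stripped line per row,
    with runs of blank lines collapsed to a single blank line."""
    parser = TextCollector()
    parser.feed(html)   # feed() needs a str, not bytes
    parser.close()      # flush any buffered trailing text
    lines = []
    previous_blank = False
    for line in "".join(parser.chunks).splitlines():
        line = line.strip()
        if not line and previous_blank:
            continue    # drop a blank line that follows a blank line
        lines.append(line)
        previous_blank = not line
    return "\n".join(lines)


def getContent(url):
    """Fetch the page at url and print its text content."""
    response = urlopen(url)
    # Use the charset from the HTTP headers; fall back to UTF-8 (assumption).
    charset = response.headers.get_content_charset() or "utf-8"
    html = response.read().decode(charset, errors="replace")
    print(extract_text(html))
```

With this layout, getContent('http://www.nytimes.com/') is called directly rather than wrapped in print(), since the function does its own printing. Keeping the parsing in extract_text() also makes it possible to check the blank-line and whitespace handling on a small HTML string without a network connection.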
