Question:

Must be done in Python 3, in a Jupyter Notebook, using Pandas.

Link to privacy.html needed for this question: https://mega.nz/#!Ur4BUKoJ!kelCZAdSDUv6tIltswcvNsh8KehhXkME03ZO-Zv_Vns

Part 1:

Install bs4 (BeautifulSoup) and try using it to extract just the text (not the HTML) from the file privacy.html. This file is a simple web server landing page. Think of it as containing just one long string of characters. If you open it in a text editor, you'll see a lot of HTML tags. Share your code and your results.
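One way Part 1 could be approached is sketched below. It assumes bs4 has already been installed (for example with `pip install beautifulsoup4` in a notebook cell). The function name `html_to_text` and the small `sample` string are made up for illustration; the sample stands in for the contents of privacy.html, which is not reproduced here.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Return only the visible text of an HTML string (hypothetical helper)."""
    # "html.parser" is Python's built-in parser, so no extra install is needed
    soup = BeautifulSoup(html, "html.parser")
    # Drop <script> and <style> elements, whose contents are not page text
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Put each text fragment on its own line, with surrounding whitespace stripped
    return soup.get_text(separator="\n", strip=True)


# Illustrative stand-in for the file contents. For the real file, something like:
#   with open("privacy.html", encoding="utf-8") as f:
#       html = f.read()
sample = (
    "<html><head><title>Landing</title><style>p{color:red}</style></head>"
    "<body><h1>Hello</h1><p>World <b>bold</b></p></body></html>"
)
print(html_to_text(sample))
```

Removing the script and style elements first is a design choice: `get_text` would otherwise return their contents along with the page text.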

Part 2:

Use Deldycke's HTML tag regex (https://kevin.deldycke.com/2008/07/python-ultimate-regular-expression-to-catch-html-tags/), or another expression that you like better, to strip out all the HTML from privacy.html, either with Pandas or with plain Python. As mentioned above, the file contains just one long string of characters. Share your code and your results.
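A minimal sketch of the regex approach, using only the standard library, might look like the following. The patterns here are simplified stand-ins chosen for illustration, not the expression from the linked post, and they will not handle every malformed page. The names `strip_html` and `sample` are hypothetical.

```python
import re
from html import unescape

# Simplified stand-in patterns (assumptions, not the regex from the linked post)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.DOTALL | re.IGNORECASE
)
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(html):
    """Remove comments, script/style blocks, and tags from an HTML string."""
    text = COMMENT_RE.sub("", html)
    text = SCRIPT_STYLE_RE.sub("", text)
    # Replace each tag with a space so adjacent words do not run together
    text = TAG_RE.sub(" ", text)
    # Convert entities such as &amp; back to plain characters
    text = unescape(text)
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()


# Illustrative stand-in for the contents of privacy.html
sample = "<html><body><h1>Privacy</h1><p>We &amp; you</p></body></html>"
print(strip_html(sample))
```

If the string is held in a Pandas Series instead, the same tag pattern can be applied with `Series.str.replace(r"<[^>]+>", " ", regex=True)`.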

Part 3: Can you find other Python packages that you think might be more useful or easier to use than the above?
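One candidate for Part 3 is the `html.parser` module that ships with Python, which needs no installation. The sketch below is only an illustration of that option; the class name `TextExtractor`, the function `extract_text`, and the sample string are made up for this example. Third-party packages such as lxml or html2text are other options that could be compared.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # greater than 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Illustrative stand-in for the contents of privacy.html
sample = (
    "<html><head><style>p{}</style></head>"
    "<body><h1>Hi</h1><p>a &amp; b</p></body></html>"
)
print(extract_text(sample))
```

Compared with a regex, an event-based parser like this tracks which element it is inside, so skipping script and style content does not depend on pattern matching across tags.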
