Question: Python Programming Use the Python requests library and Beautiful Soup library to create a Python script that scrapes and displays the HTML links and images from the
Python Programming
Use the Python requests library and Beautiful Soup library to create a Python script that “scrapes” and displays the HTML links and images from the home page of the Smithsonian Institute (Si.org).
- Your program must write each link found to an output file named “weblinks.txt” and each image found on the home page to a file named “webimages.txt”.
- Any image or link that contains the word “art” must be written to a file named art.txt. Make sure your program is NOT case sensitive when evaluating the word “art”.
What I have so far:
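The case-insensitive requirement above can be sketched as a small helper. This is only an illustration: the name `contains_art` is hypothetical, and it assumes that “contains the word art” means a plain substring match, so strings such as “department” or “start” also count.

```python
def contains_art(text):
    """Return True if 'art' occurs anywhere in text, ignoring case.

    Hypothetical helper: this is a substring match, so 'Smart' and
    'department' match as well as 'Art'.
    """
    return "art" in text.lower()


# Example inputs (made up for illustration)
for candidate in ["https://www.si.edu/ART", "/visit", "Smart"]:
    print(candidate, contains_art(candidate))
```

If only the standalone word should match, a regular expression with word boundaries and `re.IGNORECASE` would be one alternative.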
# HTTP GET a page and keep the response for parsing
# check http code
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

# http response to a site
resp = urllib.request.urlopen('https://www.si.edu/')
soup = BeautifulSoup(resp, "html.parser")

# get a list of anchor tags
tags = soup('a')
print(type(tags))

# print every link
for item in tags:
    print(item.get('href', None))

# print links whose tag contains "art" (case-insensitive)
for item in tags:
    if "art" in str(item).lower():
        print(item.get('href', None))

# save downloaded file to disk
try:
    resp = urllib.request.urlopen('https://www.si.edu/')
    bytesToWrite = resp.read()
    # must write as binary to maintain unicode formatting
    myFile = open("weblinks.txt", 'wb')
    myFile.write(bytesToWrite)
    myFile.close()
except Exception as exc:
    print('An error occurred. ' + str(exc))
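Note that the attempt above saves the raw page HTML to weblinks.txt rather than the extracted links, and it uses urllib instead of requests. One possible way to meet the stated requirements is sketched below; it is not a definitive solution. The function names (`extract_links_and_images`, `filter_art`, `write_lines`, `main`) are hypothetical, the URL is the one used in the attempt, and the “art” test is assumed to apply to the link or image URL as a case-insensitive substring match.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.si.edu/"  # home page used in the attempt above


def extract_links_and_images(html, base_url=BASE_URL):
    """Return (links, images) as lists of absolute URLs found in html."""
    soup = BeautifulSoup(html, "html.parser")
    # Only tags that actually carry an href/src attribute are considered.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    images = [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]
    return links, images


def filter_art(urls):
    """Keep URLs containing 'art', ignoring case (assumption: substring match)."""
    return [url for url in urls if "art" in url.lower()]


def write_lines(path, lines):
    """Write one item per line as UTF-8 text, overwriting any existing file."""
    with open(path, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")


def main():
    """Fetch the home page, display what was found, and write the three files."""
    resp = requests.get(BASE_URL, timeout=30)
    resp.raise_for_status()  # stop early on a non-2xx HTTP status code
    links, images = extract_links_and_images(resp.text)
    for url in links + images:
        print(url)
    write_lines("weblinks.txt", links)
    write_lines("webimages.txt", images)
    write_lines("art.txt", filter_art(links + images))
```

To run it as a script, call `main()` at the bottom of the file. Keeping the parsing in `extract_links_and_images` separate from the network request means it can be tried on a small HTML string without contacting the site.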
