The goal of this assignment is to write a program that will scan a web page and harvest as many email addresses as possible. Many of these email addresses will be obfuscated in some way. Your job is to get the computer to figure out how to recognize the obfuscation and return a good result!
Here are some examples to get you started (in the form obfuscated email => what your program should interpret the email as):
mst3k@Virginia.EDU => mst3k@Virginia.EDU
thomas.jefferson@cs.virginia.edu => thomas.jefferson@cs.virginia.edu
mst3k at virginia.edu => mst3k@virginia.edu
mst3k at virginia dot edu => mst3k@virginia.edu
You can come up with regular expressions that will look for particular patterns in a line that could be an email address.
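A minimal sketch of one way to cover the four examples above, assuming the only obfuscation is the word "at" in place of "@" and the word "dot" in place of "." (each surrounded by spaces). The names EMAIL_RE, DOT_RE and deobfuscate_emails are hypothetical, not part of the assignment, and other tricks on the example page would need additional patterns.

```python
import re

# Assumed obfuscation forms: a literal "@" or the word "at", and "." or the word "dot".
EMAIL_RE = re.compile(
    r"([\w.+-]+)"                            # local part, e.g. mst3k or thomas.jefferson
    r"(?:\s*@\s*|\s+at\s+)"                  # "@" or the word " at "
    r"([\w-]+(?:(?:\.|\s+dot\s+)[\w-]+)+)",  # domain labels joined by "." or " dot "
    re.IGNORECASE,
)

# Matches the word "dot" with surrounding whitespace inside a captured domain.
DOT_RE = re.compile(r"\s+dot\s+", re.IGNORECASE)


def deobfuscate_emails(text):
    """Return every address found in text, rewritten into plain user@host form."""
    results = []
    for match in EMAIL_RE.finditer(text):
        local, domain = match.group(1), match.group(2)
        # Replace each " dot " with "." so "virginia dot edu" becomes "virginia.edu"
        results.append(f"{local}@{DOT_RE.sub('.', domain)}")
    return results


for sample in [
    "mst3k@Virginia.EDU",
    "thomas.jefferson@cs.virginia.edu",
    "mst3k at virginia.edu",
    "mst3k at virginia dot edu",
]:
    print(sample, "=>", deobfuscate_emails(sample))
```

Because "at" is matched as an ordinary word, prose such as "look at example.com" would also be reported as an address; how strict to make the pattern is a trade-off worth tuning against the example page.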
Your program must implement the following function:
find_emails_in_website(url): This function takes as input a string representation of the URL of a website that you want to search. We have a page https://cs1110.cs.virginia.edu/emails.html that has a set of example emails you should be able to find (and some that you can look for but we are not requiring). This function should return a list of all of the valid email addresses that you find.
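One possible shape for the required function is sketched below. It reads the whole page at once rather than line by line, so an address is never split across reads, and it removes duplicates while keeping first-seen order. It only recognizes plain (non-obfuscated) addresses; the helper name extract_emails and the pattern PLAIN_EMAIL_RE are hypothetical, and an obfuscation-aware extractor could be substituted for extract_emails.

```python
import re
import urllib.request

# Plain address pattern (assumption): the hyphen is placed last in each class so it is literal.
PLAIN_EMAIL_RE = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the unique addresses found in text, in first-seen order."""
    # dict.fromkeys drops duplicates while preserving insertion order (Python 3.7+)
    return list(dict.fromkeys(PLAIN_EMAIL_RE.findall(text)))


def find_emails_in_website(url):
    """Download the page at url and return a list of the addresses found in it."""
    with urllib.request.urlopen(url) as response:
        # Decode the whole page; errors="replace" avoids crashing on unexpected bytes
        html = response.read().decode("utf-8", errors="replace")
    return extract_emails(html)
```

Calling find_emails_in_website("https://cs1110.cs.virginia.edu/emails.html") would then return the list for the example page. Keeping the text processing in a separate function also makes it easy to test without a network connection.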
This is what I have so far, but for some reason it is not working:
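import urllib.request
import re

# Compile the pattern once; the IGNORECASE flag belongs to re.compile, not to re.search
pattern = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+', re.IGNORECASE)

stream = urllib.request.urlopen("https://cs1110.cs.virginia.edu/emails.html")
emails = []
for line in stream:
    decoded = line.decode("UTF-8")
    print(decoded.strip())
    # Search each decoded line (a str) inside the loop; the stream is used up after one pass
    emails.extend(pattern.findall(decoded))
print(emails)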
