Question:

The goal of this assignment is to write a program that will scan a web page and harvest as many email addresses as possible. Many of these email addresses will be obfuscated in some way. Your job is to get the computer to figure out how to recognize the obfuscation and return a good result!

Here are some examples to get you started (in the form obfuscated email => what your program should interpret the email as):

mst3k@Virginia.EDU => mst3k@Virginia.EDU

thomas.jefferson@cs.virginia.edu => thomas.jefferson@cs.virginia.edu

mst3k at virginia.edu => mst3k@virginia.edu

mst3k at virginia dot edu => mst3k@virginia.edu

You can come up with regular expressions that will look for particular patterns in a line that could be an email address.
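As a rough illustration of that idea, the sketch below builds one pattern that accepts both the plain forms and the " at " / " dot " forms from the examples above, then normalizes each match. The names EMAIL_PATTERN and find_emails_in_text are made up for this example, and the pattern is only an assumption about which obfuscations matter; it will also produce false positives on ordinary prose such as "look at virginia.edu".

```python
import re

# Hypothetical building blocks: "@" or the word "at", "." or the word "dot".
_AT = r"(?:@|\s+at\s+)"
_DOT = r"(?:\.|\s+dot\s+)"
_LABEL = r"[A-Za-z0-9-]+"

EMAIL_PATTERN = re.compile(
    r"(?P<user>[A-Za-z0-9_.+-]+)" + _AT
    + r"(?P<domain>" + _LABEL + r"(?:" + _DOT + _LABEL + r")+)",
    re.IGNORECASE,
)


def find_emails_in_text(text):
    """Return normalized addresses found in text, in order of appearance."""
    found = []
    for match in EMAIL_PATTERN.finditer(text):
        # Turn any spelled-out " dot " separators back into "."
        domain = re.sub(r"\s+dot\s+", ".", match.group("domain"),
                        flags=re.IGNORECASE)
        found.append(match.group("user") + "@" + domain)
    return found


print(find_emails_in_text("mst3k at virginia dot edu"))
# prints ['mst3k@virginia.edu']
```

Keeping the separators in small named pieces makes it easier to add further obfuscation forms later without rewriting the whole expression.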

Your program must implement the following function:

find_emails_in_website(url): This function takes as input the URL, as a string, of the website that you want to search. The page https://cs1110.cs.virginia.edu/emails.html has a set of example emails you should be able to find (and some that you can look for but that we are not requiring). This function should return a list of all of the valid email addresses that you find.
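One possible shape for this function is sketched below. It reads the whole page, decodes it, and applies a single pattern. The pattern name PLAIN_EMAIL is invented for this example and only handles unobfuscated addresses; dropping duplicates is also an assumption, since the assignment text does not say whether repeats should be kept.

```python
import re
import urllib.request

# Hypothetical pattern covering plain addresses only (no " at " / " dot " forms).
PLAIN_EMAIL = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails_in_website(url):
    """Return the distinct email addresses found at url, in page order."""
    with urllib.request.urlopen(url) as stream:
        # The stream yields bytes, so decode before applying a str pattern.
        text = stream.read().decode("UTF-8", errors="replace")
    emails = []
    for email in PLAIN_EMAIL.findall(text):
        if email not in emails:  # assumption: report each address once
            emails.append(email)
    return emails
```

Calling find_emails_in_website("https://cs1110.cs.virginia.edu/emails.html") would then return whatever plain addresses that page contains at the time it is fetched.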

This is what I have so far, but for some reason it is not working:

import urllib.request
import re

stream = urllib.request.urlopen("https://cs1110.cs.virginia.edu/emails.html")
for line in stream:
    decoded = line.decode("UTF-8")
    print(decoded.strip())
xx = re.search(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+', re.IGNORECASE)
xx.findall(stream)
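The snippet above appears to fail for a few separate reasons. re.search expects the text to search as its second argument, so passing re.IGNORECASE there raises a TypeError. Even with a valid call, re.search returns a match object, which has no findall method. Finally, the stream yields bytes and has already been consumed by the for loop. A minimal sketch of one way around these problems follows; the helper name emails_from_lines is hypothetical, and the IGNORECASE flag is dropped because the character classes already list both cases.

```python
import re

# Compile once; the compiled pattern object is what provides findall().
pattern = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")


def emails_from_lines(lines):
    """Collect matches from an iterable of byte lines, such as a urlopen stream."""
    emails = []
    for line in lines:
        decoded = line.decode("UTF-8")  # bytes -> str before matching
        emails.extend(pattern.findall(decoded))
    return emails


sample = [
    b"<li>mst3k@Virginia.EDU</li>\n",
    b"no address here\n",
    b"<li>thomas.jefferson@cs.virginia.edu</li>\n",
]
print(emails_from_lines(sample))
# prints ['mst3k@Virginia.EDU', 'thomas.jefferson@cs.virginia.edu']
```

The same helper could be given the object returned by urllib.request.urlopen directly, since iterating over it also produces byte lines.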
