In Python, the program below uses a simple regular expression to replace names in an input file with the string "**name**" and saves the result in an output file. Improve that regular expression, for example by adding an optional middle name or initial, suffixes, more prefixes, etc. Multiple regular expressions can be used. The program need not be perfect: it need not cover every possible way a name can be written, and it may miss some names or incorrectly replace something that is not a name, but it should do a reasonable job. Assume that a name always starts with a prefix. Next, add code to replace email addresses with the string "**email**".

Here is the code I have so far. The parts marked "improve this regular expression" and "complete this" need to be completed. Thank you!

# This program removes names and email addresses occurring in a given input file and saves it in an output file.

import re

def deidentify():
    infilename = input("Give the input file name: ")
    outfilename = input("Give the output file name: ")

    infile = open(infilename,"r")
    text = infile.read()
    infile.close()

    # replace names
    nameRE = "(Ms\.|Mr\.|Dr\.|Prof\.) [A-Z](\.|[a-z]+) [A-Z][a-z]+" # improve this regular expression
    deidentified_text = re.sub(nameRE,"**name**",text)

    # replace email addresses
    # complete this

    outfile = open(outfilename,"w")
    print(deidentified_text, file=outfile)
    outfile.close()
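
One possible way to fill in the two marked parts is sketched below. It is not a definitive solution: the prefix and suffix lists, the limit of three name tokens after the prefix, the email pattern (a common approximation, not a full address grammar), and the helper names `deidentify_text` and `deidentify_file` are all assumptions made for illustration. As the question allows, it will miss some names and will sometimes replace a capitalized word that follows a real name.

```python
import re

# Assumed lists; extend them to suit the text being processed.
PREFIX = r"\b(?:Mrs|Mr|Ms|Miss|Mx|Dr|Prof|Rev|Sir)\.?"
# Longer Roman numerals come first, and (?!\w) stops "II" matching inside "III".
SUFFIX = r"(?:Jr\.?|Sr\.?|III|II|IV|PhD|MD)(?!\w)"

# One name token: an initial such as "Q." or a capitalized word such as
# "Smith", "Smith-Jones", "O'Brien" or "McDonald". The negative lookahead
# keeps "Jr" and "Sr" from being consumed as ordinary name words.
TOKEN = (
    r"(?:[A-Z]\."
    r"|(?!(?:Jr|Sr)\b)(?:O'|Mc|Mac)?[A-Z][a-z]+(?:-[A-Z][a-z]+)*\b)"
)

# Prefix, then one to three name tokens (first, optional middle, last),
# then an optional suffix that may be preceded by a comma.
NAME_RE = re.compile(
    PREFIX + r"\s+" + TOKEN + r"(?:\s+" + TOKEN + r"){0,2}"
    + r"(?:,?\s+" + SUFFIX + r")?"
)

# Approximate email pattern: local part, "@", dotted domain, alphabetic TLD.
EMAIL_RE = re.compile(
    r"\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b"
)


def deidentify_text(text):
    """Return text with names and email addresses replaced by placeholders."""
    text = NAME_RE.sub("**name**", text)
    text = EMAIL_RE.sub("**email**", text)
    return text


def deidentify_file(infilename, outfilename):
    """Read infilename, de-identify its contents, and write outfilename."""
    with open(infilename, "r") as infile:
        text = infile.read()
    with open(outfilename, "w") as outfile:
        outfile.write(deidentify_text(text))
```

With these patterns, `deidentify_text("Dr. Jane Q. Smith, Jr. wrote to Mr. O'Brien.")` returns `"**name** wrote to **name**."`. To keep the interactive behavior of the original program, the two file names can be read with `input()` and passed to `deidentify_file`.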
