Question: Counting words in a text file and finding the words common to two text files in Python
I am writing a Python program that counts the words in a text file and compares two text files to find the words that appear in both. This is what I have written so far:
filename1 = "declarationOfInd.txt"
filename2 = "gettysBurg.txt"
# Read the first file, lowercase it, and strip punctuation
file1 = open(filename1, "r")
document1 = file1.read().lower()
punctuations = (",.';:/-")
Doc1 = " "
for char in document1:
    if char not in punctuations:
        Doc1 = Doc1 + char
print(Doc1)
# Read the second file, lowercase it, and strip punctuation
file2 = open(filename2, "r")
document2 = file2.read().lower()
punctuations = (",.';:/-")
Doc2 = " "
for char in document2:
    if char not in punctuations:
        Doc2 = Doc2 + char
print(Doc2)
def wordCount(Doc1):
    # Map each word to the number of times it appears
    wordcount = {}
    for word in Doc1.split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    # Return the counts so the caller can print or reuse them
    return wordcount
file1.close()
file2.close()
Now that I have done the wordcount, I need to change Doc1 and Doc2 into sets, so that I can compare the files using set operations, such as intersection and union. However, I am having trouble converting the words from the string format in Doc1 and Doc2 into sets. In addition, only words that are of length 4 or greater should be included in the set. Can you help me convert words that are length 4 or greater from Doc1 and Doc2 into sets?
Step by Step Solution
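One possible approach, offered as a minimal sketch rather than a definitive answer: split each cleaned string on whitespace and build a set with a set comprehension that keeps only words of length 4 or greater. The helper name `words_to_set` and its `min_length` parameter are hypothetical, and the two short strings below are stand-ins for the `Doc1` and `Doc2` values that the question builds from the files.

```python
def words_to_set(text, min_length=4):
    """Return the distinct words in text that have at least min_length characters."""
    # split() with no arguments breaks on any run of whitespace,
    # so leading spaces and newlines do not produce empty words
    return {word for word in text.split() if len(word) >= min_length}


# Stand-in strings (assumed); in the real program these are Doc1 and Doc2
Doc1 = " we hold these truths to be self evident that all men are created equal"
Doc2 = (" four score and seven years ago our fathers brought forth a new nation"
        " dedicated to the proposition that all men are created equal")

set1 = words_to_set(Doc1)
set2 = words_to_set(Doc2)

common = set1 & set2      # intersection: words found in both documents
either = set1 | set2      # union: words found in at least one document
only_in_1 = set1 - set2   # difference: words found only in the first document

print(sorted(common))     # ['created', 'equal', 'that']
```

Because a set stores each word once, the counts from `wordCount` are not carried over. If the counts are still needed for the common words, they can be looked up in the dictionary that `wordCount` returns.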
