Question: I am writing a program in Python to count words in a text file, and compare two different text files to determine the common words

I am writing a Python program that counts the words in a text file and compares two text files to find the words that appear in both. This is what I have written so far:

filename1 = "declarationOfInd.txt"
filename2 = "gettysBurg.txt"

file1 = open(filename1, "r")
document1 = file1.read().lower()
punctuations = (",.';:/-")
Doc1 = " "
for char in document1:
    if char not in punctuations:
        Doc1 = Doc1 + char
print(Doc1)

file2 = open(filename2, "r")
document2 = file2.read().lower()
punctuations = (",.';:/-")
Doc2 = " "
for char in document2:
    if char not in punctuations:
        Doc2 = Doc2 + char
print(Doc2)

def wordCount(Doc1):
    # Build a dictionary mapping each word to the number of times it appears.
    wordcount = {}
    for word in Doc1.split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    return wordcount

print(wordCount(Doc1))

file1.close()
file2.close()

Now that I have done the wordcount, I need to change Doc1 and Doc2 into sets, so that I can compare the files using set operations, such as intersection and union. However, I am having trouble converting the words from the string format in Doc1 and Doc2 into sets. In addition, only words that are of length 4 or greater should be included in the set. Can you help me convert words that are length 4 or greater from Doc1 and Doc2 into sets?
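
One possible way to approach this, as a minimal sketch: split each cleaned string into words and build a set with a set comprehension that keeps only words of length 4 or more. The helper name words_of_min_length and the short sample strings are hypothetical and are there only so the example runs without the two text files; in the real program you would pass in the Doc1 and Doc2 strings built above.

```python
def words_of_min_length(text, min_length=4):
    """Return the set of distinct words in text with at least min_length characters."""
    # str.split() with no arguments splits on any run of whitespace,
    # so leading spaces and newlines do not produce empty words.
    return {word for word in text.split() if len(word) >= min_length}


# Hypothetical stand-ins for the cleaned, lowercased Doc1 and Doc2 strings.
Doc1 = " we hold these truths to be self evident that all men are created equal"
Doc2 = " our fathers brought forth a new nation dedicated to the proposition that all men are created equal"

set1 = words_of_min_length(Doc1)
set2 = words_of_min_length(Doc2)

common = set1 & set2       # intersection: words found in both documents
combined = set1 | set2     # union: words found in either document
only_in_1 = set1 - set2    # difference: words found only in the first document

print(sorted(common))      # ['created', 'equal', 'that']
```

A set stores each word once, so the counts from wordCount are not carried over; if you need both, keep the dictionary for counts and use the sets only for the comparison. The method forms set1.intersection(set2) and set1.union(set2) give the same results as the & and | operators.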
