Question: I am writing a program in Python to count words in a text file, and compare two different text files to determine the common words

I am writing a program in Python to count words in a text file, and compare two different text files to determine the common words found in both files. This is what I have written so far:

filename1 = "declarationOfInd.txt"
filename2 = "gettysBurg.txt"

file1 = open(filename1, "r")
document1 = file1.read().lower()
punctuations = (",.';:/-")
Doc1 = " "
for char in document1:
    if char not in punctuations:
        Doc1 = Doc1 + char
print(Doc1)

file2 = open(filename2, "r")
document2 = file2.read().lower()
punctuations = (",.';:/-")
Doc2 = " "
for char in document2:
    if char not in punctuations:
        Doc2 = Doc2 + char
print(Doc2)

def wordCount(Doc1):
    # Map each word to the number of times it appears
    wordcount = {}
    for word in Doc1.split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    # Return the counts so the caller can print or reuse them
    return wordcount

file1.close()
file2.close()

Now that I have done the wordcount, I need to change Doc1 and Doc2 into sets, so that I can compare the files using set operations, such as intersection and union. However, I am having trouble converting the words from the string format in Doc1 and Doc2 into sets. In addition, only words that are of length 4 or greater should be included in the set. Can you help me convert words that are length 4 or greater from Doc1 and Doc2 into sets?
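One possible way to build the sets is a set comprehension over `split()`, keeping only words whose length is at least 4. The sketch below is a minimal illustration, not a definitive answer: the helper name `words_of_min_length` is hypothetical, the two sample strings stand in for the contents of the real files, and it strips punctuation with `string.punctuation` (a broader set than the `",.';:/-"` used in the question).

```python
import string


def words_of_min_length(text, min_length=4):
    """Return the set of distinct words in text with at least min_length characters."""
    # Lowercase, then delete every ASCII punctuation character.
    # Assumption: string.punctuation replaces the question's shorter list.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # A set comprehension drops duplicates automatically.
    return {word for word in cleaned.split() if len(word) >= min_length}


# Sample strings standing in for the text read from the two files.
doc1 = "We hold these truths to be self-evident, that all men are created equal."
doc2 = ("Four score and seven years ago, a new nation dedicated to the "
        "proposition that all men are created equal.")

set1 = words_of_min_length(doc1)
set2 = words_of_min_length(doc2)

common = set1 & set2      # intersection: words found in both texts
all_words = set1 | set2   # union: words found in either text

print(sorted(common))     # ['created', 'equal', 'that']
```

In the program from the question, the same filter could be applied directly to the already-cleaned strings, for example `{word for word in Doc1.split() if len(word) >= 4}`, and likewise for `Doc2`.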

