Question: FINDING SIMILARITIES BETWEEN TEXT FILES C++

This assignment is focused on building familiarity with streams, operator overloading, and data structures. There are some requirements on how the program is supposed to behave, along with a checklist of requirements, but most of the logic and how you organize your file(s) is up to you. In this assignment, you are to create a simple vocabulary comparison tool. The user will be prompted to list as many files as they like (could be 2, 3, 4, 10, etc.), all separated by spaces. These files will then be compared against each other by the words they use, ignoring how many times each word occurs. A rough procedure is below:

Every file should be compared against every other file exactly once. Keep in mind there can be an arbitrary number of files!
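One way to visit every pair exactly once is a nested loop whose inner index always starts one past the outer index. The sketch below is only an illustration; the helper name allPairs is hypothetical and not part of the assignment.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns every index pair (i, j) with i < j, so each pair of files is
// visited exactly once and no file is compared with itself.
// allPairs is a hypothetical helper name.
std::vector<std::pair<std::size_t, std::size_t>> allPairs(std::size_t fileCount) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < fileCount; ++i) {
        for (std::size_t j = i + 1; j < fileCount; ++j) {
            pairs.emplace_back(i, j);
        }
    }
    return pairs;
}
```

For n files this yields n(n-1)/2 comparisons, for example 6 comparisons for 4 files.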

When two files are compared, all the words they had in common (intersection) should be documented; also, all words that were present in either file (union) should be documented. In documenting the union and intersection, the word frequency does not matter! A word appearing once or a hundred times between two files, for example, should only be documented once.
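Because multiplicity does not matter, storing each file's words in a std::set removes duplicates automatically. A minimal sketch of the two set operations, assuming the words are already normalized; the alias WordSet and the function names are hypothetical:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using WordSet = std::set<std::string>;  // hypothetical alias

// All words that appear in either set, each listed once.
WordSet wordUnion(const WordSet& a, const WordSet& b) {
    WordSet result;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(result, result.end()));
    return result;
}

// Only the words that appear in both sets.
WordSet wordIntersection(const WordSet& a, const WordSet& b) {
    WordSet result;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(result, result.end()));
    return result;
}
```

std::set_union and std::set_intersection require sorted input ranges, which std::set iteration already provides.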

The words should be stored as all lowercase, so all uppercase letters should be made lowercase.

The words should be stored without any punctuation marks, which we will assume are among the list: . , ! ; ? : .
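The two storage rules (lowercase, no punctuation) can be applied to each token as it is read. The following is a sketch only; normalizeWord is a hypothetical helper name. It drops the listed punctuation wherever it occurs, which is sufficient if punctuation only ever touches the start or end of a word.

```cpp
#include <cctype>
#include <string>

// Lowercases a token and drops the punctuation marks . , ! ; ? :
// normalizeWord is a hypothetical helper name.
std::string normalizeWord(const std::string& token) {
    const std::string punctuation = ".,!;?:";
    std::string word;
    for (char c : token) {
        if (punctuation.find(c) != std::string::npos) {
            continue;  // skip punctuation
        }
        // Cast to unsigned char first: std::tolower is undefined for negative values.
        word += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return word;
}
```

A token made only of punctuation becomes an empty string, so a caller would likely want to skip empty results before storing.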

A similarity rating is defined as the percentage of overlap: the number of words common to both files (intersection) divided by the number of words appearing in either file (union), expressed as a percentage.
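Written out, the score is 100 × |intersection| / |union|, rounded to the nearest integer. A small sketch of that arithmetic follows; the function name similarityPercent is hypothetical, and returning 0 when both files are empty is an assumption the assignment does not address.

```cpp
#include <cmath>
#include <cstddef>

// Similarity as a whole-number percentage of intersection size over union size.
// similarityPercent is a hypothetical helper name.
int similarityPercent(std::size_t intersectionSize, std::size_t unionSize) {
    if (unionSize == 0) {
        return 0;  // assumption: two empty files score 0 (avoids division by zero)
    }
    const double ratio = static_cast<double>(intersectionSize) /
                         static_cast<double>(unionSize);
    // std::lround rounds halfway cases away from zero, so 37.5 becomes 38.
    return static_cast<int>(std::lround(100.0 * ratio));
}
```

The casts to double matter: dividing two integers directly would truncate 3/8 to 0.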

For each pairwise comparison, a file should be created and saved with the name [FileName]_[OtherFileName]_Sim_Score_[PERCENT].txt, where [PERCENT] is the similarity score rounded to the nearest integer, [FileName] is the name of one file without any .s, and [OtherFileName] is the name of the other file without any .s.
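A sketch of building the output name, assuming the parts are joined with underscores as in the sample name File1txt_File2txt_Sim_Score_38.txt. The helper names stripDots and reportFileName are hypothetical.

```cpp
#include <algorithm>
#include <string>

// Removes every '.' from a file name, e.g. "File1.txt" becomes "File1txt".
std::string stripDots(std::string name) {
    name.erase(std::remove(name.begin(), name.end(), '.'), name.end());
    return name;
}

// Builds the report name; underscore separators are an assumption
// taken from the sample output name.
std::string reportFileName(const std::string& first,
                           const std::string& second,
                           int percent) {
    return stripDots(first) + "_" + stripDots(second) +
           "_Sim_Score_" + std::to_string(percent) + ".txt";
}
```

For instance, reportFileName("File1.txt", "File2.txt", 38) gives File1txt_File2txt_Sim_Score_38.txt.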

Each saved file should be formatted as below:

Similarity Score: [SIMILARITY SCORE]

Union: [ALL WORDS BETWEEN TWO FILES LISTED ALPHABETICALLY WITH NUMBERS COMING FIRST, SEPARATED BY SPACES]

Intersection: [ALL WORDS THAT ARE COMMON BETWEEN THE TWO FILES LISTED ALPHABETICALLY, SEPARATED BY SPACES]
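The three-line layout can be produced with a string stream and then written to disk. This is a sketch under assumptions: the names joinWords and formatReport are hypothetical, and placing each field on its own line with no blank lines in between is a guess at the intended layout. Iterating a std::set<std::string> visits words in ascending character order, so digits come before lowercase letters and the lists come out alphabetical without a separate sort.

```cpp
#include <set>
#include <sstream>
#include <string>

// Joins the words of a set with single spaces, in the set's sorted order.
std::string joinWords(const std::set<std::string>& words) {
    std::ostringstream out;
    bool first = true;
    for (const std::string& w : words) {
        if (!first) {
            out << ' ';
        }
        out << w;
        first = false;
    }
    return out.str();
}

// Builds the report text; one field per line is an assumed layout.
std::string formatReport(int percent,
                         const std::set<std::string>& unionWords,
                         const std::set<std::string>& commonWords) {
    std::ostringstream out;
    out << "Similarity Score: " << percent << '\n'
        << "Union: " << joinWords(unionWords) << '\n'
        << "Intersection: " << joinWords(commonWords) << '\n';
    return out.str();
}
```

The returned text could then be saved with something like std::ofstream(fileName) << formatReport(score, unionWords, commonWords); after including <fstream>.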

The requirements:

1. You must overload at least 1 operator, but you may overload more. Hint: if you do overload operator+, the return type need not be a reference.

2. You may use std::vector and std::string, but you must also use at least one other data structure, set.

3. You must assume the user's files will be in the same folder as the .cpp or .exe file, and the output files must be saved to the same folder.

4. You may assume the files will not use digits 0-9 nor will they use any punctuation not appearing in the list previously provided.

5. You may assume punctuation marks will only ever be found immediately preceding a word or immediately after a word, with no white space between the word and the punctuation.

6. The words listed in the files you produce must be listed alphabetically.

7. You must carefully document all functions that you write.

As a simple example with only 2 files, consider:

File1.txt: Today is Thursday the first.

File2.txt: If today is Thursday, tomorrow is Friday!

There should then be a file generated called File1txt_File2txt_Sim_Score_38.txt that reads:

Similarity Score: 38

Union: first friday if is the thursday today tomorrow

Intersection: is thursday today

Explanation: All of the words are: first, friday, if, is, the, thursday, today, tomorrow (8 words). The common words are: is, thursday, today (3 words). The similarity score would be 38 (from 3/8 = 37.5% being rounded to 38%).
