Question: Create a program in C++ that compares text files and calculates their similarity by word. The user will be prompted to list as many text

Create a program in C++ that compares text files and calculates their similarity by word. The user will be prompted to list as many text files as they like (3, 18, 2345, etc.), all separated by spaces. These files will then be compared against each other by the words they use, ignoring how many times each word appears.
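A minimal sketch of the input step, assuming the file names arrive as one space-separated line and that words are compared case-sensitively (the question does not say either way). The helper names `splitNames` and `readWords` are hypothetical. Reading the words into a `std::set` discards repeats automatically.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Splits one line of user input into file names separated by whitespace.
std::vector<std::string> splitNames(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> names;
    std::string name;
    while (in >> name) {
        names.push_back(name);
    }
    return names;
}

// Reads whitespace-separated words from a stream into a set,
// so a word that appears many times is stored only once.
std::set<std::string> readWords(std::istream& in) {
    std::set<std::string> words;
    std::string word;
    while (in >> word) {
        words.insert(word);
    }
    return words;
}
```

In the full program, `readWords` would typically be called on a `std::ifstream` opened for each name returned by `splitNames`.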

Every file should be compared against every other file exactly once, with an arbitrary number of files.
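One way to visit every pair exactly once is a nested loop in which the inner index always starts one past the outer index. The sketch below assumes the file names are already in a `std::vector`; the function name `allPairs` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns every unordered pair of file names exactly once.
// Starting j at i + 1 avoids comparing a file with itself
// and avoids visiting (b, a) after (a, b).
std::vector<std::pair<std::string, std::string>>
allPairs(const std::vector<std::string>& files) {
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::size_t i = 0; i < files.size(); ++i) {
        for (std::size_t j = i + 1; j < files.size(); ++j) {
            pairs.emplace_back(files[i], files[j]);
        }
    }
    return pairs;
}
```

For n files this yields n(n-1)/2 comparisons, so three files produce three output files.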

When two files are compared, all the words they have in common (intersection) should be documented, as should all words present in either file (union). In documenting the union and intersection, word frequency does not matter. A word appearing once or a hundred times between two files, for example, should only be documented once.

A similarity rating is defined as the percentage of overlap: the number of words common to both files (intersection) divided by the number of words appearing in either file (union), expressed as a percentage.
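A minimal sketch of the score, assuming each file's words are already stored in a `std::set<std::string>`. The function name `similarityPercent` is hypothetical, and returning 0 for two empty files is an assumption, since the question does not cover that case.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Similarity = 100 * |intersection| / |union|.
double similarityPercent(const std::set<std::string>& a,
                         const std::set<std::string>& b) {
    // std::set iterates in sorted order, which std::set_intersection requires.
    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    // |A union B| = |A| + |B| - |A intersect B|
    const std::size_t unionSize = a.size() + b.size() - common.size();
    if (unionSize == 0) {
        return 0.0;  // assumption: two empty files score 0
    }
    return 100.0 * static_cast<double>(common.size()) /
           static_cast<double>(unionSize);
}
```

For example, the sets {a, b, c} and {b, c, d} share 2 words out of 4 distinct words in total, giving a score of 50.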

For each pairwise comparison, a file should be created and saved with the name [FileName]_[OtherFileName]_Sim_Score_[PERCENT].txt, where [PERCENT] is the similarity score rounded to the nearest integer, [FileName] is the name of one file without any periods, and [OtherFileName] is the name of the other file without any periods.
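The naming rule might be sketched as below. This reads the rule literally and removes every '.' character from each name, so `a.txt` becomes `atxt`; if the intent is to drop the extension instead, `stripDots` would need to change. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Removes every '.' character from a file name (literal reading of the rule).
std::string stripDots(std::string name) {
    name.erase(std::remove(name.begin(), name.end(), '.'), name.end());
    return name;
}

// Builds [FileName]_[OtherFileName]_Sim_Score_[PERCENT].txt,
// rounding the score to the nearest integer with std::lround.
std::string outputName(const std::string& first,
                       const std::string& second,
                       double percent) {
    return stripDots(first) + "_" + stripDots(second) + "_Sim_Score_" +
           std::to_string(std::lround(percent)) + ".txt";
}
```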

Each saved file should be formatted as follows:

Similarity Score: [SIMILARITY SCORE]

Union: [ALL WORDS BETWEEN TWO FILES LISTED ALPHABETICALLY WITH NUMBERS COMING FIRST, SEPARATED BY SPACES]

Intersection: [ALL WORDS THAT ARE COMMON BETWEEN THE TWO FILES LISTED ALPHABETICALLY, SEPARATED BY SPACES]

The requirements:

1. You must overload at least 1 operator, but you may overload more. Hint: if you do overload operator+, the return type need not be a reference.

2. You may use std::vector and std::string, but you must also use at least one other data structure. There is one that is very natural for this setting.

3. You must assume the user's files will be in the same folder as the .cpp or .exe file, and the files must be saved to the same folder.

4. You may assume the files will not use digits 0-9 nor will they use any punctuation marks.

5. The words listed in the files you produce must be listed alphabetically.
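Requirements 1, 2, and 5 could be met together by wrapping a `std::set<std::string>` in a small type and overloading operators on it. The sketch below is one possible design, not the assignment's official answer: the type name `WordSet` and the choice of `operator*` for intersection are assumptions. As the hint suggests, `operator+` returns a new object by value rather than a reference.

```cpp
#include <set>
#include <string>

// Hypothetical wrapper around the set of distinct words in one file.
// std::set keeps the words unique and in sorted order.
struct WordSet {
    std::set<std::string> words;
};

// Union: every word that appears in either operand.
WordSet operator+(const WordSet& lhs, const WordSet& rhs) {
    WordSet result = lhs;
    result.words.insert(rhs.words.begin(), rhs.words.end());
    return result;
}

// Intersection: only the words that appear in both operands.
WordSet operator*(const WordSet& lhs, const WordSet& rhs) {
    WordSet result;
    for (const std::string& word : lhs.words) {
        if (rhs.words.count(word) > 0) {
            result.words.insert(word);
        }
    }
    return result;
}
```

With these overloads, comparing two files reduces to evaluating `a + b` and `a * b` and taking the ratio of their sizes.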
