Question: C++ Program: Document Indexing

C++ Program

Document Indexing: Your first task is to create a so-called document-term matrix from a number of input documents. You have to build a dictionary, which is the set of all words that appear across all documents. To build the dictionary: (i) process all input documents; (ii) split the input text into tokens using whitespace and punctuation; and (iii) convert all characters to lower case. Count the occurrences of each token per document and print the results in a table sorted by token, like the example table given after the coding guidelines.
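Steps (ii) and (iii) and the per-document counting can be sketched as follows. This is only one possible approach, not a reference solution: the names TermMatrix, tokenize and addDocument are hypothetical, and storing the matrix as a std::map from token to a vector of counts is an assumption made here because a std::map iterates its keys in sorted order, which matches the "sorted by token" requirement.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical type: token -> one count per document (one column each).
using TermMatrix = std::map<std::string, std::vector<int>>;

// Split text on whitespace and punctuation, lowercasing every character.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isspace(ch) || std::ispunct(ch)) {
            // A separator ends the current token, if there is one.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += static_cast<char>(std::tolower(ch));
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Add the token counts of one document to column docIndex of the matrix.
void addDocument(TermMatrix& matrix, std::size_t docIndex,
                 std::size_t numDocs, const std::string& text) {
    assert(docIndex < numDocs);
    for (const std::string& token : tokenize(text)) {
        std::vector<int>& row = matrix[token];
        if (row.size() < numDocs) row.resize(numDocs, 0);  // new token: all zeros
        ++row[docIndex];
    }
}
```

Because std::ispunct treats apostrophes and hyphens as punctuation, a word such as "don't" is split into "don" and "t" in this sketch; whether that is acceptable depends on how strictly the assignment's tokenization rule is read.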


Input. Your program has to first read a configuration file (e.g., index.txt) that specifies which documents to index. Each line in this file contains the filename of a document. That is, you have to open each document listed in this configuration file and add it to your document-term matrix.
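A minimal sketch of the input step, assuming one filename per line as described. The helper names readFileList and readFile are hypothetical; skipping blank lines and stripping a trailing carriage return are extra precautions the task does not ask for.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one filename per line from the configuration stream.
// Blank lines are skipped and a trailing '\r' (Windows line ending) is removed.
std::vector<std::string> readFileList(std::istream& in) {
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) names.push_back(line);
    }
    return names;
}

// Read a whole document into a string; returns false if it cannot be opened.
bool readFile(const std::string& path, std::string& contents) {
    std::ifstream file(path);
    if (!file) return false;
    std::ostringstream buffer;
    buffer << file.rdbuf();
    contents = buffer.str();
    return true;
}
```

In main(), one would open index.txt with a std::ifstream, pass it to readFileList, and then call readFile on each returned name, reporting any file that fails to open.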

Processing. You have to compute two versions of the document-term matrix: (i) a complete matrix, which contains all tokens; and (ii) a filtered matrix, where you remove all tokens that come from a list of so-called stopwords, such as: a, and, an, all, am, ...
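One way to derive the filtered matrix from the complete one is sketched below. The TermMatrix alias and the function name removeStopwords are hypothetical, and the task does not say where the stopword list comes from, so this sketch simply takes it as a std::set supplied by the caller.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical type: token -> one count per document (one column each).
using TermMatrix = std::map<std::string, std::vector<int>>;

// Return a copy of the complete matrix without any row whose token is a stopword.
// The complete matrix is left unchanged so that both versions can be printed.
TermMatrix removeStopwords(const TermMatrix& complete,
                           const std::set<std::string>& stopwords) {
    TermMatrix filtered;
    for (const auto& entry : complete) {
        if (stopwords.count(entry.first) == 0) filtered.insert(entry);
    }
    return filtered;
}
```

Since tokens are lowercased while the dictionary is built, the stopwords in the set should be lowercase as well for the lookup to match.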

Output. Print the two versions of the document-term matrix as defined above to cout. Use stream manipulators to format the table entries: dictionary words left-adjusted, fixed length, numbers right-adjusted, same width for each column. Additionally, print a legend indicating the file name for each document (column in the table).
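The formatting requirements can be met with std::setw, std::left and std::right, as in the sketch below. The function names, the TermMatrix alias, the column widths (16 and 6) and the "Doc1", "Doc2" column labels are all assumptions for illustration; the task only asks for left-adjusted words, right-adjusted numbers and equal column widths.

```cpp
#include <cstddef>
#include <iomanip>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: token -> one count per document (one column each).
using TermMatrix = std::map<std::string, std::vector<int>>;

// Column widths are arbitrary choices for this sketch.
const int kWordWidth = 16;
const int kCountWidth = 6;

// Print a header row and one row per token; std::map yields tokens in sorted order.
void printMatrix(std::ostream& out, const TermMatrix& matrix, std::size_t numDocs) {
    out << std::left << std::setw(kWordWidth) << "Dictionary";
    for (std::size_t d = 0; d < numDocs; ++d) {
        out << std::right << std::setw(kCountWidth) << ("Doc" + std::to_string(d + 1));
    }
    out << '\n';
    for (const auto& entry : matrix) {
        out << std::left << std::setw(kWordWidth) << entry.first;
        for (std::size_t d = 0; d < numDocs; ++d) {
            // Treat a missing column as a count of zero.
            int count = d < entry.second.size() ? entry.second[d] : 0;
            out << std::right << std::setw(kCountWidth) << count;
        }
        out << '\n';
    }
}

// Print the legend that maps each column label to its file name.
void printLegend(std::ostream& out, const std::vector<std::string>& fileNames) {
    for (std::size_t d = 0; d < fileNames.size(); ++d) {
        out << "Doc" << (d + 1) << ": " << fileNames[d] << '\n';
    }
}

// Convenience wrapper that renders the table into a string, e.g. for testing.
std::string matrixToString(const TermMatrix& matrix, std::size_t numDocs) {
    std::ostringstream out;
    printMatrix(out, matrix, numDocs);
    return out.str();
}
```

To print to cout as the task requires, pass std::cout (from <iostream>) as the first argument. Note that std::setw sets a minimum width only, so a token longer than kWordWidth would push its row out of alignment; truncating such tokens or widening the column is left open here.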

Coding guidelines. Use a single file indexer.cpp, which must include your indexing functions, as well as a main() function to run your code.

[Example output tables (two images, only partially legible): a "Dictionary" column listing tokens such as adventure, dummies, java and programming; document columns such as Doc1 and Doc3; and a "Total" entry of 83.]
