Question: Implement the following three components: AppInterface, ProcessingEngine and IndexStore.
The IndexStore component is responsible for storing the DocumentMap and TermInvertedIndex indexes, and it exposes four services: putDocument, getDocument, updateIndex and lookupIndex.
The DocumentMap index stores a mapping between the relative path of each document and a unique document number. The document number can be generated incrementally as documents get indexed by the FileRetrievalEngine; this technique minimizes the amount of memory used by the TermInvertedIndex. The DocumentMap can be updated through the putDocument method, which receives a document path as input and returns a unique document number, and it can be read through the getDocument method, which receives a document number as input and returns the document path.
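The putDocument/getDocument pair described above could be sketched roughly as follows. The class name DocumentMapSketch, numbering from 1, returning null for unknown numbers, and reusing the number of an already-registered path are all assumptions, not requirements stated in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the DocumentMap half of IndexStore.
class DocumentMapSketch {
    private final Map<String, Integer> pathToNumber = new HashMap<>();
    private final List<String> numberToPath = new ArrayList<>();

    // Assigns the next incremental number to a new path (numbering from 1 is an assumption).
    // A path that was already registered keeps its existing number (also an assumption).
    synchronized int putDocument(String documentPath) {
        Integer existing = pathToNumber.get(documentPath);
        if (existing != null) {
            return existing;
        }
        numberToPath.add(documentPath);
        int number = numberToPath.size();
        pathToNumber.put(documentPath, number);
        return number;
    }

    // Returns the path for a document number, or null if the number is unknown (assumption).
    synchronized String getDocument(int documentNumber) {
        if (documentNumber < 1 || documentNumber > numberToPath.size()) {
            return null;
        }
        return numberToPath.get(documentNumber - 1);
    }
}
```

Storing only an int per document in the inverted index, instead of the full path string, is what keeps the TermInvertedIndex small.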
The TermInvertedIndex index stores a dictionary where the keys are the words/terms extracted from the documents and each value is a list of pairs of numbers. The first number in each pair is the document number and the second is the number of times the word/term appeared in that document. The TermInvertedIndex can be updated through the updateIndex method, which receives as arguments a document number and a list of (term, frequency) pairs, and it can be queried through the lookupIndex method, which receives a term as input and returns a list of (document number, term frequency) pairs.
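A minimal sketch of the inverted index, under some assumptions: the class name TermInvertedIndexSketch is hypothetical, the "list of pairs of terms and frequencies" is represented as a Map<String, Integer>, and each (document number, frequency) pair is a two-element int array.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the TermInvertedIndex half of IndexStore.
class TermInvertedIndexSketch {
    // term -> list of {documentNumber, frequency} pairs
    private final Map<String, List<int[]>> index = new HashMap<>();

    // Appends one {documentNumber, frequency} pair to the list of every term in the document.
    // Assumes updateIndex is called once per document.
    synchronized void updateIndex(int documentNumber, Map<String, Integer> termFrequencies) {
        for (Map.Entry<String, Integer> entry : termFrequencies.entrySet()) {
            index.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                 .add(new int[] {documentNumber, entry.getValue()});
        }
    }

    // Returns a copy of the pairs for a term, or an empty list if the term was never indexed.
    synchronized List<int[]> lookupIndex(String term) {
        List<int[]> postings = index.get(term);
        if (postings == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(postings);
    }
}
```

Returning a copy keeps callers from modifying the stored lists; whether that cost is acceptable depends on the size of the index.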
The ProcessingEngine component is responsible for indexing the documents read from an input folder and for processing search commands. The ProcessingEngine uses the services provided by the IndexStore to store and access the document mapping and the term inverted index. The ProcessingEngine exposes two services: indexFolder and search.
The indexFolder method receives an input folder path as an argument and builds an index from all of the documents found in the folder. To do this, it first needs to crawl the folder, create a list of document paths and call putDocument to receive a document number for each document. Then, it must extract all alphanumeric words (a-z, A-Z, 0-9) that have a length greater than […]. Any non-alphanumeric character is considered a delimiter. While extracting the words/terms, the program must count the number of occurrences (frequency) of each unique word/term in the document. For each document, after it has extracted all the unique alphanumeric words and computed their frequencies, the ProcessingEngine updates the IndexStore by calling updateIndex.
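The term extraction and counting step could look roughly like the sketch below. The class TermCounterSketch and the parameter minLengthExclusive are hypothetical; the length threshold is passed in because its value is not given in the text. Terms are kept case-sensitive here, which is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for the per-document part of indexFolder.
class TermCounterSketch {
    // Counts every alphanumeric token whose length is greater than minLengthExclusive.
    // minLengthExclusive is an assumed parameter; pass a value >= 0.
    static Map<String, Integer> countTerms(String text, int minLengthExclusive) {
        Map<String, Integer> frequencies = new HashMap<>();
        // Any run of characters outside a-z, A-Z, 0-9 acts as a delimiter.
        for (String token : text.split("[^a-zA-Z0-9]+")) {
            if (token.length() > minLengthExclusive) {
                frequencies.merge(token, 1, Integer::sum);
            }
        }
        return frequencies;
    }
}
```

The resulting map can be handed to updateIndex together with the document number returned by putDocument.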
The search method receives as an argument the list of terms from an AND query and returns the list of documents that contain all of the terms, together with the combined number of occurrences of all the terms in each document. Your program needs to support queries that can have at least […] terms. To calculate the result for a search query, the ProcessingEngine needs to call lookupIndex for each input term. Then it must combine the results for each term by implementing an intersection mechanism with the following rule: if a document number exists in all lookup results, include it in the final result, with a frequency equal to the sum of that document's frequencies across all lookup results. Then, the ProcessingEngine needs to sort the final list of (document number, frequency) pairs in descending order by frequency and keep only the top […] results. Finally, the ProcessingEngine must call getDocument to get the document paths for the final results, which are returned together with the frequencies.
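The intersection, summing, sorting and truncation steps can be sketched as one function over the lookupIndex results. The class AndQuerySketch and the parameter topN are hypothetical (the number of results to keep is not given in the text), and breaking frequency ties by ascending document number is an assumption made only to keep the output deterministic. The sketch also assumes each document number appears at most once per lookup result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the combining step of search.
class AndQuerySketch {
    // Each inner list is one lookupIndex result: {documentNumber, frequency} pairs.
    static List<int[]> intersect(List<List<int[]>> lookupResults, int topN) {
        List<int[]> ranked = new ArrayList<>();
        if (lookupResults.isEmpty()) {
            return ranked;
        }
        // Start from the first term's documents, then narrow with every further term.
        Map<Integer, Integer> combined = new HashMap<>();
        for (int[] pair : lookupResults.get(0)) {
            combined.put(pair[0], pair[1]);
        }
        for (int i = 1; i < lookupResults.size(); i++) {
            Map<Integer, Integer> next = new HashMap<>();
            for (int[] pair : lookupResults.get(i)) {
                Integer soFar = combined.get(pair[0]);
                if (soFar != null) {
                    next.put(pair[0], soFar + pair[1]); // document has every term so far
                }
            }
            combined = next;
        }
        for (Map.Entry<Integer, Integer> entry : combined.entrySet()) {
            ranked.add(new int[] {entry.getKey(), entry.getValue()});
        }
        // Descending by summed frequency; ties by ascending document number (assumption).
        ranked.sort((a, b) -> a[1] != b[1]
                ? Integer.compare(b[1], a[1])
                : Integer.compare(a[0], b[0]));
        int keep = Math.max(0, Math.min(topN, ranked.size()));
        return new ArrayList<>(ranked.subList(0, keep));
    }
}
```

Each surviving document number would then be passed to getDocument to recover its path before the results are returned.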
The AppInterface component is responsible for implementing a command line interface that the user can use to interact with the File Retrieval Engine. The command line interface must interpret the indexing and search commands submitted by the user, forward them to the ProcessingEngine and print the results of the commands on the screen.
The File Retrieval Engine must support the following commands:
quit: this command closes the application gracefully.
index <folder path>: this command tells the File Retrieval Engine to crawl the given folder path, find all the documents in it and build an index from those documents. Sequences of alphanumeric characters (a-z, A-Z, 0-9) that are separated by any non-alphanumeric characters and are longer than […] characters are considered terms that need to be indexed.
search <AND query>: for example, when the user inputs the query cats AND dogs, the File Retrieval Engine must return all the documents that contain both cats and dogs, sort the returned documents by the total number of occurrences of cats and dogs in each document, and return only the top […] documents.
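One possible way for the AppInterface to interpret these commands is sketched below. The class CommandParserSketch and its method names are hypothetical, and the exact command syntax (keyword, whitespace, then the argument, with terms separated by an uppercase AND) is an assumption based on the example query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parsing helpers for the command line interface.
class CommandParserSketch {
    // Returns the command keyword (first whitespace-delimited token), or "" for a blank line.
    static String commandName(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return trimmed.split("\\s+", 2)[0];
    }

    // Returns everything after the keyword, or "" if the command has no argument.
    static String argument(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        return parts.length < 2 ? "" : parts[1].trim();
    }

    // Splits a query such as "cats AND dogs" into its terms.
    static List<String> searchTerms(String query) {
        List<String> terms = new ArrayList<>();
        for (String piece : query.trim().split("\\s+AND\\s+")) {
            String term = piece.trim();
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

A read loop in AppInterface could call commandName on each input line, stop on quit, pass argument(line) to indexFolder for index, and pass searchTerms(argument(line)) to search.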
Give me code in Java with a screenshot of the output.
