Question:

Strings are a stark data structure that allows random access to the linear text stored within, the universal and portable representation of all information. Under the hood, strings are character arrays, but the primitive nature of such arrays is hidden behind the more convenient interface of the public methods of the String class. Surprisingly clever algorithms have been developed over the years to quickly search and analyze both the string itself and the semantic information encoded in it. Even the seemingly simple task of searching for a pattern inside the given text can be optimized in countless ways, let alone the various higher-level tasks performed on these string instruments. Occasionally the string data structure can be augmented with additional data that speeds up operations that would be inefficient given only the raw and unadorned text itself. Even though this data in principle adds nothing new to the mix, since its structure is fully determined by the original string, having that information available can provide significant speedups as yet another form of space-time tradeoff. This lab showcases the simple but powerful suffix array method as a way to preprocess any text so that, once the metaphorical cheque for the one-time payment of this preprocessing has cleared, all future searches for arbitrary patterns can be executed in time that grows only logarithmically with respect to the length of the text (each step of the binary search examines at most as many characters as the pattern has). This makes these pattern searches blazingly fast even when performed on the entire War and Peace!
Computed from the given text string of n characters, its suffix array is an n-element array of integers that contains the position indices 0, ..., n-1 into the text. Since each position is stored in the suffix array exactly once, the suffix array is automatically some permutation of order n. The suffix array lists these n positions sorted in the lexicographic order (also called the "dictionary order") of the suffixes that start at each position.
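The definition above has two parts: the array must be a permutation of 0, ..., n-1, and the suffixes starting at those positions must appear in increasing lexicographic order. As an illustration only, here is a minimal sketch of a checker for those two conditions. The class name SuffixArrayCheck and the method isSuffixArray are hypothetical helpers, not part of the lab's required interface, and the checker deliberately uses the slow substring-based comparison for clarity.

```java
import java.util.List;

public class SuffixArrayCheck {

    // Hypothetical helper: returns true if sa is the suffix array of text,
    // i.e. a permutation of 0..n-1 whose suffixes appear in strictly
    // increasing lexicographic order.
    public static boolean isSuffixArray(String text, List<Integer> sa) {
        int n = text.length();
        if (sa.size() != n) return false;
        boolean[] seen = new boolean[n];
        for (int pos : sa) {
            // Every position must be in range and occur exactly once.
            if (pos < 0 || pos >= n || seen[pos]) return false;
            seen[pos] = true;
        }
        for (int i = 1; i < n; i++) {
            // String.compareTo is lexicographic (by UTF-16 code unit).
            String prev = text.substring(sa.get(i - 1));
            String curr = text.substring(sa.get(i));
            if (prev.compareTo(curr) >= 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSuffixArray("hello", List.of(1, 0, 2, 3, 4))); // true
        System.out.println(isSuffixArray("hello", List.of(0, 1, 2, 3, 4))); // false
    }
}
```

Because all suffixes have different lengths, the strict inequality in the second loop can never fail on ties, which is exactly why the suffix array is unique.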
Note that the presentation on the linked Wikipedia page on suffix arrays uses one-based position indexing into the string, whereas here we naturally use Java's zero-based indexing to remain consistent with our principles. For example, the non-empty suffixes of the string "hello" are "hello", "ello", "llo", "lo" and "o", corresponding to positions 0 to 4 in the original string. Sorting these positions into the lexicographic order of the corresponding suffixes gives the positions in the order [1, 0, 2, 3, 4], which then becomes the suffix array of the original string. Since the n suffixes of text all have different lengths, no two of them can be equal, so a tie in their lexicographic comparison can never arise. This guarantees that the suffix array of the given text is unique. For reasons of simplicity and convenience, suffix arrays are represented in this lab as instances of some subtype of List<Integer>, despite the standard technical term "suffix array". I need to make a new class named P2J11, and there first the method public static List<Integer> buildSuffixArray(String text) that builds and returns the suffix array for the given text. In this lab, this can be done with the naive algorithm, since the structure of our test case strings guarantees that the worst case of this algorithm, a string whose characters are all the same, never occurs. The easiest way to implement this naive algorithm in Java is to define yet another custom subtype of Comparator<Integer> whose method compare lexicographically compares the two suffixes that start at the two positions it receives as parameters. In buildSuffixArray, first define the local variable ArrayList<Integer> result and fill it to the brim with the integers from 0 to n-1. Then, just use the utility method Collections.sort to sort this result list with your custom Comparator.
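A minimal sketch of the naive approach described above, assuming Java 9 or later. The nested class SuffixComparator is a hypothetical name chosen for illustration; it implements Comparator<Integer> and compares the two suffixes character by character instead of creating substrings, which avoids allocating a new String for every comparison. This is one possible reading of the assignment, not an official solution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class P2J11 {

    // Hypothetical comparator: orders two positions of text by the
    // lexicographic order of the suffixes that start at those positions.
    private static class SuffixComparator implements Comparator<Integer> {
        private final String text;

        SuffixComparator(String text) {
            this.text = text;
        }

        @Override
        public int compare(Integer a, Integer b) {
            int n = text.length();
            int i = a, j = b;
            // Walk both suffixes in lockstep until the first differing character.
            while (i < n && j < n) {
                char ci = text.charAt(i), cj = text.charAt(j);
                if (ci != cj) return Character.compare(ci, cj);
                i++;
                j++;
            }
            // One suffix ran out first: it is a prefix of the other, so it
            // is the shorter one and comes first in lexicographic order.
            return Integer.compare(n - i, n - j);
        }
    }

    public static List<Integer> buildSuffixArray(String text) {
        int n = text.length();
        ArrayList<Integer> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            result.add(i); // positions 0..n-1, not yet sorted
        }
        Collections.sort(result, new SuffixComparator(text));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(buildSuffixArray("hello")); // [1, 0, 2, 3, 4]
    }
}
```

Each comparison costs time proportional to the length of the common prefix of the two suffixes, which is why a text of one repeated character is the worst case: every comparison then scans to the end of the shorter suffix.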
This discipline should allow your buildSuffixArray method to be reasonably fast even when building the suffix array for the entire War and Peace, as performed by the JUnit test class using the warandpeace.txt data file. Once the suffix array has been constructed for the given fixed text, it can be used to rapidly find all positions inside text where the given pattern occurs. These are precisely the positions whose suffixes start with that pattern! Having already done the work of sorting these suffixes in lexicographic order in the preprocessing step allows us to use a slightly modified binary search on the suffix array to find the lexicographically first such suffix. Since all suffixes that start with pattern must necessarily form one contiguous block in the sorted array of suffixes, looping through all these positions is a straightforward while-loop once the binary search has determined the first such position. To achieve all that, write the second method public static List<Integer> find(String pattern, String text, List<Integer> suffix) that creates and returns a list of the positions in the original text where the given pattern occurs, using the given suffix array to speed up this search. Note that this returned list of positions must be sorted in ascending order of the positions themselves, instead of the lexicographic order of the suffixes of text that start at these positions.
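The search described above can be sketched as a "lower bound" binary search followed by a linear scan. This is a minimal sketch under stated assumptions: comparePrefix is a hypothetical helper that compares only the first pattern.length() characters of a suffix against the pattern, and the suffix array for "banana" in main is written out by hand so the block runs on its own. In the actual lab, find would live in the same P2J11 class as buildSuffixArray.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class P2J11 {

    // Hypothetical helper: compares the suffix of text starting at pos
    // with pattern, looking at no more than pattern.length() characters.
    // Returns < 0, 0 or > 0; 0 means the suffix starts with pattern.
    private static int comparePrefix(String text, int pos, String pattern) {
        int n = text.length(), m = pattern.length();
        for (int k = 0; k < m; k++) {
            if (pos + k >= n) return -1; // suffix is a proper prefix of pattern
            char c = text.charAt(pos + k), p = pattern.charAt(k);
            if (c != p) return Character.compare(c, p);
        }
        return 0;
    }

    public static List<Integer> find(String pattern, String text, List<Integer> suffix) {
        // Binary search for the first index whose suffix is not smaller than
        // pattern when truncated to the pattern's length.
        int lo = 0, hi = suffix.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(text, suffix.get(mid), pattern) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // All suffixes that start with pattern form a contiguous block at lo.
        List<Integer> result = new ArrayList<>();
        while (lo < suffix.size() && comparePrefix(text, suffix.get(lo), pattern) == 0) {
            result.add(suffix.get(lo));
            lo++;
        }
        // Return positions in ascending order, not in suffix order.
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        String text = "banana";
        List<Integer> suffix = List.of(5, 3, 1, 0, 4, 2); // suffix array of "banana"
        System.out.println(find("ana", text, suffix)); // [1, 3]
        System.out.println(find("na", text, suffix));  // [2, 4]
        System.out.println(find("x", text, suffix));   // []
    }
}
```

Truncating every suffix to the pattern's length preserves the sorted order (non-strictly), so the "smaller than pattern" test is monotone along the suffix array and the binary search is valid. The search makes O(log n) comparisons of at most m characters each, followed by a scan and a sort of the k matches found.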
