Question: B. Write a prograam called EditDistance.java or editdistance.py that implements the edit distance algorithm discussed in the textbook. Your program takes as input two strings
B. Write a prograam called EditDistance.java or editdistance.py that implements the edit distance algorithm discussed in the textbook. Your program takes as input two strings given as command line parameters and then calculates and prints the minimum edit distance between them. Please read carefully section 3.7, which explains in detail the steps to developing an algorithm for solving this problem. In that section, you will find the algorithm in pseudocode. I suggest testing your program on the two strings in the book's example.
A run of your program should go like this:
C:\>python editdistance.py ALGORITHM ALTRUISTIC Edit distance between ALGORITHM and ALTRUISTIC IS 6
or
C:\>java EditDistance ALGORITHM ALTRUISTIC Edit distance between ALGORITHM and ALTRUISTIC IS 6
This is the textbook section and pseudocode:








3.7 Edit Distance The edit distance between two strings is the minimum number of letter inser- tions, letter deletions, and letter substitutions required to transform one string into the other. For example, the edit distance between FOOD and MONEY is at most four: FOOD MOOD MOND MONED MONEY This distance function was independently proposed by Vladimir Levenshtein in 1965 (working on coding theory), Taras Vintsyuk in 1968 (working on speech recognition), and Stanislaw Ulam in 1972 (working with biological sequences). For this reason, edit distance is sometimes called Levenshtein distance or Ulam distance (but strangely, never Vintsyuk distance). We can visualize this editing process by aligning the strings one above the other, with a gap in the first word for each insertion and a gap in the second word for each deletion. Columns with two different characters correspond to substitutions. In this representation, the number of editing steps is just the number of columns that do not contain the same character twice. FOO D It's fairly obvious that we can't transform FOOD into MONEY in three steps, so the edit distance between FOOD and MONEY is exactly four. Unfortunately, it's not so easy in general to tell when a sequence of edits is as short as possible. For example, the following alignment shows that the distance between the strings ALGORITHM and ALTRUISTIC is at most 6. Is that the best we can do? 111 3. DYNAMIC PROGRAMMING ALGORI AL TRU I STIC Recursive Structure To develop a dynamic programming algorithm to compute edit distance, we first need to formulate the problem recursively. Our alignment representation for edit sequences has a crucial optimal substructure property. Suppose we have the gap representation for the shortest edit sequence for two strings. If we remove the last column, the remaining columns must represent the shortest edit sequence for the remaining prefixes. We can easily prove this observation by contradiction: If the prefixes had a shorter edit sequence, gluing the last column back on would gives us a shorter edit sequence for the original strings. So once we figure out what should happen in the last column, the Recursion Fairy can figure out the rest of the optimal gap representation. Said differently, the alignment we are looking for represents a sequence of editing operations, ordered (for no particular reason) from right to left. Solving the edit distance problem requires making a sequence of decisions, one for each column in the output alignment. In the middle of this sequence of decisions, we have already aligned a suffix of one string with a suffix of the other. ALGOR ALTRU |I5490 I THM ISTIC Because the cost of an alignment is just the number of mismatched columns, our remaining decisions don't depend on the editing operations we've already chosen; they only depend on the prefixes we haven't aligned yet. ALGOR ALTRU ALGOR ALTRU Thus, for any two input strings A[1.. m] and B[1.. n], we can formulate the edit distance problem recursively as follows: For any indices i and j, let Edit(i,j) denote the edit distance between the prefixes A[1..i] and B[1..j]. We need to compute Edit(m,n). Recurrence When i and j are both positive, there are exactly three possibilities for the last column in the optimal alignment of A[1..i] and B[1..j]: Insertion: The last entry in the top row is empty. In this case, the edit distance is equal to Edit(i, j-1)+1. The +1 is the cost of the final insertion, 112 3.7. Edit Distance and the recursive expression gives the minimum cost for the remaining alignment. ALGOR ALTR U Deletion: The last entry in the bottom row is empty. In this case, the edit distance is equal to Edit(i-1,j)+1. The +1 is the cost of the final deletion, and the recursive expression gives the minimum cost for the remaining alignment. R ALGO ALTRU R ALGO ALTRU Substitution: Both rows have characters in the last column. If these two characters are different, then the edit distance is equal to Edit(i-1,j-1)+1. If these two characters are equal, the substitution is free, so the edit distance is Edit(i 1,j-1). ALGOR ALTR U ALGOR ALT R This generic case analysis breaks down if either i = 0 or j = 0, but those boundary cases are easy to handle directly. Transforming the empty string into a string of length j requires j insertions, so Edit(0, j)=j. Transforming a string of length i into the empty string requires i deletions, so Edit(i,0)= i. As a sanity check, both of these base cases correctly indicate that the edit distance between the empty string and the empty string is zero! We conclude that the Edit function satisfies the following recurrence: if j = 0 if i=0 Edit(i, j)= min Edit(i, j - 1)+1 Edit(i 1,j)+1 Edit(i 1,j-1)+ [A[i] + B[j]]) otherwise Dynamic Programming Now that we have a recurrence, we can transform it into a dynamic programming algorithm following our usual mechanical recipe. Subproblems: Each recursive subproblem is identified by two indices osism and 0
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
