Question: 1. (45 pts) Recal that the string alignment problem takes as input two strings and y composed of symbols xi,yj E , for a fixed

1. (45 pts) Recal that the string alignment problem takes as input two strings and y composed of symbols xi,yj E , for a fixed symbol set , and returns a minimal-cost set of edit operations for transforming the string z into string y Let a contain n, symbols, let y contain ny symbols, and let the set of edit operations be those defined in the lecture notes (substitution, insertion, deletion, and transposition) Let the cost of indel be 1, the cost of swap be 10 (plus the cost of the two sub ops), and the cost of sub be 10, except when j, which is a "no-op" and has cost In this problem, we will implement and apply three functions. (i) alignStrings(x,y) takes as input two ASCII strings r and y, and runs a dynamic programming algorithm to return the cost matrix S, which contains the optimal costs for all the subproblems for aligning these two strings. alignStrings(x,y) // x,y are ASCII strings S table of length nx by ny // for memoizing the subproblem costs initialize S for i = 1 to nx // fill in the basecases for j = 1 tony S[i.j] -cost (i.j) // optimal cost for x [O..] and y[O..j] b) return S (ii) extractAlignment (S,x,y) takes as input an optimal cost matrix S, strings r, y and returns a vector a that represents an optimal sequence of edit operations to convert r into y. This optimal sequence is recovered by finding a path on the implicit DAG of decisions made by alignStrings to obtain the value SIn, ny], starting from S0,0 extractAlignment (S,x,y) I/S is an optimal cost matrix from alignStrings initialize a II empty vector of edit operations // initialize the search for a path to So,0] while i >0 or j > 0 a[] [i.j] determineOptima10pCS,i.jx,y) - updateIndices (S,i,j,a) // what was an optimal // move to next position choice? return a When storing the sequence of edit operations in a, use a special symbol to denote ii) commonSubstrings(x,L,a) which takes as input the ASCII string r, an integer 1 S Ln, and an optimal sequence a of edits to z, which would transform into y. This function returns each of the substrings of length at least L in r that aligns exactly, via a run of no-ops, to a substring in y (a) From scratch, implement the functions alignStrings, extractAlignment, and commonSubstrings. You may not use any library functions that make their imple- mentation trivial. Within your implementation of extractAlignment, ties must be broken uniformly at random. Submit (i) a paragraph for each function that explains how you implemented it describe how it works and how it uses its data structures), and (ii) your code implementation, with code comments. Hint: test your code by reproducing the APE / STEP and the EXPONENTIAL POLYNOMIAL examples in the lecture notes (to do this exactly, you'll need to use unit costs instead of the ones given above 1. (45 pts) Recal that the string alignment problem takes as input two strings and y composed of symbols xi,yj E , for a fixed symbol set , and returns a minimal-cost set of edit operations for transforming the string z into string y Let a contain n, symbols, let y contain ny symbols, and let the set of edit operations be those defined in the lecture notes (substitution, insertion, deletion, and transposition) Let the cost of indel be 1, the cost of swap be 10 (plus the cost of the two sub ops), and the cost of sub be 10, except when j, which is a "no-op" and has cost In this problem, we will implement and apply three functions. (i) alignStrings(x,y) takes as input two ASCII strings r and y, and runs a dynamic programming algorithm to return the cost matrix S, which contains the optimal costs for all the subproblems for aligning these two strings. alignStrings(x,y) // x,y are ASCII strings S table of length nx by ny // for memoizing the subproblem costs initialize S for i = 1 to nx // fill in the basecases for j = 1 tony S[i.j] -cost (i.j) // optimal cost for x [O..] and y[O..j] b) return S (ii) extractAlignment (S,x,y) takes as input an optimal cost matrix S, strings r, y and returns a vector a that represents an optimal sequence of edit operations to convert r into y. This optimal sequence is recovered by finding a path on the implicit DAG of decisions made by alignStrings to obtain the value SIn, ny], starting from S0,0 extractAlignment (S,x,y) I/S is an optimal cost matrix from alignStrings initialize a II empty vector of edit operations // initialize the search for a path to So,0] while i >0 or j > 0 a[] [i.j] determineOptima10pCS,i.jx,y) - updateIndices (S,i,j,a) // what was an optimal // move to next position choice? return a When storing the sequence of edit operations in a, use a special symbol to denote ii) commonSubstrings(x,L,a) which takes as input the ASCII string r, an integer 1 S Ln, and an optimal sequence a of edits to z, which would transform into y. This function returns each of the substrings of length at least L in r that aligns exactly, via a run of no-ops, to a substring in y (a) From scratch, implement the functions alignStrings, extractAlignment, and commonSubstrings. You may not use any library functions that make their imple- mentation trivial. Within your implementation of extractAlignment, ties must be broken uniformly at random. Submit (i) a paragraph for each function that explains how you implemented it describe how it works and how it uses its data structures), and (ii) your code implementation, with code comments. Hint: test your code by reproducing the APE / STEP and the EXPONENTIAL POLYNOMIAL examples in the lecture notes (to do this exactly, you'll need to use unit costs instead of the ones given above
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
