Question: Recall the Edit Distance (Sequence Alignment) problem: given two strings over the same alphabet and mismatch and gap penalties, find an alignment of minimal cost.

One of the most common uses of the minimum edit distance algorithm is in computational biology. DNA sequences are composed of four nucleotides, denoted by the letters A, C, T, G. Mutation over the course of evolution changes the sequences by deleting, inserting, or substituting nucleotides. The smaller the edit distance between two sequences, the smaller the evolutionary distance between them. Thus, biological sequences are often aligned to minimize the edit distance between them.

1. Suppose the costs of mismatches and gaps are not the same for all the letters in the DNA sequences (since some mutations are more common than others). The following table states the cost of each operation for each letter.

In the substitution table, the entry in row A, column C, is the penalty for an A-to-C mismatch (but not vice versa), and the entry in row A, column -, is the cost of aligning A atop a gap. Find the minimum edit distance AND the optimal alignment between the following sequences (show the matrix of your calculations):

Sequence 1 (top): G A T T A C A

Sequence 2 (bottom): A T T A A C

Use dynamic programming.
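
One way to write the dynamic program down is the recurrence below. It is a sketch under stated assumptions, not the official solution: the symbols are introduced here for illustration, with x = x_1...x_n the top sequence, y = y_1...y_m the bottom sequence, s(a, b) the table entry in row a, column b, and g(a) the entry in row a, column -. It also assumes that g(a) is charged whenever letter a faces a gap, whichever sequence a belongs to; the question only states the cost for a top letter atop a gap. D(i, j) is the minimum cost of aligning x_1...x_i with y_1...y_j, so the answer is D(n, m).

```latex
\begin{aligned}
D(0,0) &= 0,\\
D(i,0) &= D(i-1,0) + g(x_i), \qquad 1 \le i \le n,\\
D(0,j) &= D(0,j-1) + g(y_j), \qquad 1 \le j \le m,\\
D(i,j) &= \min
\begin{cases}
D(i-1,j-1) + s(x_i, y_j) & \text{align } x_i \text{ with } y_j,\\
D(i-1,j) + g(x_i) & \text{align } x_i \text{ with a gap},\\
D(i,j-1) + g(y_j) & \text{align } y_j \text{ with a gap}.
\end{cases}
\end{aligned}
```

Because s is not symmetric, the order of the arguments matters: the top letter selects the row and the bottom letter selects the column.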

Guidelines for presenting an algorithm as a solution to a problem:

1. Brief, informal, intuitive description. A line or two of English.

2. Detailed description. Mostly English, but may include well-documented pseudocode and diagrams.

3. Proof of correctness - this means that for all inputs for which this program terminates, the algorithm gives the correct corresponding output. Sometimes this can be included in the detailed description if worded appropriately. Usually best to add separately. Mostly English, math and diagrams.

4. Proof of termination - this means that the algorithm will terminate on all inputs. Usually trivial, sometimes not. Make sure one way or the other.

5. Analysis of time and space complexity. Upper bound on the worst case is what we seek in general.
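
For the detailed-description step, a minimal Python sketch of the alignment dynamic program might look like the following. The cost values in `SUB` and `GAP` are hypothetical placeholders, not the table from the question, and the names `align`, `SUB`, and `GAP` are made up for illustration. The sketch also assumes a letter's gap cost is the same whichever sequence the letter sits in, and it breaks ties in favor of a diagonal move, so other optimal alignments may exist.

```python
from typing import Dict, List, Tuple

LETTERS = "ACTG"

# HYPOTHETICAL costs: substitute the entries from the question's table.
# SUB[(a, b)] is the cost of top letter a over bottom letter b (row a, column b).
SUB: Dict[Tuple[str, str], int] = {
    (a, b): (0 if a == b else 1) for a in LETTERS for b in LETTERS
}
SUB[("A", "C")] = 2  # asymmetric on purpose: A->C differs from C->A
# GAP[a] is the cost of letter a facing a gap (row a, column -).
GAP: Dict[str, int] = {"A": 2, "C": 2, "T": 1, "G": 1}


def align(top: str, bottom: str) -> Tuple[int, str, str]:
    """Return (minimum cost, aligned top, aligned bottom)."""
    n, m = len(top), len(bottom)
    # D[i][j] = minimum cost of aligning top[:i] with bottom[:j]
    D: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + GAP[top[i - 1]]
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + GAP[bottom[j - 1]]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + SUB[(top[i - 1], bottom[j - 1])],  # match or mismatch
                D[i - 1][j] + GAP[top[i - 1]],                       # top letter over gap
                D[i][j - 1] + GAP[bottom[j - 1]],                    # gap over bottom letter
            )

    # Trace back from D[n][m] to recover one optimal alignment.
    a: List[str] = []
    b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + SUB[(top[i - 1], bottom[j - 1])]:
            a.append(top[i - 1]); b.append(bottom[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + GAP[top[i - 1]]:
            a.append(top[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(bottom[j - 1]); j -= 1
    return D[n][m], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    cost, top_row, bottom_row = align("GATTACA", "ATTAAC")
    print(cost)
    print(top_row)
    print(bottom_row)
```

The two nested loops fill an (n+1) by (m+1) matrix with constant work per cell, so time and space are both O(nm), and the traceback takes at most n + m steps. All loops are bounded, so termination is immediate.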

[Cost table: rows A, C, T, G; columns A, C, T, G and - (gap). The entries were garbled in extraction and survive only as: 1121 2310 oo A-0 2 2 12]
