Question: Python 3 Only Problem 2. (Recovering the Alignment) Now that we know how to compute the edit distance between two strings, we next want to

Python 3 Only

Problem 2. (Recovering the Alignment) Now that we know how to compute the edit distance between two strings, we next want to recover the optimal alignment itself. The key idea is to retrace the steps of the dynamic programming algorithm backwards, re-discovering the path of choices (highlighted in red in the table above) from opt[0][0] to opt[M][N]. To determine the choice that led to opt[i][j], we consider the three possibilities: 1. The optimal alignment matches x[i] up with a gap. In this case, we must have opt[i][j] = opt[i + 1][j] + 2. 2. The optimal alignment matches y[j] up with a gap. In this case, we must have opt[i][j] = opt[i][j + 1] + 2. 3. The optimal alignment matches x[i] up with y[j]. In this case, we must have opt[i][j] = opt[i + 1][j + 1] if x[i] equals y[j], or opt[i][j] = opt[i + 1][j + 1] + 1 otherwise.

Write a program alignment.py that reads from standard input, the output produced by edit_distance.py, ie, input strings x and y, and the opt matrix. The program should then recover an optimal alignment using the procedure described above, and write to standard output the edit distance between x and y and the alignment itself, using the following format:

The rst line should contain the edit distance, preceded by the text Edit distance = .

Each subsequent line should contain a character from the rst string, followed by the paired character from the second string, followed by the associated penalty. Use the character - to indicate a gap in either string. $ python3 edit_distance.py < data/example10.txt | python3 alignment.py

Edit distance = 7

A T 1

A A 0

C - 2

A A 0

G G 0

T G 1

T T 0

A - 2

C C 0

C A 1 Data The data directory contains short test data les and actual genomic data les. Be sure to test your programs thoroughly using the short test les and the longer actual data les. Here are the optimal edit distances of several of the supplied les: ecoli2500.txt 118

ecoli5000.txt 160

fli8.txt 6 fli9.txt 4

fli10.txt 2

ftsa1272.txt 758

gene57.txt 8

stx1230.txt 521

stx19.txt 10

stx26.txt 17

stx27.txt 19

and here are the test cases with unique optimal alignments:

$ python3 edit_distance.py < data/endgaps7.txt | python3 alignment.py

Edit distance = 4

a - 2

t t 0

a a 0

t t 0

t t 0

a a 0

t t 0

- a 2

$ python3 edit_distance.py < data/fli10.txt | python3 alignment.py

Edit distance = 2

T T 0

G G 0

G G 0

C T 1

G G 0

G G 0

A T 1

A A 0

C C 0

T T 0

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!