Question: LCS stands for longest common subsequence. Vinder is a new smartphone app designed to make it easier for everyone to eat their vegeta- bles. Vinder

 LCS stands for longest common subsequence. Vinder is a new smartphone

LCS stands for longest common subsequence.

Vinder is a new smartphone app designed to make it easier for everyone to eat their vegeta- bles. Vinder compares your genetic information with a large database of vegetable genetic sequences to match you with the right vegetable based on sequence similarity. Specifically it computes the longest common subsequence between your DNA sequence and the DNA sequences of thousands of vegetables. An n-length DNA sequence, A, is an ordered collection of elements A, C, G,T], or A E A, C, G, T. A subsequence is a sequence formed by deleting elements from another sequence, thus preserving the original order. A k-length prefix of sequence A is the first k elements of A. For example, A-[A, C,C,G,G,A, A, T, C is a sequence. [A,C,A, T] and C are subsequences of A but T, A is not. [A, C, C, G is a 4-prefix of A and is the 0-prefix. Our goal is to find the longest common subsequence (LCS) of A, the human DNA sequence, and sequences B E D where D is the database of vegetable DNA sequences and k D This solution can be found efficiently using dynamic programming. At a high-level, the algorithm is: 1. For each seqence B* E D 2. Initialize the length of the LCS to 0 3. Let A, denote the ith element of A. Consider each pair of A and B; either A, -B If A -B, then the length of the LCS of the i-prefix of A and j-prefix of Bk is one more than the LCS of the i -prefix of A and j--prefix of B If A B, then we cannot extend the LCS. Instead, we conclude that the LCS up to (A, ) is the maximum of the LCS up to (A-, B,) or (A, Bt1) The two aforementioned cases sum up the possibilities at (A, B) We can use these to recursively define the LCS in terms of smaller problem solutions. Your goa is to describe the algorithm to do this. 1. Describe an algorithm to find the length of the LCS that does not use dynamic pro- udocode, simply describe the algorithm in enough detail so that we understand it. What is the runti? Why is it correct? You may keep your answers informal. Hint: a brute-force or recursive solution should have gramming. You do not have to write pseudo erponential runtime 2. Write out the recurrence relation used in the algorithm defined above. That is, write out how the LCS length can be expressed in terms of smaller instances of the problem. 3. Think about how to convert this into a dynamic program. What size table would you need for the LCS problem? 4. Interpret each table entry. That is, give your explanation for what each entry in the dynamic programming table represents. We discussed how to interpret the entries of the dynamic programming table for the similar problem of edit distance. Vinder is a new smartphone app designed to make it easier for everyone to eat their vegeta- bles. Vinder compares your genetic information with a large database of vegetable genetic sequences to match you with the right vegetable based on sequence similarity. Specifically it computes the longest common subsequence between your DNA sequence and the DNA sequences of thousands of vegetables. An n-length DNA sequence, A, is an ordered collection of elements A, C, G,T], or A E A, C, G, T. A subsequence is a sequence formed by deleting elements from another sequence, thus preserving the original order. A k-length prefix of sequence A is the first k elements of A. For example, A-[A, C,C,G,G,A, A, T, C is a sequence. [A,C,A, T] and C are subsequences of A but T, A is not. [A, C, C, G is a 4-prefix of A and is the 0-prefix. Our goal is to find the longest common subsequence (LCS) of A, the human DNA sequence, and sequences B E D where D is the database of vegetable DNA sequences and k D This solution can be found efficiently using dynamic programming. At a high-level, the algorithm is: 1. For each seqence B* E D 2. Initialize the length of the LCS to 0 3. Let A, denote the ith element of A. Consider each pair of A and B; either A, -B If A -B, then the length of the LCS of the i-prefix of A and j-prefix of Bk is one more than the LCS of the i -prefix of A and j--prefix of B If A B, then we cannot extend the LCS. Instead, we conclude that the LCS up to (A, ) is the maximum of the LCS up to (A-, B,) or (A, Bt1) The two aforementioned cases sum up the possibilities at (A, B) We can use these to recursively define the LCS in terms of smaller problem solutions. Your goa is to describe the algorithm to do this. 1. Describe an algorithm to find the length of the LCS that does not use dynamic pro- udocode, simply describe the algorithm in enough detail so that we understand it. What is the runti? Why is it correct? You may keep your answers informal. Hint: a brute-force or recursive solution should have gramming. You do not have to write pseudo erponential runtime 2. Write out the recurrence relation used in the algorithm defined above. That is, write out how the LCS length can be expressed in terms of smaller instances of the problem. 3. Think about how to convert this into a dynamic program. What size table would you need for the LCS problem? 4. Interpret each table entry. That is, give your explanation for what each entry in the dynamic programming table represents. We discussed how to interpret the entries of the dynamic programming table for the similar problem of edit distance

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!