Question:

CS4390 Introduction to Bioinformatics

Assignment 2 (100 points)

Due Date 11:59 PM of 03/05/2017 (no extensions this time, start early)

1. Consider the sequences v = TACGGGTAT and w = GGACGTACG. Assume that the match score is +1 and that the mismatch and indel penalties are each 1. The purpose of this question is to help you understand the dynamic programming algorithm. A similar question will be asked on the midterm, so this question is good practice. (Code submission is not required for this question.) (30 points)

a. Fill out the dynamic programming matrix for a global alignment between v and w. Draw arrows in the cells to store the backtrack information. What is the score of the optimal global alignment, and what alignment does this score correspond to?

b. Fill out the dynamic programming table for a local alignment between v and w. Draw arrows in the cells to store the backtrack information. What is the score of the optimal local alignment in this case, and what alignment achieves this score?

2. Implement a program that will generate all possible paths from source (0,0) to sink (n,m) in an n x m rectangular grid. Test your program using small values of n and m. (20 points)
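One way to approach the path enumeration above is a depth-first recursion that extends a partial path one step at a time. The sketch below is only an illustration, not a reference solution: the function name all_paths and the allow_diagonal flag are hypothetical, and it assumes that a path moves one step in the first coordinate or one step in the second coordinate. Diagonal steps, as used in alignment grids, are switched on only if requested, since the question does not say whether they count.

```python
from typing import Iterator, List, Tuple

Point = Tuple[int, int]


def all_paths(n: int, m: int, allow_diagonal: bool = False) -> Iterator[List[Point]]:
    """Yield every path from (0, 0) to (n, m) as a list of grid points.

    Assumption: each step increases one coordinate by 1; with
    allow_diagonal=True a step may also increase both coordinates by 1.
    """
    steps = [(1, 0), (0, 1)]
    if allow_diagonal:
        steps.append((1, 1))

    def extend(path: List[Point]) -> Iterator[List[Point]]:
        i, j = path[-1]
        if (i, j) == (n, m):
            yield list(path)  # copy, because `path` is reused while backtracking
            return
        for di, dj in steps:
            ni, nj = i + di, j + dj
            if ni <= n and nj <= m:  # stay inside the grid
                path.append((ni, nj))
                yield from extend(path)
                path.pop()

    yield from extend([(0, 0)])
```

For small n and m the number of paths can be checked by hand: without diagonal steps there are (n+m choose n) paths, so list(all_paths(2, 2)) should contain 6 of them.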

3. Implement a brute-force approach to find the optimal global alignment between two DNA sequences. Your program should output the alignment and its score. Use match score = 1, mismatch score = -1, and indel score = -2. Test your algorithm on short sequences. (20 points)
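A minimal sketch of the brute-force idea, under stated assumptions: every global alignment is enumerated recursively (align the two first characters, or place a gap in either sequence), each one is scored, and the best is kept. The names all_alignments, score, brute_force_align and dp_score are hypothetical. As one possible way to test on short sequences, the sketch also computes the score with the standard dynamic programming recurrence (Needleman-Wunsch); the two scores should agree.

```python
from typing import Iterator, Tuple

# Scores stated in the question; the indel value is treated as a score of -2.
MATCH, MISMATCH, INDEL = 1, -1, -2


def all_alignments(v: str, w: str) -> Iterator[Tuple[str, str]]:
    """Yield every global alignment of v and w as two gapped strings."""
    if not v and not w:
        yield "", ""
        return
    if v and w:  # align v[0] with w[0]
        for a, b in all_alignments(v[1:], w[1:]):
            yield v[0] + a, w[0] + b
    if v:  # v[0] against a gap in w
        for a, b in all_alignments(v[1:], w):
            yield v[0] + a, "-" + b
    if w:  # w[0] against a gap in v
        for a, b in all_alignments(v, w[1:]):
            yield "-" + a, w[0] + b


def score(a: str, b: str) -> int:
    """Score two gapped strings of equal length column by column."""
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += INDEL
        elif x == y:
            total += MATCH
        else:
            total += MISMATCH
    return total


def brute_force_align(v: str, w: str) -> Tuple[int, str, str]:
    """Return (best score, aligned v, aligned w) by exhaustive search."""
    best = max(all_alignments(v, w), key=lambda ab: score(*ab))
    return score(*best), best[0], best[1]


def dp_score(v: str, w: str) -> int:
    """Optimal global score via dynamic programming, used only as a cross-check."""
    prev = [j * INDEL for j in range(len(w) + 1)]
    for i in range(1, len(v) + 1):
        cur = [i * INDEL]
        for j in range(1, len(w) + 1):
            diag = prev[j - 1] + (MATCH if v[i - 1] == w[j - 1] else MISMATCH)
            cur.append(max(diag, prev[j] + INDEL, cur[j - 1] + INDEL))
        prev = cur
    return prev[-1]
```

The number of alignments grows very quickly with sequence length, so this is only practical for sequences of a handful of characters, which matches the instruction to test on short inputs.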

4. (30 points) A virus infects a bacterium and modifies the bacterium's replication process by inserting:

at every A, a polyA of length 1 to 5.

at every C, a polyC of length 1 to 10.

at every G, a polyG of arbitrary length >= 1.

at every T, a polyT of arbitrary length >= 1.

No gaps or other insertions are allowed in the virally modified DNA. For example, the sequence AAATAAAGGGGCCCCCTTTTTTTCC is an infected version of ATAGCTC.

Given sequences v and w, implement an algorithm that determines whether v could be an infected version of w. You can assume that the length of v is greater than the length of w.
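One possible approach, sketched here under stated assumptions, is to compare the two sequences run by run. A run of k identical letters in w expands to between k and k times the per-letter maximum in v (5 for A, 10 for C, unbounded for G and T), and no run may appear or disappear because no other insertions or gaps are allowed. The names MAX_RUN, runs and could_be_infected are hypothetical, and the sketch assumes both inputs contain only the letters A, C, G and T.

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# Maximum expansion of a single letter; None means no upper bound.
MAX_RUN: Dict[str, Optional[int]] = {"A": 5, "C": 10, "G": None, "T": None}


def runs(s: str) -> List[Tuple[str, int]]:
    """Run-length encode s, e.g. 'AAT' -> [('A', 2), ('T', 1)]."""
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(s)]


def could_be_infected(v: str, w: str) -> bool:
    """Return True if v could be an infected version of w.

    Assumption: v and w contain only A, C, G and T.
    """
    rv, rw = runs(v), runs(w)
    if len(rv) != len(rw):
        return False  # a run was added or lost, which the rules do not allow
    for (cv, lv), (cw, lw) in zip(rv, rw):
        if cv != cw or lv < lw:
            return False  # wrong letter, or fewer copies than in w
        limit = MAX_RUN[cv]
        if limit is not None and lv > lw * limit:
            return False  # more copies than lw letters could produce
    return True
```

With the example from the question, could_be_infected("AAATAAAGGGGCCCCCTTTTTTTCC", "ATAGCTC") returns True, while could_be_infected("AAAAAA", "A") returns False because a single A can expand to at most five.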
