Question: Exercise 1: Explain how to solve the Overlap Alignment Problem.


Exercise 1: Explain how to solve the Overlap Alignment Problem. A hint: how can we change the alignment graph for global alignment by adding zero-weight "free ride" edges? What should these edges be, and how do they compare to the free ride edges for local alignment? What is the resulting recurrence relation?

The reads produced by genome sequencers fall into two main categories. We have worked with algorithms for short reads (of a few hundred nucleotides) like the ones produced by Illumina's Sequencing by Synthesis technique. The other type of read is much longer, containing tens of thousands of nucleotides (or even more than 100,000 nucleotides). However, long reads are very error-prone; the reads produced by a company like Pacific Biosciences have a ~15% error rate.

The benefit of longer reads is that we need fewer reads to obtain the same coverage of the genome. However, with an error every few nucleotides, the current approach based on exact overlap of k-mers will completely fall apart. Instead, we might imagine using an alignment-based heuristic, since sequence alignments will easily find 85% similarity between two strings. In particular, as a first step we could align every pair of reads; we then form an overlap graph of sorts in which nodes correspond to reads and an edge connects x to y if the resulting alignment scores above some threshold.

The question is what type of alignment to use. We don't want global alignment, since only the ends of the reads will be similar. We don't want local alignment, since some substrings don't represent valid overlaps of reads. We want alignments of the form below that are "global-ish" but only of the ends of the reads (where we don't know in advance how long the overlap will be).

    ATGCATGCCGG
         T-CC-GAAAC

An overlap alignment of strings $v = v_1 \ldots v_n$ and $w = w_1 \ldots w_m$ is a global alignment of a suffix of v with a prefix of w. An optimal overlap alignment of strings v and w maximizes the global alignment score between an i-suffix of v and a j-prefix of w (i.e., between $v_i \ldots v_n$ and $w_1 \ldots w_j$) among all i and j.

Overlap Alignment Problem: Construct a highest-scoring overlap alignment between two strings.
Input: Two strings and a matrix score.
Output: A highest-scoring overlap alignment between the two strings as defined by the scoring matrix score.
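
One possible approach to Exercise 1, offered as a sketch under stated assumptions rather than as the reference solution: keep the global alignment graph, add zero-weight edges from the source (0, 0) to every node (i, 0) in the first column so that the alignment may begin at any suffix of v, and add zero-weight edges from every node (n, j) in the last row to the sink so that it may end at any prefix of w. Local alignment, by comparison, adds free rides from the source to every node and from every node to the sink. The recurrence is then the global one, $s_{i,j} = \max\{s_{i-1,j} - \sigma,\ s_{i,j-1} - \sigma,\ s_{i-1,j-1} + score(v_i, w_j)\}$, with $s_{i,0} = 0$ for all i, $s_{0,j} = -j\sigma$, and the answer taken as $\max_j s_{n,j}$. In the Python below, the function name `overlap_alignment` and the `match`/`mismatch`/`indel` parameters (standing in for the scoring matrix) are assumptions made for illustration.

```python
def overlap_alignment(v, w, match=1, mismatch=-1, indel=-2):
    """Return (score, aligned suffix of v, aligned prefix of w).

    Assumption: a simple match/mismatch/indel scheme replaces the
    general scoring matrix from the problem statement.
    """
    n, m = len(v), len(w)
    # s[i][j]: best score aligning some suffix of v[:i] with all of w[:j]
    s = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]

    # First row: w[:j] can only be aligned against gaps.
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + indel
        back[0][j] = "left"
    # First column stays 0: free ride from the source to every (i, 0).

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j], back[i][j] = max(
                (s[i - 1][j - 1] + sub, "diag"),
                (s[i - 1][j] + indel, "up"),
                (s[i][j - 1] + indel, "left"),
            )

    # Free ride from every (n, j) to the sink: pick the best last-row node.
    j = max(range(m + 1), key=lambda k: s[n][k])
    best, i = s[n][j], n

    # Trace back until the first column, where the free ride from the source ends.
    top, bottom = [], []
    while j > 0:
        move = back[i][j]
        if move == "diag":
            top.append(v[i - 1]); bottom.append(w[j - 1])
            i -= 1; j -= 1
        elif move == "up":
            top.append(v[i - 1]); bottom.append("-")
            i -= 1
        else:  # "left"
            top.append("-"); bottom.append(w[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For instance, `overlap_alignment("GATTACA", "ACAGGT")` aligns the suffix `ACA` of the first string with the prefix `ACA` of the second. The table takes O(nm) time and space, which matters when every pair of long reads has to be aligned; a banded or seeded variant would be a natural refinement but is outside this sketch.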
