3.8 Exercises 1. Given three DNA sequences S1, S2, and S3 of total length n. aagatgt....
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
3.8 Exercises 1. Given three DNA sequences S1, S2, and S3 of total length n. aagatgt. What is the (b) Describe an efficient algorithm that computes the length of the longest common substring. What is the time complexity? (a) Suppose Si acgatca, S2 = gattact, S3 longest common substring of S1, S2, and S3? PRO (c) It is possible that the longest common substring for S₁, S2, and S3 is not unique. Can you describe an efficient algorithm to report the number of possible longest common substrings of S₁, S2, and S3? What is the time complexity? 2. Please give the generalized suffix tree for S₁ = ACGTS and S₂ TGCA#. 3. Given a DNA sequence S and a pattern P, can you describe an O(|P|² + |S) time algorithm to find all occurrences of P in S with hamming distance <1? 4. Consider the string S = ACGT ACGTS. (a) What is the suffix array for S? (b) Report the values of (1) LCP(k, k + 1) for k= 1, 2, ...,8 and (2) LCP (k, k+4) for k = 1, 2, 3, 4. (For 3., Hamming distance of <= 1 means that we should look for a substring M in S such that IM| = |P| and there is at most one mismatch between M and P.) Problem 5 (10 pts) Programming project: Write a program to compute the suffix array for a given input string (you may use any programming language). Your program should read in a FASTA file (like HW2). You can assume that the file just contains a single DNA sequence, e.g. > seql ACTGGGAAATCGAAGACCCGG Remember to add a $' to the end of the string. The output should just be the suffix array table, e.g. SA [1] = x SA [2] = y etc. Bonus question (1 pt): Implement a binary search based pattern matching algorithm in the above program that searches the suffix array for a given pattern. 3.8 Exercises 1. Given three DNA sequences S1, S2, and S3 of total length n. aagatgt. What is the (b) Describe an efficient algorithm that computes the length of the longest common substring. What is the time complexity? (a) Suppose Si acgatca, S2 = gattact, S3 longest common substring of S1, S2, and S3? PRO (c) It is possible that the longest common substring for S₁, S2, and S3 is not unique. Can you describe an efficient algorithm to report the number of possible longest common substrings of S₁, S2, and S3? What is the time complexity? 2. Please give the generalized suffix tree for S₁ = ACGTS and S₂ TGCA#. 3. Given a DNA sequence S and a pattern P, can you describe an O(|P|² + |S) time algorithm to find all occurrences of P in S with hamming distance <1? 4. Consider the string S = ACGT ACGTS. (a) What is the suffix array for S? (b) Report the values of (1) LCP(k, k + 1) for k= 1, 2, ...,8 and (2) LCP (k, k+4) for k = 1, 2, 3, 4. (For 3., Hamming distance of <= 1 means that we should look for a substring M in S such that IM| = |P| and there is at most one mismatch between M and P.) Problem 5 (10 pts) Programming project: Write a program to compute the suffix array for a given input string (you may use any programming language). Your program should read in a FASTA file (like HW2). You can assume that the file just contains a single DNA sequence, e.g. > seql ACTGGGAAATCGAAGACCCGG Remember to add a $' to the end of the string. The output should just be the suffix array table, e.g. SA [1] = x SA [2] = y etc. Bonus question (1 pt): Implement a binary search based pattern matching algorithm in the above program that searches the suffix array for a given pattern.
Expert Answer:
Answer rating: 100% (QA)
Lets address each of the exercises a Given S1 acgatca S2 gattact and S3 aagatgt we need to find the longest common substring among them In this case t... View the full answer
Related Book For
Accounting for Decision Making and Control
ISBN: 978-1259564550
9th edition
Authors: Jerold Zimmerman
Posted Date:
Students also viewed these programming questions
-
Figure 6.6 shows the derivative g'. If g(0) = 0, graph g. Give (x, y)-coordinates of all local maxima and minima. -g'(x)- 2 6. -1 Figure 6.6 5
-
Toledo Bottling Company bottles a special beverage with proprietary ingredients which enables students perform exceedingly well in any exam administered at UT. When everything in the factory was...
-
If (x, y) = 100 ( y + 1) represents the population density of a planar region on Earth, where x and y are measured in miles, find the number of people in the region bounded by the curves x = y 2 and...
-
Consider a nonideal gas obeying a modified van der Waals equation of state \[\left(P+a / v^{n} ight)(v-b)=R T \quad(n>1) .\] Examine how the critical constants \(P_{c}, v_{\mathrm{c}}\), and...
-
Ten samples of 15 parts each were taken from an ongoing process to establish a p-chart for control. The samples and the number of defectives in each are shown in the following table: a. Develop a p-...
-
Overview You recently created a direct mail campaign for the Kitchener Soccer Club to attract more signups for their programs. Now they are looking to run a retention campaign via email marketing to...
-
Which series has the highest beta. BraveNewCoin Liquid Index for Bitcoin 1D BNC Trading Brave Ne Yellow Green Blue Orange
-
You are doing a great job in the leadership program and have made some valuable contributions up to this point. Your organization has not had the resources to update computers and data processing...
-
Please have an elevator pitch for the case below and answer the following questions: (Please try to be concise and write a short answer for each question.) 1) Does Biolite illustrate the principles...
-
For the data below, which represents a sample with n = 9, answer the questions. Round to 4 decimal places where possible. x 14.3 14.6 14.5 13.7 11.7 8.5 28.9 4.4 16.3 Find the mean: Find the median:...
-
What a response to a peer that states In financial distress, a company or individual cannot generate enough income or revenue to meet its financial obligations. It can be caused by various factors,...
-
Company Bongo has $1,000,000 in extra cash beyond what it needs for continuing operations. Bongo decides to pay the excess cash out to stockholders as a dividend. Company Tango has $1,000,000 in...
-
Illustrate the performance of the heap-sort algorithm on the following input sequence: make the heap tree and then sort it. ( 14, 11, 8, 4, 15, 12, 10, 1, 3, 7, 27, 17, 19, 13)
-
A coin is placed 23 cm in front of a two-lens system. Lens 1 (nearer the coin) has focal length f1 = +11 cm, lens 2 has f2 = +13.9 cm, and the lens separation is d = 31 cm. For the image produced by...
-
In muscle tissue, the ratio of phosphorylase a to phosphorylase b determines the rate of conversion of glycogen to glucose 1phosphate. Classify how each event affects the rate of glycogen breakdown...
-
The Allington Screen Plant of Allington Windows manufactures new and replacement screens. The plant produces more than 500 different screen sizes and offers four different aluminum frame colors....
-
Berkman Financial is a regional bank that offers a variety of financial services to both retail and commercial customers. Berkman uses the step-down allocation method to allocate four service...
-
Stahl produces and sells a single product and faces an inelastic demand curve, meaning it can sell as many units as it wants without affecting the selling price. Stahl has a cost structure consisting...
-
Let \(X_{1}, \ldots, X_{n}\) be a set of independent and identically distributed random variables following the distribution \(F\). Prove that for a fixed value of \(t \in \mathbb{R}\), the empirical...
-
Let \(\left\{X_{n}ight\}_{n=1}^{\infty}\) be a sequence of random variables such that \[\lim _{n ightarrow \infty} E\left(\left|X_{n}-cight|ight)=0\] for some \(c \in \mathbb{R}\). Prove that \(X_{n}...
-
Let \(U\) be a UNIFORm \([0,1]\) random variable and let \(\left\{X_{n}ight\}_{n=1}^{\infty}\) be a sequence of random variables such that \[X_{n}=\delta\left\{U ;\left(0,...
Study smarter with the SolutionInn App