Write a function, sometimes_greedy_assemble, that takes as input a list of read strings and uses a...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
Write a function, sometimes_greedy_assemble, that takes as input a list of read strings and uses a modified version of the greedy fragment assembly algorithm to assemble them into a single superstring. We will consider a modification to the greedy algorithm described for fragment assembly (Page 9 of the Sequence Assembly - Graphs and fragment assembly) in which the algorithm does not always add the next (largest overlap) compatible edge from the queue. Instead, after popping the next edge off of the queue and checking that it is compatible with the current graph, we will choose to add it to the graph with probability p, and if not, we will instead add it to a list, H, of "held aside" edges that will be considered later, and then continue with the algorithm. The modified pseudocode for the algorithm is: •Let G be a graph with fragments as vertices and no edges • Create a queue, Q, of overlap edges (not currently in G), with edges in order of increasing weight (decreasing overlap length) • Initialize H to be an empty list • While G is disconnected Pop the next possible edge e = (u, v) off of Q ■ If outdegree(u) = 0 and indegree(v) = 0 and e does not create a cycle o Let x be a random number drawn uniformly from [0, 1) o If x < p ■ o add e to G o move the edges in H back to Qin sorted order o Else o add e to H ■ If Q is empty o move the edges in H back to Qin sorted order A potential advantage of this randomized algorithm is that, in cases where the deterministic greedy algorithm fails to find the shortest superstring, there is a non-zero probability that the randomized algorithm will find a shorter superstring than the deterministic algorithm. And we can run the randomized algorithm many times to increase these chances. To keep things simple for this homework we will allow overlaps of any length (including zero). In practice, we would typically require some minimum overlap length. For simplicity, we will also assume that: 1. we are assembling a single-stranded sequence and 2. that no read is a substring of any other read. Important implementation details Random number generation Random number generation should occur only at the line specified in the pseudocode, and you should use the random.random function for this purpose. Tie-breaking criteria For the purpose of making this algorithm deterministic, we must establish tiebreaking criteria for edges in the overlap graph that have the same weight. For two edges with the same weight, we will first choose the edge whose source vertex read is first in lexicographical order. If the source vertices are identical, then we choose the edge whose target vertex read is first in lexicographical order. For example, if e1 = ATCGGAGGAT and e2 = ATCGGA→GGAA, we will attempt to use edge e2 first because GGAA <GGAT according to lexicographical order. You may find useful the fact that comparison operators for sequences in Python (e.g., tuples) use lexicographical ordering. For example, # read sets for testing tiny_reads = ["ATAG", "CATA", "TAAT"] single_base_reads = ["C", "A", "T", "G"] medium_reads = # utility functions for testing import random ["ATGCT", "CTAT", "CCTATA", "CCC", "CTCC", "AAG"] def read_strings_from_file(filename): return [line.rstrip () for line in open (filename)] def test sometimes_greedy_assemble_with_files (reads_filename, superstring_filename, p = 1.0): reads = read_strings_from_file(reads_filename) superstring assert sometimes_greedy_assemble (reads, p) = read_strings_from_file(superstring_filename) [0] # TEST: returns a string (8 POINTS) assembly = sometimes_greedy_assemble(tiny_reads) # TEST: returns a superstring (6 POINTS) def check_is_superstring (assembly, reads): for read in reads: assembly = assert isinstance (assembly, str), "Return value of sometimes_greedy_assemble is not a str" print("SUCCESS: returns a string passed!") == = assert read in assembly, f" read '{read}' is not contained in assembly" sometimes_greedy_assemble (tiny_reads) check_is_superstring(assembly, tiny_reads) print("SUCCESS: returns a superstring passed!") superstring # TEST: tiny_deterministic (4 POINTS) assembly sometimes_greedy_assemble(tiny_reads, 1.0) assert assembly == 'CATAGTAAT' print("SUCCESS: tiny_deterministic passed") Write a function, sometimes_greedy_assemble, that takes as input a list of read strings and uses a modified version of the greedy fragment assembly algorithm to assemble them into a single superstring. We will consider a modification to the greedy algorithm described for fragment assembly (Page 9 of the Sequence Assembly - Graphs and fragment assembly) in which the algorithm does not always add the next (largest overlap) compatible edge from the queue. Instead, after popping the next edge off of the queue and checking that it is compatible with the current graph, we will choose to add it to the graph with probability p, and if not, we will instead add it to a list, H, of "held aside" edges that will be considered later, and then continue with the algorithm. The modified pseudocode for the algorithm is: •Let G be a graph with fragments as vertices and no edges • Create a queue, Q, of overlap edges (not currently in G), with edges in order of increasing weight (decreasing overlap length) • Initialize H to be an empty list • While G is disconnected Pop the next possible edge e = (u, v) off of Q ■ If outdegree(u) = 0 and indegree(v) = 0 and e does not create a cycle o Let x be a random number drawn uniformly from [0, 1) o If x < p ■ o add e to G o move the edges in H back to Qin sorted order o Else o add e to H ■ If Q is empty o move the edges in H back to Qin sorted order A potential advantage of this randomized algorithm is that, in cases where the deterministic greedy algorithm fails to find the shortest superstring, there is a non-zero probability that the randomized algorithm will find a shorter superstring than the deterministic algorithm. And we can run the randomized algorithm many times to increase these chances. To keep things simple for this homework we will allow overlaps of any length (including zero). In practice, we would typically require some minimum overlap length. For simplicity, we will also assume that: 1. we are assembling a single-stranded sequence and 2. that no read is a substring of any other read. Important implementation details Random number generation Random number generation should occur only at the line specified in the pseudocode, and you should use the random.random function for this purpose. Tie-breaking criteria For the purpose of making this algorithm deterministic, we must establish tiebreaking criteria for edges in the overlap graph that have the same weight. For two edges with the same weight, we will first choose the edge whose source vertex read is first in lexicographical order. If the source vertices are identical, then we choose the edge whose target vertex read is first in lexicographical order. For example, if e1 = ATCGGAGGAT and e2 = ATCGGA→GGAA, we will attempt to use edge e2 first because GGAA <GGAT according to lexicographical order. You may find useful the fact that comparison operators for sequences in Python (e.g., tuples) use lexicographical ordering. For example, # read sets for testing tiny_reads = ["ATAG", "CATA", "TAAT"] single_base_reads = ["C", "A", "T", "G"] medium_reads = # utility functions for testing import random ["ATGCT", "CTAT", "CCTATA", "CCC", "CTCC", "AAG"] def read_strings_from_file(filename): return [line.rstrip () for line in open (filename)] def test sometimes_greedy_assemble_with_files (reads_filename, superstring_filename, p = 1.0): reads = read_strings_from_file(reads_filename) superstring assert sometimes_greedy_assemble (reads, p) = read_strings_from_file(superstring_filename) [0] # TEST: returns a string (8 POINTS) assembly = sometimes_greedy_assemble(tiny_reads) # TEST: returns a superstring (6 POINTS) def check_is_superstring (assembly, reads): for read in reads: assembly = assert isinstance (assembly, str), "Return value of sometimes_greedy_assemble is not a str" print("SUCCESS: returns a string passed!") == = assert read in assembly, f" read '{read}' is not contained in assembly" sometimes_greedy_assemble (tiny_reads) check_is_superstring(assembly, tiny_reads) print("SUCCESS: returns a superstring passed!") superstring # TEST: tiny_deterministic (4 POINTS) assembly sometimes_greedy_assemble(tiny_reads, 1.0) assert assembly == 'CATAGTAAT' print("SUCCESS: tiny_deterministic passed")
Expert Answer:
Answer rating: 100% (QA)
Based on the provided screenshots you seem to be asking for assistance with understanding or implementing a sometimes greedy assembly algorithm as des... View the full answer
Related Book For
Accounting for Decision Making and Control
ISBN: 978-0078025747
8th edition
Authors: Jerold Zimmerman
Posted Date:
Students also viewed these programming questions
-
Federalism is defined as shared government between a central government and regional units of government. Essentially, federalism is about the balance of power between the federal government and the...
-
Planning is one of the most important management functions in any business. A front office managers first step in planning should involve determine the departments goals. Planning also includes...
-
Googles ease of use and superior search results have propelled the search engine to its num- ber one status, ousting the early dominance of competitors such as WebCrawler and Infos- eek. Even later...
-
On what date is CGT for 2020-21 normally due for payment?
-
Residential building codes typically require the use of 12-gauge copper wire (diameter 0.205 3 cm) for wiring receptacles. Such circuits carry currents as large as 20 A. A wire of smaller diameter...
-
What is the net present value (NPV), the internal rate of return (IRR), and the modified IRR (MIRR)? 2. Based on the evaluation, replacing the old equipment appears to be a (good or bad) decision...
-
You have received a job offer from a startup company headed by \(\mathrm{PhD}\) chemists to be their chief (and only) process engineer, responsible to move their process for a novel solid adhesive...
-
Data for Rajesh Company are presented in P17-7A. Further analysis reveals the following. 1. Accounts payable pertain to merchandise suppliers. 2. All operating expenses except for depreciation were...
-
The issue at hand is whether Burger Queen violated their duty of good faith towards Scheck by opening a new store close to his existing franchise, knowing it would harm his business. Scheck can...
-
Three professors examined awareness of four widely disseminated retirement rules among employees at the University of Utah. These rules provide simple answers to questions about retirement planning...
-
Khaled company facing its first annual net loss as the end of the year approached and he is under the pressure from the companys creditors to report positive net income for the year. He told the...
-
research all about the following: 1) LINUX SYSTEM 2) LINUX COMPONENTS 3) LINUX DISTRIBUTIONS
-
Find a photograph of yourself that also shows examples of goods, services, and factors of production. Then respond to the questions. Identify three to five goods or services in the photograph, and...
-
What is the difference between linux/x64/shell/bind_tcp and linux/x64/shell_bind_tcp ? What is the difference between linux/x64/shell/bind_tcp and linux/x64/shell/reverse_tcp ? What is the...
-
The criminal justice field inherently revolves around physicality, stress and ability to take action when necessary. These attributes make the need for a structured, focused approach to sustained...
-
Given that the power P developed by a wind turbine is a function of the diameter D and rotational speed N of the rotor, and of the density p, dynamic viscosity and velocity V of the airstream,...
-
From the Sources and Uses of Funds Statement in Exhibit 1 calculate the estimated changes in loans and deposits from month-to-month, as well as estimated liquidity needs. (b) Given the results of...
-
During 2012, Cheng Book Store paid $483,000 for land and built a store in Georgetown. Prior to construction, the city of Georgetown charged Cheng $1,300 for a building permit, which Cheng paid. Cheng...
-
Geico is considering expanding an existing plant on a piece of land it already owns. The land was purchased 15 years ago for $ 325,000 and its current market appraisal is $ 820,000. A capital...
-
Marian Health Care is a large hospital system outside Chicago that offers both hospital (inpatient) and clinic (outpatient) services. It has a centralized admissions office that admits and registers...
-
Potter- Bowen (PB) manufactures and sells postage meters throughout the world. Postage meters print the necessary postage on envelopes, eliminating the need to affix stamps. The meter keeps track of...
-
What is the pressure drop associated with water at \(27^{\circ} \mathrm{C}\) flowing with a mean velocity of \(0.1 \mathrm{~m} / \mathrm{s}\) through an \(800-\mathrm{m}-\) long cast iron pipe of...
-
Fully developed conditions are known to exist for water flowing through a \(50-\mathrm{mm}\)-diameter tube at \(0.02 \mathrm{~kg} / \mathrm{s}\) and \(27^{\circ} \mathrm{C}\). What is the maximum...
-
Water at \(35^{\circ} \mathrm{C}\) is pumped through a horizontal, \(200-\mathrm{m}\)-long, \(30-\mathrm{mm}\)-diameter tube at \(0.25 \mathrm{~kg} / \mathrm{s}\). Over time, a 2-mm-thick layer of...
Study smarter with the SolutionInn App