Question:

A DNA strand can be represented as a (very long) string w over the alphabet {A, C, G, T}. For example, the human DNA has length 3 · 10^9. Because of the double-helix nature of DNA, we really should be talking about the base pairs A-T and G-C, in the sense that DNA is made of base-paired sequences: for example, instead of w = ACTGGACT, we could look at its reverse complement w̄ = AGTCCAGT, obtained by reversing w and applying the homomorphism A ↦ T, T ↦ A, C ↦ G, G ↦ C. To match DNA from a sample to a reference DNA w, or even to build de novo a reference DNA w, a sequencer can be used to generate a large number of relatively short substrings appearing in w (or in w̄; the sequencer has no way to tell), called reads.

Sequencing technology is rapidly evolving, but we can assume that it is possible to generate 10 reads of length 100 each (in reality, the length of these reads may vary a little; sometimes we may have reads over {A, C, G, T, N}, where N indicates that the sequencer was not able to determine the exact value being read; and sometimes the sequencer may even misread a value; let's ignore these possibilities for simplicity). If the human reference DNA has length 3 · 10^9, what fraction of the possible reads is present in the human DNA? (10 points)
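
One way to approach the arithmetic is a counting bound, sketched below in Python. This is not the locked expert solution, only an illustration under stated assumptions: a "possible read" is taken to be any string of length 100 over {A, C, G, T}, so there are 4^100 of them; a strand of length n has at most n - 100 + 1 distinct substrings of length 100; and reads may come from either the strand or its reverse complement, which at most doubles that count. The function name `read_fraction_upper_bound` and its parameters are made up for this example. Under these assumptions the fraction is at most about 3.7 · 10^-51, so essentially none of the possible reads occur in the reference.

```python
from fractions import Fraction

READ_LENGTH = 100           # read length given in the question
GENOME_LENGTH = 3 * 10**9   # length of the reference strand w


def read_fraction_upper_bound(genome_length, read_length,
                              alphabet_size=4, both_strands=True):
    """Upper bound on the fraction of possible reads that occur in the genome.

    Assumption: every window of the genome could be a distinct read, which
    overestimates the true count because real DNA contains repeats.
    """
    # Number of strings of the given length over the alphabet.
    possible = alphabet_size ** read_length
    # Number of starting positions for a window of that length.
    windows = genome_length - read_length + 1
    if both_strands:
        # The reverse complement contributes at most as many windows again.
        windows *= 2
    # The count of distinct reads present can never exceed the total possible.
    present = min(windows, possible)
    return Fraction(present, possible)


bound = read_fraction_upper_bound(GENOME_LENGTH, READ_LENGTH)
print(f"fraction of possible reads present <= {float(bound):.2e}")
```

The bound is an overestimate, so the true fraction can only be smaller. Counting a single strand instead (`both_strands=False`) halves the bound without changing the conclusion.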
