Question: (a) DNA sequences have four possible bases: C, G, A, T. Below is an
example of part of a DNA sequence.
GATCCTCCAT ATACAACGGT ATCTCCACCT
CAGGTTTAGA TCTCAACAAC GGAACCATTG
(i) Consider an ASCII encoding of the example above. Ignoring the
spaces between bases, what is the number of bits used to store
the example in memory?
(ii) Give a better-suited encoding, considering that DNA sequences
only contain four possible bases.
(iii) What is the compression ratio obtained by using the encoding you
chose in (ii) instead of the ASCII encoding?
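
The arithmetic behind part (a) can be checked with a short sketch. It assumes ASCII is stored as one 8-bit byte per character (with strict 7-bit ASCII the totals would be 420 bits and a ratio of 3.5), and the 2-bit code table below is only one possible assignment, not one given in the question. The names `CODE`, `pack` and `unpack` are illustrative.

```python
# The 60-base example from part (a), with the spaces between groups removed.
SEQUENCE = (
    "GATCCTCCAT" "ATACAACGGT" "ATCTCCACCT"
    "CAGGTTTAGA" "TCTCAACAAC" "GGAACCATTG"
)

# Assumed 2-bit code; any one-to-one assignment of the four bases works.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # BASES[i] is the base whose code is i


def pack(seq: str) -> int:
    """Pack a base string into one integer, 2 bits per base, first base highest."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack. The length is required because leading A's pack to zeros."""
    return "".join(
        BASES[(value >> (2 * (length - 1 - i))) & 0b11] for i in range(length)
    )


n = len(SEQUENCE)
ascii_bits = 8 * n    # assumption: one 8-bit byte per ASCII character
packed_bits = 2 * n   # log2(4) = 2 bits are enough for four symbols
ratio = ascii_bits / packed_bits

print(n, ascii_bits, packed_bits, ratio)  # 60 480 120 4.0
assert unpack(pack(SEQUENCE), n) == SEQUENCE  # the 2-bit code is lossless
```

Under these assumptions the 60 bases take 480 bits in ASCII and 120 bits with the 2-bit code, a ratio of 4 to 1.
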
(b) (i) Consider the following shorter example:
ATATCGCATC
Perform LZW compression on this short example using the
following initial dictionary:
Show the dictionary constructed during compression and the
compressed data.
(ii) Using the same dictionary, expand the following compressed
data:
Show the dictionary after uncompressing each code and the
uncompressed sequence.
(iii) What is the compression ratio obtained on these small examples
relative to the ASCII encoding, considering that bits integers are
used to encode the compressed string? Why is this result different
from the average ratio of obtained on a typical human genome,
which is a DNA sequence of about megabytes of
uncompressed data?
Solve all the questions.
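
The LZW steps asked for in part (b) can be sketched as follows. The initial dictionary and the compressed data from the question are missing from this copy, so the sketch assumes a hypothetical initial dictionary A=0, C=1, G=2, T=3; the codes it produces will differ if the original dictionary was numbered differently. The code width is left as a parameter because the question's value is also missing. All function names are illustrative.

```python
# Hypothetical initial dictionary: the question's own table is not available.
INITIAL = ["A", "C", "G", "T"]


def lzw_compress(text, alphabet=INITIAL):
    """Return (codes, table); table maps each string to its code."""
    table = {s: i for i, s in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            current += ch                        # keep extending the match
        else:
            codes.append(table[current])         # emit the longest known match
            table[current + ch] = len(table)     # learn match + next symbol
            current = ch
    if current:
        codes.append(table[current])             # flush the final match
    return codes, table


def lzw_expand(codes, alphabet=INITIAL):
    """Return (text, table); table is a list indexed by code."""
    table = list(alphabet)
    if not codes:
        return "", table
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # Code created by the compressor in the step being decoded.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table.append(previous + entry[0])        # mirror the compressor's entry
        previous = entry
    return "".join(out), table


def compression_ratio(text, codes, bits_per_code, bits_per_char=8):
    """ASCII size divided by compressed size; bits_per_code comes from the question."""
    return (bits_per_char * len(text)) / (bits_per_code * len(codes))


codes, table = lzw_compress("ATATCGCATC")
print(codes)  # [0, 3, 4, 1, 2, 1, 6]
print({s: c for s, c in table.items() if c >= len(INITIAL)})
# {'AT': 4, 'TA': 5, 'ATC': 6, 'CG': 7, 'GC': 8, 'CA': 9}
restored, _ = lzw_expand(codes)
assert restored == "ATATCGCATC"
```

With the assumed dictionary, the 10-character example compresses to 7 codes. Passing the code width stated in the question to `compression_ratio` gives the ratio for part (iii). On inputs this short the dictionary has little time to learn long repeated strings, which is why the ratio is much lower than on a long sequence.
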
