Question:
Assume you work on a file containing a collection of very biased DNA sequences: there are many more As and Ts than Cs and Gs in the sequences (A: 50%, T: 35%, C: 10%, G: 5%). You will use Huffman coding to compress the file: a Huffman code tree will be built for the given distribution of A, T, C and G, and the resulting Huffman codes will be used to encode the DNA sequences. What compression ratio can you achieve (compression ratio = uncompressed size / compressed size)? Explain your calculation briefly. Hint: fixed-length codes of 2 bits are sufficient for encoding four letters; use this to calculate the uncompressed size.
Step by Step Solution
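The calculation the question asks for can be sketched in a few lines of Python. This is only an illustrative sketch, not the original posted solution: the helper name `huffman_code_lengths` is hypothetical, and the frequencies are entered as integer percentages taken from the question. It builds the Huffman tree by repeatedly merging the two lightest subtrees, records how deep each letter ends up (its code length), and compares the weighted average code length with the 2-bit fixed-length baseline from the hint.

```python
import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman tree over freqs.

    Hypothetical helper for illustration; freqs maps symbol -> weight.
    """
    # Heap entries: (weight, tie_breaker, symbols contained in this subtree)
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees into one
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, counter, syms1 + syms2))
        counter += 1
    return lengths

# Distribution from the question, as integer percentages
freqs = {"A": 50, "T": 35, "C": 10, "G": 5}
lengths = huffman_code_lengths(freqs)

# Average bits per letter under the Huffman code
total = sum(freqs.values())
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / total

# Uncompressed size uses 2 bits per letter (the hint in the question)
ratio = 2 / avg_bits

print(lengths)
print(avg_bits, ratio)
```

Under these assumptions the merges are G+C (15%), then that subtree with T (50%), then with A, giving code lengths of 1 bit for A, 2 bits for T, and 3 bits each for C and G. The average is 0.50(1) + 0.35(2) + 0.10(3) + 0.05(3) = 1.65 bits per letter, so the compression ratio is 2 / 1.65, or about 1.21.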
