Question:

Assume you work on a file containing a collection of heavily biased DNA sequences: the sequences contain far more As and Ts than Cs and Gs (A: 50%, T: 35%, C: 10%, G: 5%). You will use Huffman coding to compress the file: a Huffman code tree is built for the given distribution of A, T, C and G, and the resulting Huffman codes are used to encode the DNA sequences. What compression ratio can you achieve (compression ratio = uncompressed size / compressed size)? Explain your calculation briefly. Hint: fixed-length codes of 2 bits are sufficient to encode four letters; use this to calculate the uncompressed size.
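One way to check the calculation is sketched below. It assumes the standard Huffman construction (repeatedly merge the two least frequent nodes) and uses the stated percentages as integer weights; the helper name `huffman_code_lengths` is illustrative and not part of the question. Under these assumptions the code lengths come out as A = 1, T = 2, C = 3 and G = 3 bits, so the average is 0.50(1) + 0.35(2) + 0.10(3) + 0.05(3) = 1.65 bits per base, against 2 bits per base for the fixed-length code, giving a ratio of 2 / 1.65, or about 1.21.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: code length in bits} for a Huffman tree over `weights`."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(w, next(tiebreak), {sym: 0}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent nodes; every symbol below them gets one bit deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Symbol frequencies from the question, in percent.
percent = {"A": 50, "T": 35, "C": 10, "G": 5}

lengths = huffman_code_lengths(percent)
avg_bits = sum(percent[s] * lengths[s] for s in percent) / 100  # bits per base
ratio = 2 / avg_bits  # fixed-length baseline is 2 bits per base

print(lengths, avg_bits, round(ratio, 4))
```

The ratio depends only on the symbol distribution, not on the file length, because both the uncompressed and compressed sizes scale linearly with the number of bases. The sketch ignores the small cost of storing the code table alongside the compressed data.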
