Question:

Problem 4: Compression (20 pts)

You are a lead engineer at a large bio-tech company. Your project is to design a compression scheme to store a huge DNA database of an unknown species consisting of five different DNA bases: V, W, X, Y, and Z. Suppose the DNA samples can be linearized as a sequence of bases, e.g., XXWYZ.... Your team has empirically computed the pmf p = (p_V, p_W, p_X, p_Y, p_Z) = (0.3, 0.2, 0.125, 0.25, 0.125) for V, W, X, Y, and Z, respectively. Assume that the DNA bases in the sequence are i.i.d.

(a) Determine a binary Huffman code for V, W, X, Y, and Z. What is its expected length per DNA base? (10 pts)

(b) As in (a), what is the expected length per DNA base when using Shannon coding? (5 pts)

(c) Suppose you made the mistake of swapping p_V and p_X with each other, i.e., you use q = (p_X, p_W, p_V, p_Y, p_Z) for Huffman coding. Without running the Huffman compression program, can you estimate the additional bits per symbol that result from using q instead of p? (5 pts)
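
The quantities asked for in (a) through (c) can be checked numerically. The following is a minimal sketch, not an official solution: the helper names (huffman_lengths, shannon_lengths, expected_length, kl_divergence) are hypothetical, Shannon coding is assumed to mean codeword lengths of ceil(log2(1/p)), and the estimate for (c) is assumed to be the relative entropy D(p||q), the usual first-order penalty for designing a code with the wrong distribution. Huffman tie-breaking may produce different codewords, but the expected length is the same.

```python
import heapq
import math

def huffman_lengths(pmf):
    """Return Huffman codeword lengths for a dict {symbol: probability}."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(prob, i, [sym]) for i, (sym, prob) in enumerate(pmf.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in pmf}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node gets one bit deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

def shannon_lengths(pmf):
    """Shannon code lengths, assumed here to be ceil(log2(1/p))."""
    return {sym: math.ceil(-math.log2(prob)) for sym, prob in pmf.items()}

def expected_length(pmf, lengths):
    """Average codeword length when symbols are drawn from pmf."""
    return sum(pmf[sym] * lengths[sym] for sym in pmf)

def kl_divergence(p, q):
    """Relative entropy D(p||q) in bits."""
    return sum(p[s] * math.log2(p[s] / q[s]) for s in p if p[s] > 0)

# True pmf from the problem statement.
p = {"V": 0.3, "W": 0.2, "X": 0.125, "Y": 0.25, "Z": 0.125}
# Mistaken pmf: p_V and p_X swapped.
q = dict(p, V=p["X"], X=p["V"])

len_p = huffman_lengths(p)          # (a) code designed for p
len_shannon = shannon_lengths(p)    # (b) Shannon code for p
len_q = huffman_lengths(q)          # code designed for the wrong pmf q

avg_huffman = expected_length(p, len_p)
avg_shannon = expected_length(p, len_shannon)
penalty_estimate = kl_divergence(p, q)                    # (c) estimate
penalty_exact = expected_length(p, len_q) - avg_huffman   # (c) direct check

print(f"(a) Huffman lengths {len_p}, average {avg_huffman:.3f} bits")
print(f"(b) Shannon lengths {len_shannon}, average {avg_shannon:.3f} bits")
print(f"(c) D(p||q) = {penalty_estimate:.3f} bits, exact extra = {penalty_exact:.3f} bits")
```

Under these assumptions, the Huffman code has lengths (2, 2, 3, 2, 3) for (V, W, X, Y, Z) and averages 2.25 bits per base, the Shannon code averages 2.45 bits per base, and D(p||q) is about 0.221 bits per base. Evaluating the q-designed code lengths directly under p gives an extra 0.175 bits per base, so the relative entropy serves as an estimate rather than an exact figure.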
