Question:

4. Problem 4: Compression (20 pts)

You are a lead engineer at a large bio-tech company. Your project is to design a compression scheme to store a huge DNA database of an unknown species consisting of five different DNA bases: V, W, X, Y, and Z. Suppose the DNA samples can be linearized as a sequence of bases, e.g., XXWYZ.... Your team has empirically computed the pmf $p = (p_V, p_W, p_X, p_Y, p_Z) = (0.3, 0.2, 0.125, 0.25, 0.125)$ for V, W, X, Y, and Z, respectively. Assume that the DNA bases in the sequence are i.i.d.

(a) Determine a binary Huffman code for V, W, X, Y, and Z. What is its expected length per DNA base? (10 pts)

(b) As in (a), what is the expected length per DNA base when using Shannon coding? (5 pts)

(c) Suppose you made the mistake of swapping $p_V$ and $p_X$ with each other, i.e., you use $q = (p_X, p_W, p_V, p_Y, p_Z)$ for Huffman coding. Without running the Huffman compression program, can you estimate the additional bits per symbol incurred by using $q$ instead of $p$? (5 pts)
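A minimal Python sketch that works through the three parts numerically may help check a hand solution. It is not part of the original problem: the helper names (`huffman_lengths`, `shannon_lengths`, `expected_length`, `entropy`, `kl_divergence`) are hypothetical, and Huffman ties are assumed to be broken by insertion order. For this pmf, either tie-breaking choice gives the same codeword lengths. For (c), it assumes the usual estimate of the penalty for designing a code for the wrong distribution, the relative entropy $D(p\|q)$. For Huffman codes this is only an approximation, so the sketch also evaluates, under $p$, the Huffman code actually built from $q$.

```python
import heapq
import itertools
import math

# Problem pmf over the five bases.
p = {"V": 0.3, "W": 0.2, "X": 0.125, "Y": 0.25, "Z": 0.125}
# Part (c): the mistaken pmf q with p_V and p_X swapped.
q = {**p, "V": p["X"], "X": p["V"]}


def huffman_lengths(probs):
    """Binary Huffman codeword length per symbol (ties broken by insertion order)."""
    tie = itertools.count()  # unique tie-breaker so the heap never compares lists
    heap = [(pr, next(tie), [sym]) for sym, pr in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf under them one level deeper.
        for sym in group1 + group2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths


def shannon_lengths(probs):
    """Shannon code lengths l(s) = ceil(log2(1 / p(s)))."""
    return {sym: math.ceil(-math.log2(pr)) for sym, pr in probs.items()}


def expected_length(probs, lengths):
    """Average bits per symbol when symbols are drawn from `probs`."""
    return sum(probs[sym] * lengths[sym] for sym in probs)


def entropy(probs):
    """Shannon entropy H(p) in bits (lower bound on expected length)."""
    return -sum(pr * math.log2(pr) for pr in probs.values() if pr > 0)


def kl_divergence(p_true, q_used):
    """Relative entropy D(p || q) in bits."""
    return sum(pt * math.log2(pt / q_used[s]) for s, pt in p_true.items() if pt > 0)


huff = expected_length(p, huffman_lengths(p))
print("(a) Huffman lengths:", huffman_lengths(p), f"L = {huff:.4f}")
print(f"(b) Shannon: L = {expected_length(p, shannon_lengths(p)):.4f}")
print(f"    entropy H(p) = {entropy(p):.4f}")
print(f"(c) estimate D(p||q) = {kl_divergence(p, q):.4f} bits/symbol")
print(f"    actual extra = {expected_length(p, huffman_lengths(q)) - huff:.4f}")
```

For this pmf the sketch gives Huffman lengths (2, 2, 3, 2, 3) for (V, W, X, Y, Z) and 2.25 bits per base for (a), and 2.45 bits per base for Shannon coding in (b), both above $H(p) \approx 2.2355$ bits. For (c), the estimate $D(p\|q) \approx 0.221$ bits per symbol is somewhat larger than the roughly 0.175 bits actually lost when the Huffman code built from $q$ is used on data drawn from $p$.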
