Huffman coding is a common compression technique that assigns fewer bits to frequent items, using a binary
Question:
Huffman coding is a common compression technique that assigns fewer bits to frequent items, using a binary tree. The input is a document consisting of a bunch of items (e.g., characters). Huffman coding outputs a set of codewords for all items.
Below, we use an example to illustrate the process.
Given input string aabcabdacb
1. Huffman coding first determines the frequencies of each item. Here, a occurs 4 times, b 3, c 2, and d 1. Total is 10.
2. When building the tree, each item is first set as a "leaf node":
3. The pair of nodes yielding the lowest sum is found, and merged into a new node formed with that sum. Here, c and d yield 2 + 1 = 3.
4. The merging continues. The lowest sum is b's 3 plus the new node's 3, yielding 6. (Note that c and d are no longer eligible nodes).
5. The merging ends when only 1 node exists.
6. Each leaf node's encoding is obtained by traversing from the top node to the left. Each left branch appends a 0, and each right branch appends a 1, to the code. Below are the codewords for all four items:
Item | Codeword |
a | 0 |
b | 10 |
c | 110 |
d | 111 |
7. Then the input string aabcabdacb can be decoded to 0010110010111011010.
Huffman coding is a common compression technique that assigns fewer bits to frequent items, using a binary tree. The input is a document consisting of a bunch of items (e.g., characters). Huffman coding outputs a set of codewords for all items.
Below, we use an example to illustrate the process.
Given input string aabcabdacb
1. Huffman coding first determines the frequencies of each item. Here, a occurs 4 times, b 3, c 2, and d 1. Total is 10.
2. When building the tree, each item is first set as a "leaf node":
3. The pair of nodes yielding the lowest sum is found, and merged into a new node formed with that sum. Here, c and d yield 2 + 1 = 3.
4. The merging continues. The lowest sum is b's 3 plus the new node's 3, yielding 6. (Note that c and d are no longer eligible nodes).
5. The merging ends when only 1 node exists.
6. Each leaf node's encoding is obtained by traversing from the top node to the left. Each left branch appends a 0, and each right branch appends a 1, to the code. Below are the codewords for all four items:
Item | Codeword |
a | 0 |
b | 10 |
c | 110 |
d | 111 |
7. Then the input string aabcabdacb can be decoded to 0010110010111011010.
Suppose we have a 100-character text has these character frequencies:
A: 50
C: 40
B: 4
D: 3
E: 3
Fill in the blanks the answer for each of the question
- What is the first merge?
- D and E: 6
- B and D: 7
- B and D and E: 10
- What is the second merge?
- B and D: 7
- DE and B: 10
- C and A: 90
- What is the third merge?
- DEB and C: 40
- DEB and C: 50
- What is the fourth merge?
- None
- DEBC and A: 100
- What is the fifth merge?
- None
- DEBCA and F
- What is the code for A?
- 0
- 1
- What is the code for B?
- 001
- 110
- What is the code for C?
- 1
- 01
- 10
- What is the code for D?
- 1110
- 1111
- What is the code for E?
- 1110
- 1111
- 5 unique characters (A, B, C, D, E) can each be uniquely encoded in 3 bits (like 000, 001, 010, 011, and 100). With such a fixed-length code, how many bits are needed for the 100-character text?
- 100
- 300
- For the Huffman code determined in the above questions, the number of bits per character is A: 1, C: 2, B: 3, D: 4, and E: 4. How many bits are needed for the 100-character text?
- 14
- 166
- 300
Building Java Programs A Back To Basics Approach
ISBN: 9780135471944
5th Edition
Authors: Stuart Reges, Marty Stepp