Question: Huffman Coding: Huffman coding is an algorithm devised by David A. Huffman of MIT in 1952 for compressing text data to make a file occupy
Huffman Coding:
Huffman coding is an algorithm devised by David A. Huffman of MIT in 1952 for compressing text data to make a file occupy a smaller number of bytes. This relatively simple compression algorithm is powerful enough that variations of it are still used today in computer networks, fax machines, modems, HDTV, and other areas. Normally text data is stored in a standard format of 8 bits per character, commonly using an encoding called ASCII that maps every character to a binary integer value from 0-255. The idea of Huffman coding is to abandon the rigid 8-bits-percharacter requirement and use different-length binary encodings for different characters. The advantage of doing this is that if a character occurs frequently in the file, such as the letter 'e', it could be given a shorter encoding (fewer bits), making the file smaller. The tradeoff is that some characters may need to use encodings that are longer than 8 bits, but this is reserved for characters that occur infrequently, so the extra cost is worth it.
Compressing Data using Huffman Coding: Part 1: Build a Frequency Table The first task is to build a frequency table. Please refer to the example in Table 1. At this point, you should fill the first two columns of this table: Character and Frequency. The Huffman code is not known yet. In this table:
Items include characters that appear only in the input string.
Characters are case sensitive
Characters are listed in the order of appearance in the input string
The minimum number of characters is 2
The frequency refers to the number of appearances of the character. You do not need to
normalize this number.
Part 2: Build a Huffman Coding Tree
Follow the algorithm and build a Huffman coding tree based on the information stored in the frequency table. Use a priority queue as explained in the slides.
Part 3: List the Huffman Codes
Based on the Huffman Coding tree built in the part 2, you should create the Huffman codes for each of the characters that appear in the input string. Please note that you DO NOT need to perform bit operations in this assignment. Please store the encoded information as a String object. For example, to represent the bit stream 11110000, you can create a String object with a new String 11110000.
String huffmanCode = 11110000;
The Huffman code should be stored in the corresponding column of the Frequency Table.
Part 4: Encoding a String
Based on the Huffman code generated by part 3, your software should convert the input string into encoded bits. For each of the characters in the input string, look up the bit stream in the Frequency Table and replace the character with the encoded bit stream.
Part 5: Decoding bits
Decode the bit stream using the Huffman Coding Tree generated in the part 2.
Part 6: Testing Your Software
Test your software by creating a Tester program. It takes as argument the input string to be encoded. If you want to include a space in your input string please use quotation marks around your string. (This might work only on the command line not in the Eclipse IDE). This program tests:
Frequency Table
Encoding
Decoding
Note: For the set of characters that occur the same number of times, there may be more than one possible set of codes. Similarly, a non-leaf node and a leaf node can have same number of frequencies and it can cause more than one possible set of codes.
Sample Output:
% java Tester eeyjjjj ======================================
char frequency code --------------------------------------
e 2 10 y 1 11 j40
======================================
Encoded bit stream: 1010110000 Total number of bits without Huffman coding: 112 Total number of bits with Huffman coding: 10
Decoded String: eeyjjjj
% java Tester "My Test works totally fine" ====================================== char frequency code --------------------------------------
M 1 01111 y 2 1000
4 000 T 1 01110
e 2 0011 s 2 0010 t 3 101
w 1 01101 o 2 0101 r 1 01100 k 1 1111 a 1 1110 l 2 0100 f 1 1101 i 1 1100 n 1 1001
======================================
Encoded bit stream: 0111110000000111000110010101000011010101011001111001000010101011011 1100100010010000001101110010010011 Total number of bits without Huffman coding: 416 Compression Ratio: ? Total number of bits with Huffman coding: 101 Compression Ratio: ? Decoded String: My Test works totally fine
Submission:
Submit your files on Canvas using the Programming Assignment 2 submission Link. You will submit a zip file containing:
1. Decoder.java 2. Encoder.java 3. HuffmanFrequencyTable.java 4. HuffmanTree.java 5. HuffmanTreeNode.java 6. TableItem.java 7. Tester.java
Grading Rubric:
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
