Question: Basic Part (100 points): Huffman's Algorithm. In this homework assignment, I would like you to implement Huffman's algorithm.

Basic Part (100 points)
Huffman's Algorithm
In this homework assignment, I would like you to implement Huffman's algorithm.
The Huffman Algorithm: Given an input text file, do the following in C++:
1. Perform a linear scan to gather the frequencies of all the letters that occur in the file. Do not consider letters with zero frequency. Save the frequencies in a list L of binary tree nodes, where each node contains a letter and its frequency.
2. Sort the list L according to frequencies in increasing order.
3. Remove the first two nodes N1 and N2, which have the lowest frequencies. Build a new node N with a hypothetical letter (a dummy) and a frequency equal to the sum of the frequencies of N1 and N2, and make N1 the left child of N and N2 the right child of N. Then insert N into L so that L stays in sorted order. Repeat this process until L has only one node T.
4. The node T obtained from Step 3 is the Huffman code tree. For any node in the tree, its edge pointing to its left child, if there is one, can be interpreted as 0. Similarly, its right edge pointing to its right child, if there is one, can be interpreted as 1. The binary string along the edge path from the root to a letter at a leaf node is thus the Huffman code for the letter.
5. Use the Huffman codes from Step 4 to encode the input text file and write the result to an output file, which is the encoded file.
6. Take the encoded file obtained in Step 5, decode it using the Huffman codes from Step 4, and save the result in another output file, which is the decoded file.
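Steps 1 to 4 can be sketched roughly as follows. This is only one possible reading of the assignment, not a reference solution: the names `Node`, `buildTree`, and `assignCodes` are hypothetical, the input is taken as an in-memory string rather than a file, and the choice to give a lone letter the code "0" is an assumption the assignment does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type: a letter (unused in dummy internal nodes) and a frequency.
struct Node {
    char letter;
    long freq;
    std::unique_ptr<Node> left, right;
    bool isLeaf() const { return !left && !right; }
};

// Steps 1-3: count letters, sort the list L, then merge the two smallest nodes
// until a single tree T remains.
std::unique_ptr<Node> buildTree(const std::string& text) {
    assert(!text.empty());
    std::map<char, long> counts;
    for (char c : text) ++counts[c];  // Step 1: only letters that occur get an entry

    std::list<std::unique_ptr<Node>> L;
    for (const auto& kv : counts)
        L.push_back(std::make_unique<Node>(Node{kv.first, kv.second, nullptr, nullptr}));

    // Step 2: increasing order of frequency.
    L.sort([](const std::unique_ptr<Node>& a, const std::unique_ptr<Node>& b) {
        return a->freq < b->freq;
    });

    // Step 3: N1 becomes the left child, N2 the right child of a dummy node N.
    while (L.size() > 1) {
        std::unique_ptr<Node> n1 = std::move(L.front()); L.pop_front();
        std::unique_ptr<Node> n2 = std::move(L.front()); L.pop_front();
        long sum = n1->freq + n2->freq;
        auto n = std::make_unique<Node>(Node{'\0', sum, std::move(n1), std::move(n2)});
        // Insert N before the first node with a larger frequency to keep L sorted.
        auto pos = std::find_if(L.begin(), L.end(),
            [sum](const std::unique_ptr<Node>& x) { return x->freq > sum; });
        L.insert(pos, std::move(n));
    }
    return std::move(L.front());
}

// Step 4: left edge = 0, right edge = 1; the path from the root to a leaf is its code.
void assignCodes(const Node* node, const std::string& prefix,
                 std::map<char, std::string>& codes) {
    if (node->isLeaf()) {
        // Assumption: a file with a single distinct letter gets the one-bit code "0".
        codes[node->letter] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(node->left.get(), prefix + "0", codes);
    assignCodes(node->right.get(), prefix + "1", codes);
}
```

For the text "aaaabbc", this sketch merges c and b first and then merges the result with a, so a receives a one-bit code and b and c receive two-bit codes. Different tie-breaking rules can produce different but equally short codes.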
An important note for the Basic Part: Suppose Huffman coding assigns a letter the code 1100101. You may treat this code as a byte-string, i.e., a string of the characters '0' and '1'. You can output this byte-string as the compressed file and then use it to decompress the compressed file and recover the original file. You do not have to store the length of the byte-string in the compressed file. Note that with such a byte-string, the compressed file may not actually be smaller than the original, since each code bit occupies a whole byte.
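A minimal sketch of Steps 5 and 6 under the byte-string simplification might look like this. The names `CodeTable`, `encode`, and `decode` are hypothetical, file reading and writing (for example with `std::ifstream` and `std::ofstream`) is left out, and decoding here uses a reverse lookup of the code table rather than a walk down the tree; both approaches work because no Huffman code is a prefix of another.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias: maps each letter to its code, written with '0' and '1' characters.
using CodeTable = std::map<char, std::string>;

// Step 5: replace every letter by its code; the result is a string of '0'/'1' characters.
std::string encode(const std::string& text, const CodeTable& codes) {
    std::string bits;
    for (char c : text) bits += codes.at(c);  // at() throws if a letter has no code
    return bits;
}

// Step 6: read characters until they form a complete code, emit its letter, start over.
std::string decode(const std::string& bits, const CodeTable& codes) {
    std::map<std::string, char> inverse;
    for (const auto& kv : codes) inverse[kv.second] = kv.first;

    std::string text, current;
    for (char b : bits) {
        current += b;
        auto it = inverse.find(current);
        if (it != inverse.end()) {
            text += it->second;
            current.clear();
        }
    }
    assert(current.empty());  // leftover characters would mean a truncated or invalid input
    return text;
}
```

With the illustrative table a = "1", b = "01", c = "00", the four-letter text "abca" encodes to the six-character string "101001". That is longer than the input, which is the effect the note above describes.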
In computer science and information theory, a Huffman code is an optimal prefix code found using the algorithm developed by David A. Huffman while he was an Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". The process of finding or using such a code is called Huffman coding and is a common technique in entropy encoding, including in lossless data compression. The algorithm's output can be viewed as a variable-length code table for encoding a source symbol (such as a character in a file). Huffman's algorithm derives this table from the estimated probability or frequency of occurrence (weight) of each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be implemented efficiently, finding a code in time linear in the number of input weights if these weights are already sorted. However, although optimal among methods that encode symbols separately, Huffman coding is not always optimal among all compression methods.
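The linear-time claim for sorted weights is usually realized with two queues: one holding the sorted leaf weights and one holding merged weights, which are produced in non-decreasing order. The sketch below, with the hypothetical name `huffmanCost`, only computes the total number of code bits for the whole message instead of building the tree, and it is not part of the assignment's required steps.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// Returns the total length in bits of a Huffman-encoded message whose symbol
// frequencies are given in increasing order. Each weight is pushed and popped once.
long huffmanCost(const std::vector<long>& sortedWeights) {
    assert(std::is_sorted(sortedWeights.begin(), sortedWeights.end()));
    std::queue<long> leaves, merged;
    for (long w : sortedWeights) leaves.push(w);

    // The smallest remaining weight is always at the front of one of the two queues.
    auto popMin = [&]() {
        bool takeLeaf = merged.empty() ||
                        (!leaves.empty() && leaves.front() <= merged.front());
        std::queue<long>& q = takeLeaf ? leaves : merged;
        long w = q.front();
        q.pop();
        return w;
    };

    long cost = 0;
    while (leaves.size() + merged.size() > 1) {
        long sum = popMin() + popMin();  // merge the two smallest weights
        cost += sum;                     // each merge adds one bit to every symbol below it
        merged.push(sum);
    }
    return cost;
}
```

For the frequencies 1, 2, 4 the merges produce 3 and then 7, giving a total of 10 bits, which matches one symbol with a one-bit code and two symbols with two-bit codes (4·1 + 2·2 + 1·2).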
