Lab #6

HUFFMAN CODING

Huffman coding is an algorithm devised by David A. Huffman of MIT in 1952 for compressing text data to make a file smaller (fewer bytes). This relatively simple algorithm is powerful enough that variations of it are still used today in computer networks, fax machines, modems, HDTV, and other areas. Normally text data is stored in a standard format of 8 bits per character, using an encoding called ASCII that maps every character to a binary integer value from 0-255. The idea of Huffman coding is to abandon the rigid 8-bits-per-character requirement and use variable-length binary encodings for different characters. The advantage of doing this is that very frequently occurring characters (say, 'e' in English text) can be given very short encodings (fewer bits). Although this means some characters may need more than 8 bits, the tradeoff is often worth it because those characters occur very infrequently. Refer to the Lecture 11/8/23 for an introduction to Huffman coding.

The steps involved in Huffman encoding are:

1. Count character frequencies: Make a quick pass through the file to be encoded, counting how many of each distinct character you find in the text.
2. Build encoding tree: Build a binary tree using a specific queue-based algorithm.
3. Build encoding map: Traverse the binary tree to record the binary encoding of each character in a map for future quick-access use.
4. Encode the file: Do a second pass through the file, looking up each byte (each character) in the map from Step 3 to get the encoding for each byte.
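The small sketches accompanying the steps below are written in C purely for illustration; the lab itself is in ARM assembly. The type and constant names (HuffNode, NCHARS, PSEUDO_EOF) are assumptions made for these sketches, not part of the assignment.

#include <stddef.h>

#define NCHARS     257          /* 256 possible byte values plus one pseudo-character for EOF */
#define PSEUDO_EOF 256          /* the "character" value used for the EOF marker              */

typedef struct HuffNode {
    int ch;                     /* character value, or -1 for an internal node                */
    unsigned count;             /* frequency; for internal nodes, the sum of the children     */
    struct HuffNode *left;      /* the 0 branch                                               */
    struct HuffNode *right;     /* the 1 branch                                               */
} HuffNode;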

STEP 1: COUNT CHARACTER FREQUENCIES

Suppose we have a file named example.txt whose contents are "ab ab cab". In the original file, this text occupies 10 bytes (80 bits) of data; the 10th is a special "end-of-file" (EOF) byte.

Byte    1         2         3         4         5         6         7         8         9         10
Char    'a'       'b'       ' '       'a'       'b'       ' '       'c'       'a'       'b'       EOF
ASCII   97        98        32        97        98        32        99        97        98        256
Binary  01100001  01100010  00100000  01100001  01100010  00100000  01100011  01100001  01100010  N/A

In Step 1 of Huffman's algorithm, a count of each character is computed. The counts are represented as a map: {' ':2, 'a':3, 'b':3, 'c':1, EOF:1}
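A minimal sketch of Step 1 in C, reusing the NCHARS and PSEUDO_EOF definitions above (count_frequencies is a hypothetical helper name):

#include <stdio.h>

/* Count how many times each byte value occurs in the input file. */
void count_frequencies(FILE *in, unsigned counts[NCHARS]) {
    for (int i = 0; i < NCHARS; i++)
        counts[i] = 0;
    int c;
    while ((c = fgetc(in)) != EOF)
        counts[c]++;
    counts[PSEUDO_EOF] = 1;     /* exactly one EOF marker per file */
}

For example.txt this produces the map shown above: {' ':2, 'a':3, 'b':3, 'c':1, EOF:1}.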

STEP 2: BUILD ENCODING TREE

Step 2 of Huffman's algorithm places our counts into binary tree nodes, with each node storing a character and a count of its occurrences. The nodes are then put into a sorted list, which keeps them in prioritized order with smaller counts having higher priority, so that characters with lower counts will come out of the list sooner. (The priority list is somewhat arbitrary in how it breaks ties, such as 'c' being before EOF and 'a' being before 'b'.)

head
+-----+   +-----+   +-----+   +-----+   +-----+
| 'c' |   | EOF |   | ' ' |   | 'a' |   | 'b' |
|  1  |   |  1  |   |  2  |   |  3  |   |  3  |
+-----+   +-----+   +-----+   +-----+   +-----+

Now the algorithm repeatedly removes the two nodes from the front of the list (the two with the smallest frequencies) and joins them into a new node whose frequency is their sum. The two removed nodes become children of the new node; the first removed becomes the left child, and the second the right. The new node is re-inserted into the list in sorted order. This process is repeated until the list contains only one binary tree node with all the others as its children. This will be the root of our finished Huffman tree, shown below. Notice that the nodes with low frequencies end up far down in the tree, and nodes with high frequencies end up near the root of the tree. This structure can be used to create an efficient encoding in the next step.

                    (10)
                  /       \
             (4)            (6)
            /    \         /    \
        ' ':2    (2)   'a':3   'b':3
                /    \
             'c':1  EOF:1
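A sketch of Step 2 in C, using a simple sorted array in place of the priority list (build_tree and insert_sorted are illustrative names; HuffNode is the type from the earlier sketch):

#include <stdlib.h>

/* Insert a node into the list, keeping it sorted by count (stable for ties). */
static void insert_sorted(HuffNode **list, int *n, HuffNode *node) {
    int i = *n;
    while (i > 0 && list[i - 1]->count > node->count) {
        list[i] = list[i - 1];
        i--;
    }
    list[i] = node;
    (*n)++;
}

HuffNode *build_tree(const unsigned counts[NCHARS]) {
    HuffNode *list[NCHARS];
    int n = 0;

    /* One leaf node for every character that actually occurs. */
    for (int ch = 0; ch < NCHARS; ch++) {
        if (counts[ch] == 0)
            continue;
        HuffNode *leaf = malloc(sizeof *leaf);
        leaf->ch = ch;
        leaf->count = counts[ch];
        leaf->left = leaf->right = NULL;
        insert_sorted(list, &n, leaf);
    }

    /* Join the two front nodes until only the root remains. */
    while (n > 1) {
        HuffNode *a = list[0], *b = list[1];    /* the two smallest counts */
        for (int i = 2; i < n; i++)
            list[i - 2] = list[i];
        n -= 2;

        HuffNode *parent = malloc(sizeof *parent);
        parent->ch = -1;                        /* internal node           */
        parent->count = a->count + b->count;
        parent->left = a;                       /* first removed  -> left  */
        parent->right = b;                      /* second removed -> right */
        insert_sorted(list, &n, parent);
    }
    return list[0];
}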

STEP 3: BUILD ENCODING MAP

Step 3 of Huffman's algorithm takes the tree from Step 2 and performs a traversal of it to harvest all the binary sequences that will be the encodings for each character. The binary sequence for each character is derived from the binary tree by treating each left branch as a bit value of 0 and each right branch as a bit value of 1. The code for each character can be determined by traversing the tree. To reach ' ' (space), we go left twice from the root, so the code for ' ' is 00. The code for 'c' is 010, the code for EOF is 011, the code for 'a' is 10, and the code for 'b' is 11. By traversing the tree, we can produce a map from characters to their binary representations. Though the binary representations look like integers, they consist of binary digits and can be of arbitrary length, so we will store them as strings. For this tree, the map is: {' ':"00", 'a':"10", 'b':"11", 'c':"010", EOF:"011"}
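A sketch of Step 3 in C: a recursive traversal that appends '0' when going left and '1' when going right, recording the accumulated bit string at each leaf (build_map is an illustrative name, and the 64-character limit on code length is an assumption of this sketch):

#include <string.h>

/* prefix is a scratch buffer holding the bits walked so far; codes[] is indexed
 * by character value and receives the finished bit string for each leaf. */
void build_map(const HuffNode *node, char *prefix, int depth, char codes[NCHARS][64]) {
    if (node->left == NULL && node->right == NULL) {   /* leaf: record its code */
        prefix[depth] = '\0';
        strcpy(codes[node->ch], prefix);
        return;
    }
    prefix[depth] = '0';                               /* left branch  = 0 */
    build_map(node->left, prefix, depth + 1, codes);
    prefix[depth] = '1';                               /* right branch = 1 */
    build_map(node->right, prefix, depth + 1, codes);
}

Called as build_map(root, prefix, 0, codes) with a 64-byte prefix buffer, this fills in exactly the map shown above for the example tree.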

STEP 4: ENCODING THE TEXT DATA

Using the encoding map from Step 3, we can encode the file's contents into a shorter binary representation. Using the preceding encoding map, the text "ab ab cab" would be encoded as 1011001011000101011011. The following table gives the char-to-binary mapping in more detail. The overall encoded contents of the file require 22 bits, or almost 3 bytes, compared to the original file of 10 bytes.

Char     'a'   'b'   ' '   'a'   'b'   ' '   'c'   'a'   'b'   EOF
Binary   10    11    00    10    11    00    010   10    11    011

Since the character encodings have different lengths, often the length of a Huffman-encoded file does not come out to an exact multiple of 8 bits. Files are stored as sequences of whole bytes, so in cases like this the remaining bits of the last byte are filled with 0s.
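A sketch of Step 4 in C: look up each character's code string and pack the bits into bytes, most-significant bit first, padding the final partial byte with 0s (encode_file is an illustrative name; the codes array comes from the Step 3 sketch):

#include <stdio.h>

void encode_file(FILE *in, FILE *out, char codes[NCHARS][64]) {
    unsigned char byte = 0;     /* bits accumulated so far */
    int nbits = 0;              /* how many of them        */
    int c;

    for (;;) {
        c = fgetc(in);
        /* Emit the code for this character; at end of input, emit the EOF code. */
        const char *code = codes[c == EOF ? PSEUDO_EOF : c];
        for (const char *p = code; *p; p++) {
            byte = (unsigned char)((byte << 1) | (*p - '0'));
            if (++nbits == 8) { /* a full byte: write it out */
                fputc(byte, out);
                byte = 0;
                nbits = 0;
            }
        }
        if (c == EOF)
            break;
    }
    if (nbits > 0)              /* pad the last partial byte with 0s */
        fputc(byte << (8 - nbits), out);
}

For example.txt this writes the 22 bits shown above plus 2 padding 0s, i.e. 3 bytes in total.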

It might worry you that the characters are stored without any delimiters between them, since their encodings can have different lengths and a character's bits can cross a byte boundary (as with the 'a' that starts at the end of the second byte). But this will not cause problems in decoding the file, because Huffman encodings by definition have a useful prefix property: no character's encoding can ever occur as the start of another character's encoding.

DECODING THE FILE

A compressed file would not be very useful unless we have the ability to decompress it again! You can use a Huffman tree to decode text that was previously encoded with its binary patterns. The decoding algorithm is to read each bit from the file, one at a time, and use this bit to traverse the Huffman tree. If the bit is a 0, you move left in the tree. If the bit is a 1, you move right. You do this until you hit a leaf node. Leaf nodes represent characters, so once you reach a leaf, you output that character. For example, suppose we are given the same encoding tree above, and we are asked to decode a file containing the following bits: 1110010001001010011.

Using the Huffman tree, we walk from the root until we find characters, then output them and go back to the root. In each line below, the bits consumed at that step are shown in brackets.

- We read a 1 (right), then a 1 (right). We reach 'b' and output b. Back to the root.  [11]10010001001010011
- We read a 1 (right), then a 0 (left). We reach 'a' and output a. Back to the root.  11[10]010001001010011
- We read a 0 (left), then a 1 (right), then a 0 (left). We reach 'c' and output c.  1110[010]001001010011
- We read a 0 (left), then a 0 (left). We reach ' ' and output a space.  1110010[00]1001010011
- We read a 1 (right), then a 0 (left). We reach 'a' and output a.  111001000[10]01010011
- We read a 0 (left), then a 1 (right), then a 0 (left). We reach 'c' and output c.  11100100010[010]10011
- We read a 1 (right), then a 0 (left). We reach 'a' and output a.  11100100010010[10]011
- We read a 0, then a 1, then a 1. This is our EOF encoding pattern, so we stop.  1110010001001010[011]

The overall decoded text is "bac aca".
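A decoding sketch in C, reusing the HuffNode and PSEUDO_EOF definitions from the earlier sketches. For illustration the bits come from a string of '0'/'1' characters like the one traced above (decode_bits is a hypothetical name; the lab's real decoder reads packed bits from memory instead):

#include <stdio.h>

void decode_bits(const HuffNode *root, const char *bits, FILE *out) {
    const HuffNode *node = root;
    for (const char *p = bits; *p; p++) {
        node = (*p == '1') ? node->right : node->left;     /* 1 = right, 0 = left     */
        if (node->left == NULL && node->right == NULL) {   /* reached a leaf          */
            if (node->ch == PSEUDO_EOF)
                return;                                    /* stop at the EOF pattern */
            fputc(node->ch, out);
            node = root;                                   /* back to the root        */
        }
    }
}

Calling decode_bits(root, "1110010001001010011", stdout) with the example tree prints "bac aca".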

YOUR ASSIGNMENT

Your assignment is to write ARM assembly code to:

1. Read in a given text file containing only letters, periods, and spaces. Perform a histogram analysis of the characters to determine the frequency of occurrences and generate the Huffman code as described.

2. Given an encoded transmitted message, in which the bits are compacted into several 4-byte unsigned integers, decode and print out the original text message.

3. Please provide a Makefile as part of your build.

Example:

For the example above, the secret message would fit into a single 4-byte memory location as:

11100100010010100110000000000000

Note the 13 trailing 0s at the end of the message. You can encode the end of each 4-byte memory location with a bunch of spaces if you want, or you can define an extra character in addition to the letters, period, and spaces to encode an end-of-memory character to mark the end of a 4-byte memory block. The design is up to you. There's no problem with printing out a few spaces between letters.

Your decode.s function should take in as input a linked list of 4-byte unsigned integers containing the secret message:

- decode(r0) - where r0 contains the address of the first node of the linked list

where

r0 -->node1-->node2-->node3-->NULL

node1: 10101101010101010101010001110010 --> node2: 10101101010101010101010001110010 --> node3: 10101101010101010101010001110010 --> NULL

So if a binary code does not fit at the end of a node, it continues into the next node, so that no bits are wasted.
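As one way to picture this, here is a small C sketch of a bit reader over such a linked list, walking each 32-bit word most-significant bit first and letting a code continue across node boundaries. The Word32Node and BitCursor types are assumptions made for this sketch; in the lab itself this logic would live in your ARM decode.s.

#include <stdint.h>
#include <stddef.h>

typedef struct Word32Node {
    uint32_t bits;                 /* 32 message bits, most significant bit first */
    struct Word32Node *next;
} Word32Node;

typedef struct {
    const Word32Node *node;        /* current node in the list                */
    int pos;                       /* next bit position: 0 = MSB ... 31 = LSB */
} BitCursor;

/* Return the next bit (0 or 1), or -1 when the list is exhausted. Nothing is
 * reset at a node boundary, so a code can span two nodes. */
int next_bit(BitCursor *cur) {
    if (cur->node == NULL)
        return -1;
    int bit = (int)((cur->node->bits >> (31 - cur->pos)) & 1u);
    if (++cur->pos == 32) {        /* this word is used up: advance to the next node */
        cur->node = cur->node->next;
        cur->pos = 0;
    }
    return bit;
}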
