Question: 1 . File compression with Huffman coding As discussed in class on Nov. 1 4 , tries and Huffman coding can be used to compress
File compression with Huffman coding
As discussed in class on Nov. triesandHuffman codingcan be used to compress filesmessages by analyzing character frequencies and choosing bitcodes accordingly.
A nearlycompleteHuffmanclass is defined for you inhuffmanhhcode here It uses aTrieNodeclass that stores parent and child links as well as the character being encoded and its frequency in the file being compressed.
The main work ofHuffmanhappens in the following functions:
compress
Reads input file character by character and counts occurrences aka computes frequency for each, storing the results by ASCII index in thecharfrequencyvectorApplies the Huffman triebuilding algorithm presented in class inbuildoptimaltrie This function is TO BE WRITTEN BY YOUCreates code lookup tables stringtochardecompressionmapandchartostringcompressionmap from the optimal trie incomputeallcodesfromtrie This function is TO BE WRITTEN BY YOUWritesdecompressionmapas a header to the output fileRereads input file character by character and, usingcompressionmap, writes encoded versions to the output file body in ASCII or binary depending ondobinaryflag
decompress
Reads the code lookup table from header of input file todecompressionmapReads rest of input file bitstream, gets character for each variablelength code read usingdecompressionmap, and writes decoded versions to output file. The ASCII part of this function,asciidecompressbodyis TO BE WRITTEN BY YOU
The "forest of tries" discussed in lecture is implemented in theHuffmanclass with anSTL priority queueofTrieNodepointers each representing the root of a subtrie ordered on the combined frequency of each subtrie. This priority queue is calledtriepq
The compiled executable is run as follows:huff
If the input filename has the suffixhuf, it will bedecompressedto a plaintext file with an identical name except the suffix is changed toHUF.
If the input filename has any other suffix, it will becompressedto a file with the suffixhuf.
The default compressor creates a binary file as output and the default decompressor expects a binary file as input. Feel free to compare the sizes of compressed files to the originals. However, for grading, compressed output will be written in ASCII and compressed input will also be expected to be ASCII. This is indicated by the optional flagascii.
The output files will be echoed to the terminal for grading for both compression to test yourbuildoptimaltriecomputeallcodesfromtriecombination and decompression to test yourasciidecompressbodyin isolation In order to testbuildoptimaltriealone another optional flagtriewill print only a representation of the built trie and not echo the output file.
Sample inputsoutputs
This tarfilecontains the following files:
seashells.txt total chars because there is a newline, different chars
seashells.txthuf
outputoutputoutputthese are what is written to the terminal, not what is saved in a file
outputresults from runninghuff ascii seashells.txt
outputresults from runninghuff ascii trie seashells.txt
outputresults from runninghuff ascii seashells.txthuf
Programming tasks
TheHuffmanmember functions to finish are inmaincpp don't make any changes inhuffmanhh:
pointHuffman::asciidecompressbody: This function should read one "bit" a or char at a time from the input stream until a legal code is assembled, then write the corresponding decoded character to the output stream, and repeat until reaching the end of file
pointsHuffman::buildoptimaltrie: This function should create a forest of onenode tries akatriepq representing the characters in the input file and their frequencies, then repeatedly combine the two smallest tries until only one is left according to the following instructions:
Remove trie intriepqwith SMALLEST frequency, assign tofirst remove trie in PQ with NEXT SMALLEST frequency, assign tosecond Create newTrieNodeand makefirstandsecondits left and right children, respectively. Set new node's frequency to the sum offirstandseconds frequencies, and then put the whole thing back into the priority queue
pointsHuffman::computeallcodesfromtrie: The function should recursively traverse the trie rooted atTto determine thehuffcodestring at every leaf and initializecompressionmapanddecompressionmapwith this information
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
