Question: I need help Implementing assignment using c++. Assignement: Purpose To explore a meaningful use of complex binary trees To implement a real algorithm still in
I need help Implementing assignment using c++.
Assignement:
Purpose
To explore a meaningful use of complex binary trees To implement a real algorithm still in use today Background and Instructions
Lossless data compression remains a key interest in computer science. The goal is to represent data using fewer bits than the human-readable data normally requires, as compressed data is more easily stored and transmitted. Decompression can later restore the data to its original state.
It is almost a guarantee that you have used the ZIP archive file format at some point in your life; it is natively supported in Microsoft Windows, Mac OS X, and most free operating systems. It's quite possible you have used it earlier today. Although the file format accepts many compression algorithms, the most common is DEFLATE. This algorithm uses the LZ77 algorithm and Huffman coding to compress data. Huffman coding creates a prefix code for each character based on the frequency of the character, with the most used characters receiving the shortest codes.
For this assignment, you will implement Huffman coding. You can learn more about the process here, but be warned that the algorithm has some trivial ambiguities that are resolved below. Huffman Coding Algorithm
The algorithm begins by accepting an input string and counting the number of times each character is used. Each character-count pair is stored in a leaf node. Add these leaf nodes to a priority queue, giving lower counts higher priorities and removing the highest priority node first. If two leaf nodes have equal count, the alphabetically smaller character has lower priority. If a leaf node and a compound node (see below) have equal count, the leaf node has lower priority. If two compound nodes have equal count, the alphabetically smaller compound node has lower priority. Remove the first two nodes from the priority queue, combine them into a compound node, and add the new node back into the priority queue. Repeat this process until only one node is left. A compound node has the following properties: Its right child is the first node removed from the priority queue Its left child is the second node removed from the priority queue Its count is equal to the sum of the counts of its two children Its "character" is a concatenation of its left child's character and its right child's character (compound nodes store a string representing the characters its descendent leaves store) The single remaining compound node in the priority queue is the root of the Huffman tree and is ready for encoding and decoding messages. To encode a message, encode each character individually and concatenate the results together. For each character, follow the path from the root of the Huffman tree to the leaf node storing that character; record each "left turn" as a 0 and each "right turn" as a 1. The character's encoding is the resulting bitstream. If the message you are asked to encode contains a character not found in your tree, return an empty string. If asked to encode the empty string, return an empty string. To decode a message, begin at the root of the Huffman tree. For each 0 or 1 in the encoded message, move down the tree left or right, respectively. When a leaf node is reached, record the character it is storing and return to the root of the tree. Continue this process until you reach the end of the encoded message; the last 0 or 1 in the encoded message should take you to the leaf node storing the last character in the decoded message. If the message you are asked to decode ends at a compound node instead of a leaf node, return an empty string. If asked to decode the empty string or any message containing characters other than 0 and 1, return an empty string.
Requirement Notes
As stated above, there are a number of trivial ambiguities in Huffman coding, but in order to facilitate automated testing, you'll need to adhere to the algorithm outlined above. You must make your own priority queue using a maximum heap. This can be tree-based or array/vector-based. "Alphabetically smaller" for this assignment means using the < operator on strings. As an example, the input string "Secret" produces the following encodings: S = 000 c = 001 e = 1 r = 010 t = 011 IMPORTANT NOTES FROM PROFESSOR: - use array - Must build own priority queue. STL queue not allowed. - You can use a map class to burn a tree (whenever you fine a lead node delete.) - SPACE complexity is what we are worried about. The program will not be graded on run time efficiency. - Manage memory correctly. Do not have memory leaks! - Capital letters have highest priority of all - no main needed - only use for your testing purposes.
// steps for program step 1: Priority Queue, burn when tree is built. Step 2: tree builds code to map, burn tree when finished. Step 3: map
functions given by professor that can't be changed: Coder(string sample_text); string encode(string message); string decode(string encoded_message);
coder.h
#pragma once #include
class Coder { public: /** * Constructor; uses the provided sample text as an input string to create * a Huffman tree, which is used for encoding and decoding messages. You may * assume the given sample text will contain at least two distinct characters. * * See the lab specs for full details. */ Coder(string sample_text); /** * Encodes the given string based on the Huffman tree created from the sample text * (provided in the constructor of this class). * * See the lab specs for some details on this method's operations. */ string encode(string message); /** * Decodes the given string based on the Huffman tree created from the sample text * (provided in the constructor of this class). * * See the lab specs for some details on this method's operations. */ string decode(string encoded_message); };
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
