Question: I have this assignment in my data structures class that I'm really not sure how to start. I'll upload the prompts and starter code. If

I have this assignment in my data structures class that I'm really not sure how to start. I'll upload the prompts and starter code. If anyone could help me I would really appreciate it!

Implement a function called cnt_freq(filename) that opens a text file with a given file name (passed as a string) and counts the frequency of occurrences of all the characters within that file. Use the built-in Python List data structure of size 256 for counting the occurrences of characters. This will provide efficient access to a given position in the list. (In non-Python terminology you want an array.) You can assume that in the input text file there are only 8-bit characters resulting in a total of 256 possible character values. This function should return the 256 item list with the counts of occurrences. There can be issues with extra characters in some systems. Suppose the file to be encoded contained:

ddddddddddddddddccccccccbbbbaaff

Numbers in positions of freq counts [97:104] = [2, 4, 8, 16, 0, 2, 0]

Here is the Python List data structure:

I have this assignment in my data structures class that I'm really

Define a function comes_before (a,b) for Huffman trees that returns true if tree a comes before tree b. A Huffman tree a comes before Huffman tree b if the occurrence count of a is smaller than that of b. In case of equal occurrence counts, break the tie by using the ASCII value of the character to determine the order. If, for example, the characters d and k appear exactly the same number of times in your file, then d comes before k, since the ASCII value of character d is less than that of k.

Write a function that builds a Huffman tree from a given list of the number of occurrences of characters returned by cnt_fref() and returns the root node of that tree. Call this function create_huff_tree(list_of_freqs).

Start by creating a sorted list (can be a Python list, and you can use built-in functions) of individual Huffman trees each consisting of single HuffmanNode node containing the character and its occurrence counts. Building the actual tree involves removing the two nodes with the lowest frequency count from the sorted list and connecting them to the left and right field of a new created Huffman Node as in the example provided. The node that comes before the other node should go in the left field.

Implement a function named create_code(root_node) that traverses the Huffman tree that was passed as an argument and returns an array (using a Phython list) of 256 strings. Use the characters respective integer ASCII representation as the index into the arrary, with the resulting Huffman code for that character stored at that location. Traverse the tree from the root to each leaf node and adding a 0 when we go left and a 1 when we go right constructing a string of 0s and 1s.

You may want to initialize a Python list of strings that is initially consists of 256 empty strings in create_code. When create_code completes, this list will store for each character (using the characters respective integer ASCII representation as the index into the list) the resulting Huffman code for the character. The code will be represented by a sequence of 0s and 1s in a string. Note that many entries in this list may still be the empty string. Return this list.

Write a function called huffman_encode(in_file, out_file) (use that exact name) that reads an input text file and writes to an output file the following:

A header (see below for format) on the first line in the file (should end with a newline)

Using the Huffman code, the encoded text into an output file.

Write a function called create_header(list_of_freqs) that takes as parameter the list of freqs previously determined from cnt_freq(filename). The create_header function returns a string of the ASCII values and their associated frequencies from the input file text, separated by one space. For example, create_header(list_of_freqs) would return 97 3 98 4 99 2 for the text aaabbbbcc

The huffman_encode function accepts two file names in that order: input file name and output file name, represented as strings. If the specified output file already exists, its old contents will be erased. See example files in the test cases provided to see the format.

When testing, always consider edge conditions like the following cases: If the input file consists only of some number of a single character, say "aaaaa", it should write just that character followed by a space followed by the number of occurrences: 97 5 In case of an empty input text file, your program should also produce an empty file. If an input file does not exist, your program should raise a FileNotFoundError.

____________________________________________________________________________________

Starter Code:

class HuffmanNode: def __init__(self, char, freq): self.char = char # stored as an integer - the ASCII character code value self.freq = freq # the freqency associated with the node self.left = None # Huffman tree (node) to the left self.right = None # Huffman tree (node) to the right

def set_left(self, node): self.left = node

def set_right(self, node): self.right = node

def comes_before(a, b): """Returns True if tree rooted at node a comes before tree rooted at node b, False otherwise"""

def combine(a, b): """Creates and returns a new Huffman node with children a and b, with the "lesser node" on the left The new node's frequency value will be the sum of the a and b frequencies The new node's char value will be the lesser of the a and b char ASCII values"""

def cnt_freq(filename): """Opens a text file with a given file name (passed as a string) and counts the frequency of occurrences of all the characters within that file"""

def create_huff_tree(char_freq): """Create a Huffman tree for characters with non-zero frequency Returns the root node of the Huffman tree"""

def create_code(node): """Returns an array (Python list) of Huffman codes. For each character, use the integer ASCII representation as the index into the arrary, with the resulting Huffman code for that character stored at that location"""

def create_header(freqs): """Input is the list of frequencies. Creates and returns a header for the output file Example: For the frequency list asscoaied with "aaabbbbcc, would return 97 3 98 4 99 2 """

def huffman_encode(in_file, out_file): """Takes inout file name and output file name as parameters

Hex Char Dec Bin Hex Char Dec Bin Hex Char Dec Bin 0 0000 0000 00 [NUL] 32 0010 0000 20 space 64 0100 0000 40 e 10000 0001 01 [SOH] 33 0010 0001 21! 2 0000 0010 02 [STX] 34 0010 0010 22" 3 0000 001103 [ETX]135 0010 0011 23 # 4 0000 0100 04 [EOT] 36 0010 0100 24 $ 5 0000010105 [ENQ]137 0010 0101 25 % 60000 0110 06 [ACK]38 0010 0110 26& 7 0000 0111 07 [BEL] 39 0010 0111 27 8 0000 1000 08 [BS] 40 0010 1000 28( 9 0000100109 [TAB]| 41 0010100129 ) 10 0000 1010 0A [LF] 42 0010 1010 2A * 11 0000 10110 [ ] |43 00101011 2 + 12 0000 1100 0C [FF] 44 0010 1100 2C 13 0000 1101 OD [CR] 45 0010 1101 2D - 14 0000 1110 OE [so] 46 0010 1110 2E 15 0000 1111 OF [SI 47 0010 1111 2F / 16 0001 0000 10 [DLE] 48 0011 0000 30 0 17 0001 0001 11 [DC1] 49 0011 0001 31 1 18 0001 0010 12 [DC2] 50 0011 0010 32 2 19 0001 0011 13 [DC3] 51 0011 0011 33 3 20 0001 0100 14 [DC4] 52 0011 0100 34 4 21 0001 0101 15[NAK] 53 0011 0101 35 5 22 0001 0110 16 [SYN] 54 0011 0110 36 6 23 0001 0111 17 [ETB]55 0011 0111 37 7 24 0001 1000 18 [CAN] 56 0011 1000 38 8 25 0001 1001 19 [) 157 0011 100139 9 26 0001 1010 1A [SUB]158 0011 1010 3A : 27 0001 1011 1B [ESC] 59 00111011 3B 28 0001 1100 1C [FS] 60 0011 1100 3C

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!