Question: Write a program to construct a dictionary of all “words,” defined to be runs of consecutive nonwhitespace, in a given text file. We might then compress the file (ignoring the loss of whitespace information)
by representing each word as an index in the dictionary. Retrieve the file rfc791.txt from the RFC repository, and run your program on it.
Give the size of the compressed file, assuming first that each word is encoded with 12 bits (this should be sufficient) and then that the 128 most common words are encoded with 8 bits and the rest with 13 bits. Assume that the dictionary itself can be stored by using, for each word, length(word) + 1 bytes.
Step by Step Solution
There are several steps to answer your question. First, we need to build a program that constructs the ...
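One possible way to approach the question is sketched below in Python. This is a minimal sketch and not the full posted answer: the function names (build_dictionary, compressed_sizes, analyze_file) are made up for illustration, bit totals are assumed to be rounded up to whole bytes, and length(word) is counted in characters, which equals bytes only for ASCII text. It also reports whether the 12-bit index can actually address every distinct word.

```python
from collections import Counter
import math


def build_dictionary(text):
    """Count every word, where a word is a run of consecutive non-whitespace."""
    # str.split() with no arguments splits on any run of whitespace.
    return Counter(text.split())


def compressed_sizes(counts, top_n=128, fixed_bits=12, short_bits=8, long_bits=13):
    """Return sizes in bytes for both encodings, plus the dictionary size."""
    total_words = sum(counts.values())

    # Scheme 1: every word occurrence costs fixed_bits.
    fixed_total_bits = total_words * fixed_bits

    # Scheme 2: occurrences of the top_n most common words cost short_bits,
    # all other occurrences cost long_bits.
    common = sum(n for _, n in counts.most_common(top_n))
    variable_total_bits = common * short_bits + (total_words - common) * long_bits

    # Dictionary: length(word) + 1 bytes per distinct word (assumes ASCII).
    dictionary_bytes = sum(len(word) + 1 for word in counts)

    fixed_bytes = math.ceil(fixed_total_bits / 8)
    variable_bytes = math.ceil(variable_total_bits / 8)
    return {
        "distinct_words": len(counts),
        "total_words": total_words,
        # True if fixed_bits is enough to index every distinct word.
        "fixed_bits_sufficient": len(counts) <= 2 ** fixed_bits,
        "dictionary_bytes": dictionary_bytes,
        "fixed_bytes": fixed_bytes,
        "variable_bytes": variable_bytes,
        "fixed_total_bytes": fixed_bytes + dictionary_bytes,
        "variable_total_bytes": variable_bytes + dictionary_bytes,
    }


def analyze_file(path):
    """Read a text file and return the size report for it."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return compressed_sizes(build_dictionary(handle.read()))
```

Assuming rfc791.txt has been downloaded to the current directory, calling analyze_file("rfc791.txt") returns the word counts and both compressed sizes, with and without the dictionary.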
