Question: The following code counts how often different words occur in a text file. In this program, bstmap has an interface and behavior identical to a std::map.
char toLC (char c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

void countWordsInFile (istream& inFile, bstmap& counts)
{
    string word;
    while (inFile >> word) {
        // strip away any non-alphabetic characters
        string::size_type n = word.find_first_not_of (wordCharacters);
        if (n != string::npos)
            word = word.substr (0, n);
        if (word.length() > 0) {
            // if there's anything left, lower-case and count it
            transform (word.begin(), word.end(), word.begin(), toLC);
            /** ... increment the appropriate counter in counts ... **/
            bstmap::iterator pos = counts.find (word);
            if (pos == counts.end())
                // this is the 1st time we've seen this word
                counts.insert (bstmap::value_type(word, 1));
            else
                // we've seen this word before
                ++((*pos).second);
        }
    }
}

Let W denote the number of words in the document file being processed. Many of these words, in a typical document, will be duplicates. Let D denote the number of distinct (non-duplicate) words, D <= W. A study of several large text files has suggested that, for sufficiently large W, authors will use an average vocabulary of D = c*log(W) distinct words, where c is some constant. Assume the length of any individual word is bounded by some constant (e.g., no single word is longer than "antidisestablishmentarianism").

What is the average-case complexity of countWordsInFile?
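Since bstmap's interface is stated to match std::map's, the counting idiom above can be sketched as a self-contained program using std::map directly. This is an illustrative sketch, not the original program: the function name countWords, the use of a string stream in place of a file, and the inline alphabet string (standing in for wordCharacters, which the original defines elsewhere) are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for countWordsInFile: reads words from any
// istream and tallies them in a std::map (same interface as bstmap).
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {
        // keep only the leading alphabetic prefix (stands in for
        // the original's find_first_not_of(wordCharacters) step)
        std::string::size_type n = word.find_first_not_of(
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
        if (n != std::string::npos)
            word = word.substr(0, n);
        // lower-case in place, as toLC does in the original
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return std::tolower(c); });
        if (!word.empty()) {
            // one search of a tree holding at most D distinct words
            std::map<std::string, int>::iterator pos = counts.find(word);
            if (pos == counts.end())
                counts.insert(std::map<std::string, int>::value_type(word, 1));
            else
                ++pos->second;   // seen before: bump the counter
        }
    }
    return counts;
}
```

Each of the W words triggers one find (and possibly one insert) in a tree that never holds more than D entries, which is exactly the per-word cost the complexity question asks you to account for.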
