Question:

Zipf’s law of word distribution states the following: Take a large corpus of text, count the frequency of every word in the corpus, and then rank these frequencies in decreasing order. Let f_I be the I-th largest frequency in this list; that is, f_1 is the frequency of the most common word (usually “the”), f_2 is the frequency of the second most common word, and so on. Zipf’s law states that f_I is approximately equal to α/I for some constant α. The law tends to be highly accurate except for very small and very large values of I.
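
As a minimal sketch of the ranking step described above, the following Python builds the decreasing frequency list and the values Zipf's law would predict for a given α. The function names `ranked_frequencies` and `zipf_prediction` and the simple tokenizing regex are assumptions made for illustration; they are not part of the exercise.

```python
import re
from collections import Counter


def ranked_frequencies(text):
    """Return word counts sorted in decreasing order.

    Element 0 is the frequency of the most common word (rank 1),
    element 1 the frequency of the second most common word, and so on.
    """
    # Assumption: a "word" is a run of letters or apostrophes, case-folded.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_prediction(alpha, n):
    """Return the frequencies alpha / rank for ranks 1..n."""
    return [alpha / rank for rank in range(1, n + 1)]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print(ranked_frequencies(sample))
    print(zipf_prediction(6, 3))
```

A toy sentence like the one above is far too small to show the law; it only illustrates how the ranked list is built.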

Choose a corpus of at least 20,000 words of online text, and verify Zipf’s law experimentally. Define an error measure and find the value of α where Zipf’s law best matches your experimental data. Create a log–log graph plotting f_I vs. I and α/I vs. I. (On a log–log graph, the function α/I is a straight line.) In carrying out the experiment, be sure to eliminate any formatting tokens (e.g., HTML tags) and normalize upper and lower case.
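
One possible way to carry out the experiment is sketched below in Python. It is an illustration under stated assumptions, not the book's solution: the error measure chosen here is the root-mean-square difference between log f_I and log(α/I), which has a closed-form minimizer for α; HTML tags are removed with a crude regular expression; and the plot assumes the third-party matplotlib library is installed. All function names and the output file name are hypothetical.

```python
import math
import re
from collections import Counter


def clean_and_count(raw):
    """Strip HTML-like tags, lowercase, and return counts in decreasing order."""
    text = re.sub(r"<[^>]+>", " ", raw)  # crude tag removal (assumption)
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def fit_alpha(freqs):
    """Least-squares fit of alpha in log space.

    Minimizing sum_I (log f_I - (log alpha - log I))^2 over log alpha
    gives log alpha = mean(log f_I + log I).
    """
    total = sum(math.log(f) + math.log(i) for i, f in enumerate(freqs, start=1))
    return math.exp(total / len(freqs))


def log_rms_error(freqs, alpha):
    """Root-mean-square difference between log f_I and log(alpha / I)."""
    squared = sum(
        (math.log(f) - math.log(alpha / i)) ** 2
        for i, f in enumerate(freqs, start=1)
    )
    return math.sqrt(squared / len(freqs))


def plot_zipf(freqs, alpha, path="zipf.png"):
    """Draw observed f_I and alpha / I against rank I on log-log axes."""
    import matplotlib.pyplot as plt  # assumed to be installed

    ranks = list(range(1, len(freqs) + 1))
    plt.loglog(ranks, freqs, ".", label="observed f_I")
    plt.loglog(ranks, [alpha / i for i in ranks], "-", label="alpha / I")
    plt.xlabel("rank I")
    plt.ylabel("frequency")
    plt.legend()
    plt.savefig(path)


if __name__ == "__main__":
    # Replace this placeholder with a corpus of at least 20,000 words.
    raw = "<p>The cat saw the dog.</p> <b>The</b> dog saw a bird."
    freqs = clean_and_count(raw)
    alpha = fit_alpha(freqs)
    print(alpha, log_rms_error(freqs, alpha))
```

Fitting in log space weights every rank equally on the log–log graph. Because the law is less accurate for very small and very large I, it may be worth repeating the fit on a middle range of ranks and comparing the resulting α and error.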
