Question: Python Code: Verify Zipf's Law with a log-log plot of word rank versus word frequency

Python Code:

In this section, you will verify a key statistical property of text: Zipf's Law.

Zipf's Law describes the relationship between a word's frequency rank and its frequency. For a word w, its frequency is inversely proportional to its rank:

$$\mathrm{count}_w = K \cdot \frac{1}{\mathrm{rank}_w}$$

K is a constant that depends on the corpus and on how words are defined (for example, how the text is tokenized).

What would this look like if you took the log of both sides of the equation?

  • Write your answer in one or two lines here.
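One way to work this out is to take the logarithm of both sides of the equation and use the rule that the log of a quotient is a difference of logs. The base of the logarithm is not specified in the question; any fixed base gives the same form.

```latex
\log \mathrm{count}_w
  = \log\!\left( K \cdot \frac{1}{\mathrm{rank}_w} \right)
  = \log K - \log \mathrm{rank}_w
```

Read as a function of $\log \mathrm{rank}_w$, this is a straight line with slope $-1$ and intercept $\log K$.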

Therefore, if Zipf's Law holds, then after sorting the words in descending order of frequency, log(frequency) decreases approximately linearly with log(rank); that is, the points fall close to a straight line on a log-log scale.
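As a quick sanity check of the linearity claim, here is a minimal sketch that fits a least-squares line to log(count) against log(rank). The data is synthetic and perfectly Zipfian, and the function name `log_log_slope` and the value of `K` are assumptions made up for this illustration; a real corpus would only follow the line approximately.

```python
import math

def log_log_slope(ranks, counts):
    """Least-squares slope and intercept of log(count) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical, perfectly Zipfian data: count = K / rank, with an arbitrary K.
K = 5000.0
ranks = list(range(1, 1001))
counts = [K / r for r in ranks]

slope, intercept = log_log_slope(ranks, counts)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, log(K) = {math.log(K):.3f}")
```

For this idealized input the fitted slope should come out very close to -1 and the intercept very close to log(K), which is the straight line that Zipf's Law predicts.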

Now make such a log-log plot of rank versus frequency.

Hint: Make use of the sorted dictionary you just created. Use a scatter plot where the x-axis is log(rank) and the y-axis is log(frequency). You should get this information from word_counts; for example, you can take the individual word counts and sort them. The dict methods .items() and/or .values() may be useful. (Whether ranks start at 1 or 0 makes little difference to the shape of the plot, but log(0) is undefined, so start at 1 if you take the logs yourself.) You can check your results by comparing your plot to the ones on Wikipedia; they should look qualitatively similar.

Please remember to label the x-axis and y-axis with what they represent.

import math
import operator

import matplotlib.pyplot as plt  # required by the plotting calls below

x = []
y = []
X_LABEL = "log(rank)"
Y_LABEL = "log(frequency)"

# implement me! you should fill the x and y arrays. Add your code here

# running this cell should produce your plot below
plt.scatter(x, y)
plt.xlabel(X_LABEL)
plt.ylabel(Y_LABEL)
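One possible way to fill in x and y is sketched below. It assumes word_counts is a dict mapping each word to its count, as the hint suggests; the toy dictionary here is a made-up stand-in for the real corpus counts, and the helper names `log_rank_frequency` and `plot_zipf` are hypothetical, not part of the assignment.

```python
import math

def log_rank_frequency(word_counts):
    """Return (x, y): log(rank) and log(frequency), most frequent word first."""
    # Sort the counts in descending order; position in this list gives the rank.
    freqs = sorted(word_counts.values(), reverse=True)
    # Ranks start at 1 so that log(rank) is defined for the top word.
    x = [math.log(rank) for rank, _ in enumerate(freqs, start=1)]
    y = [math.log(freq) for freq in freqs]
    return x, y

def plot_zipf(x, y):
    """Scatter plot of log(frequency) against log(rank)."""
    # Imported here so the data preparation above runs without matplotlib.
    import matplotlib.pyplot as plt
    plt.scatter(x, y)
    plt.xlabel("log(rank)")
    plt.ylabel("log(frequency)")
    plt.show()

# Toy stand-in for the real word_counts dictionary (assumed, not from the corpus).
word_counts = {"the": 120, "of": 60, "and": 40, "zipf": 30, "law": 24}
x, y = log_rank_frequency(word_counts)
```

Calling plot_zipf(x, y) then draws the scatter plot. With real corpus counts the points should fall roughly along a downward-sloping line, though the lowest-ranked words often deviate from it.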
