Run the code that we discussed above and change the probability calculation (in step 6) to remove the smoothing as follows:
Remove the +1 added to each n-gram count, so that the equation uses the raw n-gram count without smoothing.
Remove the vocabulary size V (the total number of unique n-grams) from the denominator.
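Written as formulas, the change looks like this. This is a sketch that assumes the earlier smoothed version of step 6 divided by total_ngrams + total_voc. Here C(g) is the count of n-gram g, N_n = total_ngrams[n] and V_n = total_voc[n]:

```latex
P_{\text{add-1}}(g) = \frac{C(g) + 1}{N_n + V_n}
\qquad\longrightarrow\qquad
P_{\text{MLE}}(g) = \frac{C(g)}{N_n}
```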
The required change is in step 6 of the following code:
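# tokenized_text (a list of tokenized sentences) is assumed to come from Step 1 of the earlier lab.
# ngrams is assumed to be NLTK's n-gram generator.
from nltk.util import ngrams

#Step 2: collect every 1- to 4-gram from every sentence
ngrams_all = {1: [], 2: [], 3: [], 4: []}
for i in range(4):
    for each in tokenized_text:
        for j in ngrams(each, i + 1):
            ngrams_all[i + 1].append(j)

#Step 3: vocabulary of unique n-grams for each order
ngrams_voc = {1: set(), 2: set(), 3: set(), 4: set()}
for i in range(4):
    for gram in ngrams_all[i + 1]:
        ngrams_voc[i + 1].add(gram)  # a set ignores duplicates by itself

#Step 4: total number of n-grams (N) and vocabulary size (V) for each order
total_ngrams = {1: -1, 2: -1, 3: -1, 4: -1}
total_voc = {1: -1, 2: -1, 3: -1, 4: -1}
for i in range(4):
    total_ngrams[i + 1] = len(ngrams_all[i + 1])
    total_voc[i + 1] = len(ngrams_voc[i + 1])

#Step 5: pair each unique n-gram with its raw count as [ngram, count]
ngrams_prob = {1: [], 2: [], 3: [], 4: []}
for i in range(4):
    for ngram in ngrams_voc[i + 1]:
        tlist = [ngram]
        tlist.append(ngrams_all[i + 1].count(ngram))
        ngrams_prob[i + 1].append(tlist)

#Step 6 (changed): no +1 in the numerator and no total_voc in the denominator,
# so each probability is simply count / N
for i in range(4):
    for ngram in ngrams_prob[i + 1]:
        ngram[-1] = ngram[-1] / total_ngrams[i + 1]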
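To get the two sets of results for the comparison, both formulas can be computed side by side. The following is a condensed sketch of steps 2 to 6, not the lab's own code. It uses collections.Counter, which also avoids the repeated list.count calls of step 5. make_ngrams is a hypothetical stand-in for NLTK's ngrams, so the example has no third-party dependency. The toy corpus is made up for illustration.

```python
from collections import Counter


def make_ngrams(tokens, n):
    # Hypothetical stand-in for nltk.util.ngrams: consecutive n-token tuples.
    return list(zip(*(tokens[k:] for k in range(n))))


def compare_probs(tokenized_text, n):
    """Return {ngram: (unsmoothed, add-1 smoothed)} for n-grams of order n."""
    counts = Counter(g for sent in tokenized_text for g in make_ngrams(sent, n))
    total = sum(counts.values())  # plays the role of total_ngrams[n]
    vocab = len(counts)           # plays the role of total_voc[n]
    return {
        gram: (c / total, (c + 1) / (total + vocab))
        for gram, c in counts.items()
    }


if __name__ == "__main__":
    corpus = [["a", "b", "a"], ["a", "c"]]  # made-up toy corpus
    for gram, (p_mle, p_add1) in sorted(compare_probs(corpus, 1).items()):
        change = "decreased" if p_add1 < p_mle else "increased"
        print(gram, round(p_mle, 3), round(p_add1, 3), change)
```

On the toy corpus, the frequent unigram ('a',) (3 of 5 tokens) drops from 0.6 to 0.5 after smoothing. The rare unigrams ('b',) and ('c',) rise from 0.2 to 0.25.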
By comparing the probability results of the two n-gram models (with and without smoothing), we notice that smoothing decreased the probability of some n-grams (marked in red). These are the more frequent n-grams: they had a higher probability than the others, and smoothing takes some of that probability mass away. Conversely, the probabilities of less frequent n-grams increased after smoothing.
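This split between decreased and increased n-grams can be made precise. Take an n-gram with count c, and let N and V be the total n-gram count and the vocabulary size for that order (both positive). Add-1 smoothing lowers its probability exactly when c exceeds N/V, the average count per distinct n-gram:

```latex
\frac{c+1}{N+V} < \frac{c}{N}
\iff N(c+1) < c\,(N+V)
\iff N < cV
\iff c > \frac{N}{V}
```

So n-grams seen more than N/V times lose probability and n-grams seen fewer times gain it. One seen exactly N/V times keeps the same probability, 1/V.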
Exercises
Now it's your turn! Build an n-gram model with add-1 smoothing for the following tokens:
tokenized_text = [['This', 'is', 'ngram', 'model'],
                  ['This', 'is', 'smoothed', 'model'],
                  ['This', 'is', 'unsmoothed', 'model'],
                  ['This', 'is', 'ngram', 'lab']]
Task 1: Build a smoothed n-gram model based on the above tokens.
Task 2: Which word is most likely to appear next after "This is"?
Task 3: Compare the probability of this word (from Task 2) before and after smoothing. Did the probability increase or decrease after smoothing?
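Here is one possible solution sketch for the three tasks. It reuses the lab's joint-frequency formulas from step 6: count / N without smoothing, and (count + 1) / (N + V) with add-1 smoothing. The helper names ngram_table and next_word are hypothetical. Ranking the trigrams that start with ("This", "is") by this joint probability picks the same word as a conditional estimate P(w | This is), because all candidates share the same context.

```python
from collections import Counter

tokenized_text = [['This', 'is', 'ngram', 'model'],
                  ['This', 'is', 'smoothed', 'model'],
                  ['This', 'is', 'unsmoothed', 'model'],
                  ['This', 'is', 'ngram', 'lab']]


def ngram_table(sentences, n, smoothed):
    # Hypothetical helper: per-n-gram probability using the lab's step 6 formulas.
    counts = Counter(
        tuple(sent[k:k + n]) for sent in sentences for k in range(len(sent) - n + 1)
    )
    total, vocab = sum(counts.values()), len(counts)
    if smoothed:
        return {g: (c + 1) / (total + vocab) for g, c in counts.items()}
    return {g: c / total for g, c in counts.items()}


def next_word(table, context):
    # Hypothetical helper: most probable n-gram that starts with `context`.
    candidates = {g: p for g, p in table.items() if g[:-1] == context}
    best = max(candidates, key=candidates.get)
    return best[-1], candidates[best]


# Task 1: add-1 smoothed models for orders 1 to 4 (plus the unsmoothed trigrams)
models = {n: ngram_table(tokenized_text, n, smoothed=True) for n in range(1, 5)}
unsmoothed_tri = ngram_table(tokenized_text, 3, smoothed=False)

# Task 2: most likely word after "This is", from the trigram model
word, p_add1 = next_word(models[3], ('This', 'is'))

# Task 3: the same trigram's probability without smoothing
_, p_mle = next_word(unsmoothed_tri, ('This', 'is'))
print(word, p_mle, p_add1)
```

With these tokens, "ngram" follows "This is" in two of the four sentences, so it is the most likely next word. There are 8 trigrams in total and 7 distinct ones. The probability of ('This', 'is', 'ngram') therefore falls from 2/8 = 0.25 without smoothing to 3/15 = 0.2 with add-1 smoothing, which means smoothing decreases it.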