Question: Run the code that we discussed above and change the probability calculation (in step 6) to remove the smoothing as follows:
Remove the smoothing term added to the n-gram count, so that the equation uses the raw count of each n-gram without smoothing.
Remove the total count of the unique n-gram vocabulary, V, from the equation.
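For reference, the two changes can be written as a single before-and-after formula. This is a sketch that assumes the smoothing in step 6 is add-one (Laplace) smoothing, which the question does not state explicitly. The symbols c(w) (the count of an n-gram w) and N (the total number of n-grams of that order) are introduced here for illustration; V is the unique n-gram vocabulary size named above.

```latex
P_{\text{smoothed}}(w) = \frac{c(w) + 1}{N + V}
\qquad\longrightarrow\qquad
P_{\text{unsmoothed}}(w) = \frac{c(w)}{N}
```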
The required change is in step 6, the last step of the following code:
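# Assumes tokenizedtext (a list of token lists) and an n-gram helper ngrams(sequence, n),
# such as nltk.util.ngrams, are already defined by the earlier steps.
# Step: collect every n-gram of order 1 to 4 from each tokenized sentence.
ngramsall = {1: [], 2: [], 3: [], 4: []}
for i in range(1, 5):
    for each in tokenizedtext:
        for j in ngrams(each, i):
            ngramsall[i].append(j)
# Step: build the vocabulary (set of distinct n-grams) for each order.
ngramsvoc = {1: set(), 2: set(), 3: set(), 4: set()}
for i in range(1, 5):
    for gram in ngramsall[i]:
        if gram not in ngramsvoc[i]:
            ngramsvoc[i].add(gram)
# Step: count the total n-grams and the distinct n-grams for each order.
# The initial values are placeholders; they are overwritten in the loop.
totalngrams = {1: 0, 2: 0, 3: 0, 4: 0}
totalvoc = {1: 0, 2: 0, 3: 0, 4: 0}
for i in range(1, 5):
    totalngrams[i] = len(ngramsall[i])
    totalvoc[i] = len(ngramsvoc[i])
# Step: pair each distinct n-gram with its raw count.
ngramsprob = {1: [], 2: [], 3: [], 4: []}
for i in range(1, 5):
    for ngram in ngramsvoc[i]:
        tlist = [ngram]
        tlist.append(ngramsall[i].count(ngram))
        ngramsprob[i].append(tlist)
# Step 6 (changed): unsmoothed probability = raw count / total n-grams of that order.
for i in range(1, 5):
    for ngram in ngramsprob[i]:
        ngram[-1] = ngram[-1] / totalngrams[i]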
Comparing the probabilities produced by the two n-gram models, with and without smoothing, we notice that the probabilities of some n-grams (marked in red) decrease after smoothing, because they have a higher probability than the others. On the other hand, the probabilities of less frequent n-grams increase after smoothing.
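The effect described above can be reproduced on a small scale. The following is a minimal sketch, not the lab's own code: it assumes add-one smoothing, uses a hypothetical toy corpus, and defines a plain-Python `ngrams` helper as a stand-in for a library function such as `nltk.util.ngrams`. The function name `ngram_probs` is also introduced here for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Plain-Python stand-in for an n-gram helper such as nltk.util.ngrams."""
    return list(zip(*(tokens[k:] for k in range(n))))


def ngram_probs(sentences, n, smoothing=True):
    """Return {ngram: probability} for order n, with or without add-one smoothing."""
    counts = Counter(g for s in sentences for g in ngrams(s, n))
    total = sum(counts.values())  # N: total n-grams of this order
    vocab = len(counts)           # V: distinct n-grams of this order
    if smoothing:
        return {g: (c + 1) / (total + vocab) for g, c in counts.items()}
    return {g: c / total for g, c in counts.items()}


# Hypothetical toy corpus, not taken from the original lab.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]

smoothed = ngram_probs(corpus, 1, smoothing=True)
unsmoothed = ngram_probs(corpus, 1, smoothing=False)

# Print each unigram with its probability before and after smoothing.
for gram in sorted(unsmoothed, key=unsmoothed.get, reverse=True):
    if smoothed[gram] < unsmoothed[gram]:
        change = "decreased"
    elif smoothed[gram] > unsmoothed[gram]:
        change = "increased"
    else:
        change = "unchanged"
    print(gram, round(unsmoothed[gram], 3), round(smoothed[gram], 3), change)
```

Under add-one smoothing, an n-gram's probability goes down exactly when its count is above N / V (the average count per distinct n-gram) and goes up when it is below, which is why the frequent n-grams lose probability and the rare ones gain it.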
Exercises
Now, it's your turn! Build an n-gram model with additive smoothing for the following tokens:
tokenizedtext = [['This', 'is', 'ngram', 'model'],
                 ['This', 'is', 'smoothed', 'model'],
                 ['This', 'is', 'unsmoothed', 'model'],
                 ['This', 'is', 'ngram', 'lab']]
Task 1: Build a smoothed n-gram model based on the above tokens.
Task 2: What is the next word that is most likely to appear after "This is"?
Task 3: Compare the probability of the word from Task 2 before and after smoothing. Does the probability increase or decrease after smoothing?
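One possible way to approach the three tasks is sketched below. It rests on several assumptions: each token line is read as four tokens beginning with 'This', 'is'; the smoothing is add-one; only trigrams are modelled, since Task 2 asks about the word following a two-word prefix; and the `ngrams` helper is a plain-Python stand-in for a library function such as `nltk.util.ngrams`. The names `trigram_counts`, `candidates`, `best` and `next_word` are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Plain-Python stand-in for an n-gram helper such as nltk.util.ngrams."""
    return list(zip(*(tokens[k:] for k in range(n))))


# Exercise tokens, read as four tokens per sentence (an assumption).
tokenizedtext = [
    ["This", "is", "ngram", "model"],
    ["This", "is", "smoothed", "model"],
    ["This", "is", "unsmoothed", "model"],
    ["This", "is", "ngram", "lab"],
]

# Task 1: trigram model with add-one smoothing (plus the unsmoothed one for Task 3).
trigram_counts = Counter(g for s in tokenizedtext for g in ngrams(s, 3))
N = sum(trigram_counts.values())  # total trigrams
V = len(trigram_counts)           # distinct trigrams
smoothed = {g: (c + 1) / (N + V) for g, c in trigram_counts.items()}
unsmoothed = {g: c / N for g, c in trigram_counts.items()}

# Task 2: among trigrams starting with ("This", "is"), pick the most probable.
candidates = {g: p for g, p in smoothed.items() if g[:2] == ("This", "is")}
best = max(candidates, key=candidates.get)
next_word = best[-1]

# Task 3: probability of that trigram before and after smoothing.
print(next_word, unsmoothed[best], smoothed[best])
```

Under these assumptions there are 8 trigrams, 7 of them distinct, and the most likely next word is 'ngram'. Its trigram probability is 2/8 = 0.25 without smoothing and 3/15 = 0.2 with add-one smoothing, so it decreases.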
