Question: Python: building a bigram model and calculating the probability of word occurrence

I am trying to build a bigram model in Python and to calculate the probability of word occurrence. I should:

1. Select an appropriate data structure to store bigrams.
2. Increment counts for a combination of word and previous word. This means I need to keep track of what the previous word was.
3. Compute the probability of the current word based on the previous word count.
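One possible data structure for these steps is a dictionary keyed by the previous word, whose values count the words that followed it. The sketch below is only an illustration: the names `count_bigrams` and `probability` and the `"START"` marker are assumptions, and it takes count(previous word) to mean the number of times that word was seen as a previous word.

```python
from collections import Counter, defaultdict

def count_bigrams(tokens, start="START"):
    """Return {previous_word: Counter({current_word: count})}."""
    counts = defaultdict(Counter)
    prev_word = start  # assumed marker meaning "no previous word yet"
    for word in tokens:
        counts[prev_word][word] += 1  # increment count for (prev_word, word)
        prev_word = word              # remember the word for the next step
    return counts

def probability(counts, prev_word, word):
    """count(prev_word, word) / count(prev_word); 0.0 if prev_word was never seen."""
    total = sum(counts[prev_word].values()) if prev_word in counts else 0
    return counts[prev_word][word] / total if total else 0.0

tokens = "the cat sat on the mat .".split()
counts = count_bigrams(tokens)
print(counts["the"])
print(probability(counts, "the", "cat"))
```

Grouping by previous word keeps the denominator next to the counts it belongs to, so it cannot be confused with the count of some other word.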

P(curr word | prev word) = count(prev word, curr word) / count(prev word)

Consider we observed the following word sequences:

finger remarked

finger on

finger on

finger in

finger .

Notice that "finger on" was observed twice. Also, notice that the period is treated as a separate word. Given the information in this data structure, we can compute the probability P(on | finger) as 2/5 = 0.4.
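The arithmetic for this example can be reproduced in a few lines. This is a minimal sketch that hard-codes the five observed pairs; the variable names are illustrative only.

```python
from collections import Counter

# The five observed (previous word, current word) pairs from the example.
observed = [
    ("finger", "remarked"),
    ("finger", "on"),
    ("finger", "on"),
    ("finger", "in"),
    ("finger", "."),
]

pair_counts = Counter(observed)                       # count(prev, curr)
prev_counts = Counter(prev for prev, _ in observed)   # count(prev)

p_on_given_finger = pair_counts[("finger", "on")] / prev_counts["finger"]
print(p_on_given_finger)  # 0.4
```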

Here is what I have so far:

filename = 'blah-blah.txt'
bigrams = {}
# "START" precedes the first word exactly once, so it needs a count of 1
unigrams = {"START": 1}
prev_word = "START"

# opening the filename in read mode
with open(filename, "r") as fp:
    for line in fp:
        words = line.split()
        for word in words:
            word = word.lower()
            bigram = prev_word + ' ' + word
            #print(bigram)
            if word in unigrams:
                unigrams[word] += 1
            else:
                unigrams[word] = 1
            #print(unigrams[word])
            if bigram in bigrams:
                bigrams[bigram] += 1
            else:
                bigrams[bigram] = 1
            prev_word = word

output_file = 'bigram_probs.txt'
with open(output_file, "w") as fs:
    for key, value in sorted(bigrams.items()):
        # divide by the count of the bigram's first word (the previous word),
        # not by the count of the last word read from the file
        first_word = key.split(' ')[0]
        prob = value / unigrams[first_word]
        fs.write(key + ": " + str(prob) + "\n")

My program works, but I am not sure if it does what it should do. I appreciate any help!
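One way to check a program like this is to run the same logic on a tiny input whose answers can be worked out by hand. The sketch below is an assumption-laden illustration, not a reference solution: `bigram_probabilities` is a hypothetical helper, `"START"` is the assumed marker for the first word, and the previous word is carried across line breaks. The detail worth verifying is that each bigram count is divided by the count of that bigram's own first word.

```python
from collections import Counter

def bigram_probabilities(lines, start="START"):
    """Map (prev_word, word) -> count(prev_word, word) / count(prev_word)."""
    pair_counts = Counter()
    prev_counts = Counter()
    prev_word = start
    for line in lines:
        for word in line.lower().split():
            pair_counts[(prev_word, word)] += 1
            prev_counts[prev_word] += 1  # times prev_word acted as a previous word
            prev_word = word
    return {
        pair: count / prev_counts[pair[0]]
        for pair, count in pair_counts.items()
    }

# Tiny hand-checkable input: "the" is followed by "finger" twice and "button" once.
sample = ["The finger on the", "button . The finger in"]
probs = bigram_probabilities(sample)
for (prev_word, word), prob in sorted(probs.items()):
    print(f"{prev_word} {word}: {prob}")
```

In this sample, the probabilities of all words following the same previous word add up to 1, which is a quick sanity check for the output file as well.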
