Question: PYTHON
I am working on the following problem.
Consider the following sentences written in Klingon. For each sentence, the part of speech of each word has been given (for ease of translation, some prefixes/suffixes have been treated as words), along with a translation. Using these training sentences, we're going to build a Hidden Markov Model (HMM) to predict the part of speech of an unknown sentence using the Viterbi algorithm.
N PRO V N PRO
paDaq ghah taH terangan e
room (inside) he is human of
The human is in the room
V N V N
jachuqmeH rojHom neH terangan
in order to parley truce want human
The enemy commander wants a truce in order to parley
N V N CONJ N V N
terangan qIp puq eg puq qIp terangan
human bit child and child bit human
The child bit the human, and the human bit the child
Step 1: Creating the emission probability table (emission.java or emission.py)
Create an emission probability table by computing the frequencies of each part of speech in the table below for all POS tags. We'll use a smoothing factor of 0.1 (as discussed in class) to make sure that no event is impossible; add this number to all of your observations. Sample table values of two parts of speech have been shown.
Probability(word|tag) = Count(word,tag) / Count(tag)
Here is what I have for this part:
from collections import defaultdict

# The first argument of replace() was lost in the post; a curly apostrophe is
# assumed here. (An empty string would insert "'" between every character.)
words1 = "paDaq ghah taH terangan e".replace("’", "'").split()
tags1 = "N PRO V N PRO".split()
words2 = "jachuqmeH rojHom neH terangan".replace("’", "'").split()
tags2 = "V N V N".split()
words3 = "terangan qIp puq eg puq qIp terangan".replace("’", "'").split()
tags3 = "N V N CONJ N V N".split()

# One list of (word, tag) pairs per training sentence.
train = []
train.append(list(zip(words1, tags1)))
train.append(list(zip(words2, tags2)))
train.append(list(zip(words3, tags3)))

# Map each word to the list of tags it was observed with.
new_dict = defaultdict(list)
for sent in train:
    for word, tag in sent:
        new_dict[word].append(tag)

# One row per word: the word, then its smoothed count for each tag.
# These are counts, not yet probabilities.
for word, tags in sorted(new_dict.items()):
    row = [word]
    for tag in ["N", "V", "CONJ", "PRO"]:
        row.append(tags.count(tag) + 0.1)
    print(row)
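The rows above hold smoothed counts, while the formula in Step 1 asks for probabilities. A minimal sketch of the normalization step follows. It assumes that Count(tag) is taken as the sum of the smoothed counts for that tag, so that each tag's distribution sums to 1; the assignment may define the denominator differently. The function name `emission_probabilities` and the nested-dict layout are illustrative choices, not part of the assignment.

```python
from collections import Counter

SMOOTHING = 0.1
TAGS = ["N", "V", "CONJ", "PRO"]

# Training data as written in the question (words without apostrophes).
train = [
    list(zip("paDaq ghah taH terangan e".split(), "N PRO V N PRO".split())),
    list(zip("jachuqmeH rojHom neH terangan".split(), "V N V N".split())),
    list(zip("terangan qIp puq eg puq qIp terangan".split(),
             "N V N CONJ N V N".split())),
]

def emission_probabilities(sentences, tags=TAGS, k=SMOOTHING):
    """Return {tag: {word: P(word | tag)}} with add-k smoothing."""
    pair_counts = Counter(
        (word, tag) for sent in sentences for word, tag in sent
    )
    vocab = sorted({word for sent in sentences for word, _ in sent})
    table = {}
    for tag in tags:
        # Add k to every (word, tag) observation, including unseen pairs.
        smoothed = {w: pair_counts[(w, tag)] + k for w in vocab}
        # Assumption: the denominator is the smoothed total for this tag.
        total = sum(smoothed.values())
        table[tag] = {w: c / total for w, c in smoothed.items()}
    return table

emission = emission_probabilities(train)
for tag in TAGS:
    print(tag, {w: round(p, 3) for w, p in emission[tag].items()})
```

Under this assumption, a word never seen with a tag still gets a small nonzero probability, which is the purpose of the 0.1 smoothing factor.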
The next step is: Creating the transition probability table (transition.py)
Generate a transition probability table by calculating the transition frequencies from one POS tag to another. For each part of speech, total the number of times it transitioned to each other part of speech. Again, use a smoothing factor of 0.1. After you've done this, compute the start and transition probabilities. Sample table values of transition for two parts of speech have been shown.
Probability(tag_i | tag_{i-1}) = Count(tag_{i-1}, tag_i) / Count(tag_{i-1})
Could someone help me with this step?
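One possible approach to the transition step, sketched under stated assumptions: start counts come from the first tag of each training sentence, transition counts come from adjacent tag pairs within a sentence, 0.1 is added to every cell, and each row is divided by its smoothed total. No explicit end-of-sentence state is modeled, since the problem statement does not mention one. The names `normalize` and `start_and_transition_probabilities` are hypothetical helpers, not part of the assignment.

```python
from collections import Counter

SMOOTHING = 0.1
TAGS = ["N", "V", "CONJ", "PRO"]

# Tag sequences of the three training sentences from the question.
tag_sequences = [
    "N PRO V N PRO".split(),
    "V N V N".split(),
    "N V N CONJ N V N".split(),
]

def normalize(counts, keys, k):
    """Add k to each count and divide by the smoothed total."""
    smoothed = {key: counts[key] + k for key in keys}
    total = sum(smoothed.values())
    return {key: value / total for key, value in smoothed.items()}

def start_and_transition_probabilities(sequences, tags=TAGS, k=SMOOTHING):
    # How often each tag opens a sentence.
    start_counts = Counter(seq[0] for seq in sequences)
    # How often tag `prev` is immediately followed by tag `cur`.
    bigram_counts = Counter(
        (prev, cur) for seq in sequences for prev, cur in zip(seq, seq[1:])
    )
    start = normalize(start_counts, tags, k)
    transition = {}
    for prev in tags:
        row = Counter({cur: bigram_counts[(prev, cur)] for cur in tags})
        # transition[prev][cur] approximates P(cur | prev).
        transition[prev] = normalize(row, tags, k)
    return start, transition

start, transition = start_and_transition_probabilities(tag_sequences)
print("start", {t: round(p, 3) for t, p in start.items()})
for prev in TAGS:
    print(prev, {t: round(p, 3) for t, p in transition[prev].items()})
```

With this layout, `transition["N"]["V"]` reads as the probability of V given that the previous tag was N, and `start` and `transition` are the two tables the Viterbi algorithm needs alongside the emission table.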
