Question: You will learn to build a Hidden Markov Model (HMM) using the Viterbi algorithm and apply it to the
task of POS tagging. Complete each of the following tasks.
Load NLTK Treebank tagged sentences using nltk.corpus.treebank.tagged_sents().
Use the first 80% of the sentences for training and the remaining 20% for testing.
Extract the word and the tag from each of the sentences and create a vocabulary of all
the words and a set of all tags.
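The first three tasks might be sketched as below. In the assignment, `tagged_sents` would come from `nltk.corpus.treebank.tagged_sents()` (which requires NLTK and a downloaded `treebank` corpus); the toy corpus here just stands in for it so the split and vocabulary logic is self-contained.

```python
# Toy stand-in for nltk.corpus.treebank.tagged_sents(): a list of
# sentences, each a list of (word, tag) pairs.
tagged_sents = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("A", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("The", "DT"), ("cat", "NN"), ("barks", "VBZ")],
    [("Dogs", "NNS"), ("sleep", "VBP")],
    [("Cats", "NNS"), ("sleep", "VBP")],
]

# 80/20 train/test split by sentence index.
split = int(0.8 * len(tagged_sents))
train_sents, test_sents = tagged_sents[:split], tagged_sents[split:]

# Vocabulary of all words and set of all tags, from the training portion.
vocab = {word for sent in train_sents for word, _ in sent}
tags = {tag for sent in train_sents for _, tag in sent}
```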
To implement the Viterbi algorithm, you need two components:
Tag transition probability matrix A: It represents the probability of a tag
occurring given the previous tag, or p(t_i | t_{i-1}). We compute the maximum likelihood
estimate (MLE) of the probability by counting the occurrences of the tag t_{i-1} followed
by the tag t_i:
p(t_i | t_{i-1}) = count(t_{i-1}, t_i) / count(t_{i-1})
Emission probability matrix B: It represents the probability of a tag t_i being
associated with a given word w_i, or p(w_i | t_i). The MLE estimate is:
p(w_i | t_i) = count(t_i, w_i) / count(t_i)
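As a concrete (toy) illustration of both MLE formulas, the counts can be gathered with `collections.Counter`; the tagged pairs below are made up for the example.

```python
from collections import Counter

# Hypothetical tagged training data: a flat list of (word, tag) pairs
# in sentence order.
train_pairs = [("the", "DT"), ("dog", "NN"), ("a", "DT"), ("cat", "NN"),
               ("the", "DT"), ("mat", "NN")]

tags_seq = [tag for _, tag in train_pairs]
tag_counts = Counter(tags_seq)                                    # count(t_i)
bigram_counts = Counter(zip(tags_seq, tags_seq[1:]))              # count(t_{i-1}, t_i)
emit_counts = Counter((tag, word) for word, tag in train_pairs)   # count(t_i, w_i)

p_trans = bigram_counts[("DT", "NN")] / tag_counts["DT"]  # p(NN | DT) = 3/3 = 1.0
p_emit = emit_counts[("DT", "the")] / tag_counts["DT"]    # p(the | DT) = 2/3
```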
Since the number of tags is small, creating matrix A is time-efficient, whereas generating
the full matrix B would be very expensive due to the vocabulary size.
Implement a method compute_tag_trans_probs() to calculate matrix A by parsing the
sentences in the training set and counting the occurrences of the tag t_{i-1} followed by t_i.
Implement a method emission_probs() to calculate the emission probability of a given word
w_i having the tag t_i.
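One possible shape for these two methods is sketched below. The class name `HMMTagger`, the `"<s>"` sentence-start pseudo-tag, and the zero-probability fallback for unseen (word, tag) pairs are assumptions, not requirements of the assignment.

```python
from collections import Counter, defaultdict

class HMMTagger:
    """Sketch: counts-based MLE transition and emission probabilities."""

    def __init__(self, train_sents):
        # train_sents: list of sentences, each a list of (word, tag) pairs.
        self.tag_counts = Counter()
        self.trans_counts = Counter()   # (t_{i-1}, t_i) -> count
        self.emit_counts = Counter()    # (t_i, w_i) -> count
        for sent in train_sents:
            prev = "<s>"                # sentence-start pseudo-tag
            for word, tag in sent:
                self.tag_counts[tag] += 1
                self.trans_counts[(prev, tag)] += 1
                self.emit_counts[(tag, word)] += 1
                prev = tag
            self.tag_counts["<s>"] += 1  # so start transitions normalize

    def compute_tag_trans_probs(self):
        # Matrix A as a nested dict: A[prev][tag] = p(tag | prev).
        A = defaultdict(dict)
        for (prev, tag), c in self.trans_counts.items():
            A[prev][tag] = c / self.tag_counts[prev]
        return A

    def emission_probs(self, word, tag):
        # p(word | tag); 0.0 for unseen (word, tag) pairs.
        if self.tag_counts[tag] == 0:
            return 0.0
        return self.emit_counts[(tag, word)] / self.tag_counts[tag]
```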
The next step in an HMM is decoding, which entails determining the hidden variable sequence
underlying the observations. In POS tagging, decoding means choosing the sequence of tags
most probable given the sequence of words. We compute this using the following equation:
t̂_{1:n} = argmax over t_1 … t_n of ∏_{i=1}^{n} p(w_i | t_i) p(t_i | t_{i-1})
The optimal solution for HMM decoding is given by the Viterbi algorithm, a dynamic
programming approach to computing the decoded tags. Implement the algorithm using the
two methods implemented above, compute_tag_trans_probs() and emission_probs(),
and return the sequence of tags corresponding to the given sequence of words. Refer to
Section 8.4.5, Fig. 8.10 of the book Speech and Language Processing.
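A minimal Viterbi sketch along the lines of SLP Fig. 8.10 follows. It assumes `A` is a nested-dict transition matrix (e.g. from compute_tag_trans_probs()) with `"<s>"` as the start pseudo-tag, and `emit(word, tag)` is an emission-probability function (e.g. emission_probs()); those argument shapes are assumptions of this sketch.

```python
def viterbi(words, tag_set, A, emit):
    """Return the most probable tag sequence for `words` (a sketch)."""
    # V[i][t]: probability of the best path ending in tag t at word i.
    # backptr[i][t]: the previous tag on that best path.
    V = [{}]
    backptr = [{}]
    for t in tag_set:                                # initialization step
        V[0][t] = A.get("<s>", {}).get(t, 0.0) * emit(words[0], t)
        backptr[0][t] = None
    for i in range(1, len(words)):                   # recursion step
        V.append({})
        backptr.append({})
        for t in tag_set:
            best_prev, best_p = None, 0.0
            for prev in tag_set:
                p = V[i - 1][prev] * A.get(prev, {}).get(t, 0.0)
                if p > best_p:
                    best_prev, best_p = prev, p
            V[i][t] = best_p * emit(words[i], t)
            backptr[i][t] = best_prev
    # Termination: pick the best final tag, then follow back-pointers.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(backptr[i][path[-1]])
    return list(reversed(path))
```

Note that multiplying many small probabilities underflows on long sentences; a common refinement is to sum log probabilities instead.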
Evaluate the performance of the model in terms of accuracy on the test set.
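The evaluation step could be sketched as below: token-level accuracy over the held-out sentences. The helper name `tag_accuracy` and the `predict` callback (e.g. a wrapper around the Viterbi decoder) are made up for this sketch.

```python
def tag_accuracy(test_sents, predict):
    """Fraction of test tokens whose predicted tag matches the gold tag.

    `test_sents` is a list of [(word, gold_tag), ...] sentences;
    `predict` maps a list of words to a list of predicted tags.
    """
    correct = total = 0
    for sent in test_sents:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = predict(words)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(sent)
    return correct / total if total else 0.0
```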