Question:
Complete each of the following tasks.
1. Load NLTK Treebank tagged sentences using nltk.corpus.treebank.tagged_sents(). Use first 80% of sentences for training and the remaining 20% for testing.
2. Extract the word and the tag from each of the sentences and create a vocabulary of all the words and a set of all tags.
3. To implement the Viterbi algorithm, you need 2 components:
   Tag transition probability matrix A: It represents the probability of a tag occurring given the previous tag, or $P(t_i \mid t_{i-1})$. We compute the maximum likelihood estimate (MLE) of this probability by counting the occurrences of the tag $t_{i-1}$ followed by the tag $t_i$: $P(t_i \mid t_{i-1}) = \frac{C(t_{i-1}, t_i)}{C(t_{i-1})}$
   Emission probability matrix B: It represents the probability, given a tag, that it is associated with a given word, or $P(w_i \mid t_i)$. The MLE estimate is: $P(w_i \mid t_i) = \frac{C(t_i, w_i)}{C(t_i)}$
   Since the number of tags is small, creating matrix A is cheap, whereas generating the full matrix B is very expensive because of the vocabulary size.
4. Implement a method compute_tag_trans_probs() to calculate matrix A by parsing the sentences in the training set and counting the occurrences of the tag $t_{i-1}$ followed by the tag $t_i$.
5. Implement a method emission_probs() to calculate the emission probability of a given word $w_i$ having a tag $t_i$.
6. The next step in an HMM is decoding, which entails determining the sequence of hidden variables that corresponds to the sequence of observations. In POS tagging, decoding means choosing the sequence of tags that is most probable for the given sequence of words. We compute this using the following equation: $\hat{t}_{1:n} = \arg\max_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i) \, P(t_i \mid t_{i-1})$. The optimal solution for HMM decoding is given by the Viterbi algorithm, a dynamic programming approach to computing the decoded tags. Implement the algorithm using the two methods implemented above, compute_tag_trans_probs() and emission_probs(), and return the sequence of tags corresponding to the given sequence of words. Refer to section 8.4.5, Fig. 8.10 of the Speech and Language Processing book.
7. Evaluate the performance of the model in terms of accuracy on the test set.
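The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference solution. The question names only compute_tag_trans_probs() and emission_probs(); the helpers load_split(), build_vocab(), count_emissions(), viterbi() and accuracy(), all function signatures, the "<s>" start pseudo-tag, and the small probability floor for unseen transitions and unknown words are assumptions introduced here. Scores are kept in log space to avoid underflow on long sentences.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # assumed start-of-sentence pseudo-tag (not named in the question)

def load_split(train_frac=0.8):
    """Step 1: first 80% of Treebank sentences for training, the rest for testing."""
    from nltk.corpus import treebank  # needs nltk.download("treebank") once
    sents = list(treebank.tagged_sents())
    cut = int(len(sents) * train_frac)
    return sents[:cut], sents[cut:]

def build_vocab(sents):
    """Step 2: vocabulary of all words and the set of all tags."""
    return {w for s in sents for w, _ in s}, {t for s in sents for _, t in s}

def count_emissions(sents):
    """Raw counts C(t, w) and C(t); START is counted once per sentence."""
    emit_counts, tag_counts = defaultdict(Counter), Counter()
    for s in sents:
        tag_counts[START] += 1
        for w, t in s:
            emit_counts[t][w] += 1
            tag_counts[t] += 1
    return emit_counts, tag_counts

def compute_tag_trans_probs(sents, tag_counts):
    """Step 4: A[prev][tag] = C(prev, tag) / C(prev)."""
    bigrams = defaultdict(Counter)
    for s in sents:
        prev = START
        for _, t in s:
            bigrams[prev][t] += 1
            prev = t
    return {p: {t: c / tag_counts[p] for t, c in nxt.items()}
            for p, nxt in bigrams.items()}

def emission_probs(word, tag, emit_counts, tag_counts):
    """Step 5: P(word | tag) = C(tag, word) / C(tag), computed on demand."""
    if not tag_counts[tag]:
        return 0.0
    return emit_counts.get(tag, {}).get(word, 0) / tag_counts[tag]

def viterbi(words, tags, A, emit_counts, tag_counts, floor=1e-10):
    """Step 6: most probable tag sequence for `words`, scored in log space."""
    if not words:
        return []
    tags = sorted(tags)
    lp = lambda p: math.log(max(p, floor))  # assumed floor for zero probabilities
    trans = lambda p, t: lp(A.get(p, {}).get(t, 0.0))
    emit = lambda w, t: lp(emission_probs(w, t, emit_counts, tag_counts))
    # Initialization: transition out of START times emission of the first word.
    v = [{t: trans(START, t) + emit(words[0], t) for t in tags}]
    back = [{}]
    # Recursion: keep the best previous tag for every (position, tag) cell.
    for w in words[1:]:
        col, ptr = {}, {}
        for t in tags:
            best = max(tags, key=lambda p: v[-1][p] + trans(p, t))
            col[t] = v[-1][best] + trans(best, t) + emit(w, t)
            ptr[t] = best
        v.append(col)
        back.append(ptr)
    # Termination: follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: v[-1][t])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return path[::-1]

def accuracy(test_sents, tags, A, emit_counts, tag_counts):
    """Step 7: fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for s in test_sents:
        pred = viterbi([w for w, _ in s], tags, A, emit_counts, tag_counts)
        correct += sum(p == t for p, (_, t) in zip(pred, s))
        total += len(s)
    return correct / total if total else 0.0
```

A typical run would call load_split(), then build_vocab(), count_emissions() and compute_tag_trans_probs() on the training sentences, and finally accuracy() on the test sentences. With pure MLE estimates an unknown word has zero emission probability under every tag, so without the floor (or some smoothing scheme of your choice) every path through that word would score zero and the argmax would be arbitrary.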