Question: Implement the CYK Parser in Java (see algorithm below) for the context-free grammar and lexicon showed below: Algorithm: We assume an enumerated class NonTerms with

Implement the CYK Parser in Java (see algorithm below) for the context-free grammar and lexicon showed below:

Algorithm:

We assume an enumerated class NonTerms with all the non-terminals, both phrase structure symbols and parts of speech.

A node of the parse tree is an object with the following data fields (in Java-esque notation)

class Tree { NonTerm phrase % The Non-terminal int startPhrase, int endPhrase; % indices of starting and ending word String word; % If a leaf, then the word Tree left; Tree right; double prob; } 

The fields left, right, and prob are the left-hand child, right-hand child and probability in the best parse found so far for this particular non-terminal from this start to this end (e.g. the best parse for an NP from word 3 to word 6). Note that the probability on a node is _not_ the probability of this parse tree given the phrase, it is the probability of the parse tree including the choice of word given the non-terminal at the root. However for any particular non-terminal and choice of sentence those two probabilities are proportional, by Bayes' rule, which we will come to later in the course.

The main data structure in the procedure is a chart which is an array P[M,I,J]. M is indexed on the non-terminal, I and J go from 1 to N (the length of the sentence). P[M,I,J] is the node with NonTerm==M, startPhrase==I, and endPhrase==J. Note that this uses 1-based indexing, like the textbook.

function CYK-PARSE(sentence,grammar) return P, a chart. { 1. N = length(sentence); 2. for (i = 1 to N) { 3. word = sentence[i]; 4. for (each rule "POS --> Word [prob]" in the grammar) 5. P[POS,i,i] = new Tree(POS,i,i,word,null,null,prob); 6. } % endfor line 2. 7. for (length = 2 to N) % length = length of phrase 8. for (i = 1 to N+1-length) { % i == start of phrase 9. j = i+length-1; % j == end of phrase 10. for (each NonTerm M) { 11. P[M,i,j] = new Tree(M,i,j,null,null,null,0.0); 12. for (k = i to j-1) % k = end of first subphrase 13. for (each rule "M -> Y,Z [prob]" in the grammar) { 14. newProb = P[Y,i,k].prob * P[Z,k+1,j].prob * prob; 15. if (newProb > P[M,i,j].prob) { 16. P[M,i,j].left = P[Y,i,k]; 17. P[M,i,j].right = P[Z,k+1,j]; 18. P[M,i,j].prob = newProb; 19. } % endif line 15 20. } % endfor line 13 21. } % endfor line 10 22. } % endfor line 8 23. return P; 24. } % end CYK-PARSE. 

The value returned in P["S",1,N] is the root of the most probable parse.

The most probable parse tree can be printed out (not very elegantly) using pre-order search.

printTree(P) { % P is the chart returned by CYK-PARSE above. printTree1(P[S,1,N],0) % S is the top-level nonterminal in the grammar. } % N is the length of the sentence. printTree1(Tree tree, int INDENT) { if (tree != null) { print out INDENT spaces; print out tree.phrase if (tree.word != null) print out tree.word; printTree(tree.left, INDENT+3); printTree(tree.right, INDENT+3); }

Grammar

S -> Noun Verb [0.2] S -> Noun VerbAndObject [0.3] S -> Noun VPWithPPList [0.1] S -> NP Verb [0.2] S -> NP VerbAndObject [0.1] S -> NP VPWithPPList [0.1] NP -> Noun PP [0.8] NP -> Noun PPList [0.2] PP -> Prep Noun [0.6] PP -> Prep NP [0.4] PPList -> PP PP [0.6] PPList -> PP PPList [0.4] VerbAndObject -> Verb Noun [0.5] VerbAndObject -> Verb NP [0.5] VPWithPPList -> Verb PP [0.3] VPWithPPList -> Verb PPList [0.1] VPWithPPList -> VerbAndObject PP [0.4] VPWithPPList -> VerbAndObject PPList [0.2]

Noun -> amy [0.1] Noun -> dinner [0.2] Noun -> fish [0.2] Noun -> streams [0.1] Noun -> swim [0.2] Noun -> tuesday [0.2] Prep -> for [0.5] Prep -> in [0.3] Prep -> on [0.2] Verb -> ate [0.7] Verb -> streams [0.1] Verb -> swim [0.2]

The program should read a single sentence and print out the most probable parse tree with its probability. For instance given the input "Fish swim in streams" the program should print out

S Noun fish VPWithPPList Verb swim PP Prep in Noun streams Probability = 0.0000216 (2.16e-5). 

If the input sentence cannot be parsed as a sentence e.g. "toy in" then the program should print out "This sentence cannot be parsed".

You may assume that the sentence is correctly formatted; that is, it consists of words in the lexicon in lower case separated by spaces.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!