I have written a CKY parser, but the unit tests are failing with the wrong probabilities. I have tried the "best_prob" variable as both a float and a decimal, but I am still getting the incorrect probabilities.
These are the instructions:
Create a function named cky_parsing that applies the CKY algorithm to parse sentences using a given Probabilistic Context-Free Grammar (PCFG). The function should handle unknown words by substituting them with an unknown-word placeholder token, and it should employ the Viterbi parsing algorithm. Additionally, it should use the grammar's productions to identify known words.
1. Function Definition:
Name the function cky_parsing. It should accept two parameters: sentences (a list of sentences to be parsed) and grammar (the PCFG used for parsing).
2. Preprocessing:
Within the function, construct the set of known words from the grammar's productions (obtain the productions from the grammar that is passed in). This set is used to determine whether the words in the sentences are covered by the grammar.
3. Viterbi Parser Setup:
Initialize a Viterbi parser using the provided PCFG.
4. Sentence Processing:
Iterate over each sentence in sentences:
Tokenize the sentence.
Replace any word not found in the set of known words with the unknown-word placeholder token.
Parse the sentence using the Viterbi parser.
Select the parse with the highest probability, or handle cases where no valid parse is found (check the parse_all method of the ViterbiParser object from the nltk.parse library).
5. Return Value:
The function should return a list of tuples, one tuple per sentence processed. Each tuple should contain:
The index of the sentence within the input list.
The original sentence.
The best parse tree found, or an appropriate value indicating the grammatical structure of sentences.
```
import nltk  # needed for nltk.word_tokenize below
from nltk.grammar import PCFG
from nltk.parse import ViterbiParser


# Create a function named cky_parsing
def cky_parsing(sentences, grammar):
    # Construct the set of known words: every terminal (plain string)
    # that appears on the right-hand side of a production
    known_words = set()
    for production in grammar.productions():
        for rhs in production.rhs():
            if isinstance(rhs, str):  # terminals are strings, nonterminals are not
                known_words.add(rhs)

    # Initialize a Viterbi parser using the provided PCFG
    viterbi_parser = ViterbiParser(grammar)
    iter_results = []

    # Iterate over each sentence in sentences
    for index, sentence in enumerate(sentences):
        # Tokenize the sentence
        tokens = nltk.word_tokenize(sentence)
        # Replace any word not in the known-word set with the placeholder token
        process_tokens = [token if token in known_words else '' for token in tokens]
        # Parse the sentence using the Viterbi parser
        parse_trees = viterbi_parser.parse_all(process_tokens)
        # Select the parse with the highest probability, or None if there is no parse
        best_parse = None
        best_prob = -1.0  # tree.prob() returns a float in [0, 1]
        for tree in parse_trees:
            prob = tree.prob()
            if prob > best_prob:
                best_prob = prob
                best_parse = tree
        # best_parse stays None when no valid parse was found
        iter_results.append((index, sentence, best_parse))
    return iter_results
```
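To make the unknown-word replacement and the Viterbi selection concrete, here is a minimal pure-Python sketch of Viterbi CKY over a toy grammar in Chomsky normal form. It is not NLTK's implementation: the grammar, its probabilities, the helper names `replace_unknown` and `viterbi_cky`, and the `<UNK>` placeholder are all assumptions made up for illustration, since the post does not show the real placeholder token or grammar.

```python
from collections import defaultdict

UNK = "<UNK>"  # hypothetical placeholder token; the post does not show the real one

# Toy PCFG in Chomsky normal form (all symbols and numbers are made up).
LEXICAL = {  # (tag, word) -> probability
    ("NP", "dogs"): 0.5,
    ("NP", UNK): 0.5,
    ("V", "chase"): 1.0,
}
BINARY = {  # (parent, left, right) -> probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}


def replace_unknown(tokens):
    """Map every token that is not a terminal of the grammar to UNK."""
    known = {word for (_, word) in LEXICAL}
    return [tok if tok in known else UNK for tok in tokens]


def viterbi_cky(tokens, start="S"):
    """Return (probability, bracketed tree) of the best parse, or (0.0, None)."""
    n = len(tokens)
    # best[(i, j)][label] = (prob, tree) for the best `label` spanning tokens[i:j]
    best = defaultdict(dict)
    for i, word in enumerate(tokens):
        for (tag, w), p in LEXICAL.items():
            if w == word:
                best[(i, i + 1)][tag] = (p, f"({tag} {word})")
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for (parent, left, right), p in BINARY.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        lp, lt = best[(i, k)][left]
                        rp, rt = best[(k, j)][right]
                        prob = p * lp * rp  # rule prob times both subtree probs
                        if prob > best[(i, j)].get(parent, (0.0, None))[0]:
                            best[(i, j)][parent] = (prob, f"({parent} {lt} {rt})")
    return best[(0, n)].get(start, (0.0, None))


tokens = replace_unknown("dogs chase cats".split())
prob, tree = viterbi_cky(tokens)
print(tokens)  # ['dogs', 'chase', '<UNK>']
print(prob)    # 0.25
print(tree)    # (S (NP dogs) (VP (V chase) (NP <UNK>)))
```

The probability of the best tree is just the product of the probabilities of the rules it uses, so the parser can only ever report values that are already in the grammar.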
The unit tests are failing with:
Actual output : NNP -> 'Beach' [0.016129]
Expected output: NNP -> 'Beach' [0.00735294]
Actual output : NP -> WDT NNS [0.0196078]
Expected output: NP -> WDT NNS [0.00144928]
Actual output : VBG -> 'leaving' [0.166667]
Expected output: VBG -> 'leaving' [0.264706]
Actual output : FRAG*-> PP PP [0.142857]
Expected output: FRAG*-> PP PP [0.416667]
Actual output : RBS -> 'least' [1.0]
Expected output: RBS -> 'least' [1.0]
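The bracketed numbers in these lines look like rule probabilities, which belong to the grammar rather than to the parse search: cky_parsing only reads them, so the type of best_prob cannot change them. In a PCFG estimated from a treebank, a rule probability is typically the relative frequency count(A -> β) / count(A). As a rough check, 0.016129 is 1/62 and 0.00735294 is 1/136, which would be consistent with (though it does not prove) the grammar having been estimated from a different set of trees than the tests expect. The sketch below uses made-up counts and a hypothetical helper, estimate_rule_probs, to show how the same rule receives different probabilities from different counts; none of these counts come from the post.

```python
from collections import Counter


def estimate_rule_probs(rule_counts):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    lhs_totals = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in rule_counts.items()}


# Hypothetical counts: the same rule seen once, among 62 vs. 136 NNP expansions.
small_sample = {("NNP", ("Beach",)): 1, ("NNP", ("Palm",)): 61}
large_sample = {("NNP", ("Beach",)): 1, ("NNP", ("Palm",)): 135}

p_small = estimate_rule_probs(small_sample)[("NNP", ("Beach",))]
p_large = estimate_rule_probs(large_sample)[("NNP", ("Beach",))]
print(f"{p_small:.6g}")  # 0.016129
print(f"{p_large:.6g}")  # 0.00735294
```

If this is what is happening, the place to look is the code that builds the grammar (which trees are used, and whether unknown-word replacement is applied before counting), not the selection loop.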
