Question: I have written a CKY parser, but the unit tests are failing with the wrong probabilities. I have tried the "bestprob" variable as both a float and a Decimal, but I am still getting incorrect probabilities.
These are the instructions:
Create a function named ckyparsing that applies the CKY algorithm to parse sentences using a given Probabilistic Context-Free Grammar (PCFG). The function should handle unknown words by substituting them with an unknown-word placeholder token and should employ the Viterbi parsing algorithm. Additionally, it should account for the grammar's productions to identify known words.
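For readers who want to see what Viterbi-style CKY computes before reaching for NLTK, here is a minimal pure-Python sketch. It assumes a grammar already in Chomsky normal form; the toy rules, the probabilities, and the name viterbi_cky are all hypothetical and not part of the assignment.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form; probabilities are made up.
LEXICAL = {            # word -> [(tag, P(tag -> word))]
    "dogs": [("NP", 0.5)],
    "cats": [("NP", 0.5)],
    "chase": [("V", 1.0)],
}
BINARY = [             # (parent, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 1.0),
]

def viterbi_cky(tokens, start="S"):
    """Return (probability, tree) for the best parse, or None if there is none."""
    n = len(tokens)
    # best[(i, j)][label] = (probability, tree) for the span tokens[i:j]
    best = defaultdict(dict)
    # Fill the diagonal with lexical rules.
    for i, word in enumerate(tokens):
        for tag, p in LEXICAL.get(word, []):
            best[(i, i + 1)][tag] = (p, (tag, word))
    # Combine adjacent spans bottom-up, keeping only the most probable
    # analysis per (span, label): that is the Viterbi step.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right, rule_p in BINARY:
                    if left in best[(i, k)] and right in best[(k, j)]:
                        lp, ltree = best[(i, k)][left]
                        rp, rtree = best[(k, j)][right]
                        p = rule_p * lp * rp
                        if p > best[(i, j)].get(parent, (0.0, None))[0]:
                            best[(i, j)][parent] = (p, (parent, ltree, rtree))
    return best[(0, n)].get(start)

result = viterbi_cky("dogs chase cats".split())
```

The probability of a parse is the product of the probabilities of every rule used in it, so a mismatch in any single rule probability shows up in every constituent above it.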
Function Definition:
Name the function ckyparsing. It should accept two parameters: sentences (a list of sentences to be parsed) and grammar (the PCFG used for parsing).
Preprocessing:
Within the function, construct a set of known words present in the given productions (generate the 'production' set from the grammar passed). This set is used to determine whether words in the sentences are covered by the grammar.
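The preprocessing step can be sketched as follows. The toy grammar and the "<UNK>" terminal are assumptions made for illustration only; the real grammar and its unknown-word token come from the assignment. The sketch relies on the fact that NLTK represents terminals as plain strings and nonterminals as Nonterminal objects.

```python
from nltk.grammar import PCFG

# Hypothetical toy grammar; "<UNK>" is an assumed placeholder terminal.
toy_grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'dogs' [0.4] | 'cats' [0.4] | '<UNK>' [0.2]
    VP -> V NP [1.0]
    V -> 'chase' [1.0]
""")

# Collect every terminal that appears on the right-hand side of a production.
known_words = {
    symbol
    for production in toy_grammar.productions()
    for symbol in production.rhs()
    if isinstance(symbol, str)  # terminals are str, nonterminals are not
}
```

Note that productions and rhs are methods, so both need parentheses when called.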
Viterbi Parser Setup:
Initialize a Viterbi parser using the provided PCFG.
Sentence Processing:
Iterate over each sentence in sentences:
Tokenize the sentence.
Replace any word not found in the set of known words with the unknown-word placeholder token.
Parse the sentence using the Viterbi parser.
Select the parse with the highest probability, or handle cases where no valid parse is found (check the parse_all method of the ViterbiParser object from the nltk.parse library).
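A rough sketch of the per-sentence steps is shown here. The toy grammar, the "<UNK>" token, and the helper name best_parse are assumptions for illustration, and str.split stands in for nltk.word_tokenize so the example does not need the tokenizer data.

```python
from nltk.grammar import PCFG
from nltk.parse import ViterbiParser

UNK = "<UNK>"  # assumed placeholder; use whatever token the real grammar defines
toy_grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'dogs' [0.4] | 'cats' [0.4] | '<UNK>' [0.2]
    VP -> V NP [1.0]
    V -> 'chase' [1.0]
""")
known_words = {"dogs", "cats", "chase", UNK}
parser = ViterbiParser(toy_grammar)

def best_parse(sentence):
    """Hypothetical helper: return the most probable tree, or None."""
    tokens = sentence.split()  # stand-in for nltk.word_tokenize
    tokens = [t if t in known_words else UNK for t in tokens]
    trees = parser.parse_all(tokens)  # empty list when no parse exists
    return max(trees, key=lambda t: t.prob(), default=None)

tree = best_parse("dogs chase zebras")      # "zebras" becomes "<UNK>"
no_tree = best_parse("chase dogs")          # no S spans the sentence
```

Substituting unknown words before parsing matters because ViterbiParser checks that every token is covered by the grammar and raises an error otherwise.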
Return Value:
The function should return a list of tuples. One tuple per sentence processed. Each tuple should contain:
The index of the sentence within the input list.
The original sentence.
The best parse tree found, or an appropriate value indicating the grammatical structure of sentences.
import nltk
from nltk.grammar import PCFG
from nltk.parse import ViterbiParser
from decimal import Decimal

# Placeholder for unknown words. Assumption: the exact token is not visible in
# this post; use whichever terminal the grammar defines for unknown words.
UNK = "<UNK>"

# Create a function named ckyparsing
def ckyparsing(sentences, grammar):
    # Construct a set of known words present in the given productions
    knownwords = set()
    for production in grammar.productions():
        for rhs in production.rhs():
            if isinstance(rhs, str):  # terminals (words) are plain strings
                knownwords.add(rhs)
    # Initialize a Viterbi parser using the provided PCFG
    viterbiparser = ViterbiParser(grammar)
    iterresults = []
    # Iterate over each sentence in sentences
    for index, sentence in enumerate(sentences):
        # Tokenize the sentence
        tokens = nltk.word_tokenize(sentence)
        # Replace any word not found in the set of known words with UNK
        processtokens = [token if token in knownwords else UNK for token in tokens]
        # Parse the sentence using the Viterbi parser
        parsetrees = viterbiparser.parse_all(processtokens)
        # Select the parse with the highest probability, or handle cases where no valid parse is found
        bestparse = None
        bestprob = Decimal("-Infinity")  # assumed start value, below any probability
        for tree in parsetrees:
            prob = tree.prob()
            if prob > bestprob:
                bestprob = prob
                bestparse = tree
        if bestparse is not None:
            iterresults.append((index, sentence, bestparse))  # append best parse
        else:
            iterresults.append((index, sentence, None))  # append None
    return iterresults
The unit tests are failing with:
Actual output : NNP 'Beach'
Expected output: NNP 'Beach'
Actual output : NP WDT NNS
Expected output: NP WDT NNS
Actual output : VBG 'leaving'
Expected output: VBG 'leaving'
Actual output : FRAG PP PP
Expected output: FRAG PP PP
Actual output : RBS 'least'
Expected output: RBS 'least'
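Since the mismatches are reported per constituent, one way to narrow the problem down is to print the probability attached to every subtree of the returned parse and compare each against the expected value. This is only a debugging sketch: the toy grammar is hypothetical, and it assumes the tests compare the probabilities that ViterbiParser stores on each subtree, which the post does not confirm.

```python
from nltk.grammar import PCFG
from nltk.parse import ViterbiParser

# Hypothetical toy grammar used only to demonstrate the inspection loop.
toy_grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'dogs' [0.4] | 'cats' [0.4] | '<UNK>' [0.2]
    VP -> V NP [1.0]
    V -> 'chase' [1.0]
""")

tree = ViterbiParser(toy_grammar).parse_all("dogs chase cats".split())[0]

# Every subtree built by ViterbiParser carries its own probability.
subtree_probs = [
    (subtree.label(), subtree.leaves(), subtree.prob())
    for subtree in tree.subtrees()
]
for label, leaves, prob in subtree_probs:
    print(label, leaves, prob)
```

If the probabilities here already differ from the expected ones, the cause is in the grammar or the tokens being parsed rather than in the type of bestprob, because that variable only selects among trees and never changes the value a tree reports.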
