Question:

1. Consider the first-order Markov language model (the bigram language model). We will train the model by maximum likelihood estimation, but before doing so we will remove from the corpus the STOP symbol that marks the end of a sentence, so that a sentence may end with any word rather than only with STOP. Show that in this case the sum of the probabilities that the estimated language model assigns to all strings of finite length is greater than 1 (that is, the estimated model is not a valid language model). A worked sketch of the intended argument is given after part 3 below.

2. Give an example of two English sentences, one grammatically correct and the other grammatically incorrect, such that a second-order Markov language model (a trigram model) assigns a high probability to the grammatically incorrect sentence, or a low probability to the grammatically correct sentence.

3. Suppose now that we are given a training corpus annotated with syntactic trees (a syntactically parsed training corpus), where the trees are dependency trees. Suggest a way to define a new language model that addresses the problem you presented in the previous part. The new model must maintain a reasonable perplexity (similar to that of the trigram model) and assign sensible probabilities to the pair of examples from the previous part, that is, a high probability to the grammatically correct sentence and a low probability to the grammatically incorrect one. Explain why the model you propose is likely to have these properties. A minimal sketch of one such model is given below.
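For part 1, here is a minimal worked sketch of the standard argument, assuming the usual bigram MLE setup in which a sentence $x_1 \dots x_n$ over a vocabulary $V$ is scored as $p(x_1 \dots x_n) = \prod_{i=1}^{n} q(x_i \mid x_{i-1})$, with $x_0 = \text{START}$. Once STOP is removed, each conditional distribution $q(\cdot \mid x_{i-1})$ sums to 1 over $V$ alone, so for every fixed length $n$

\[
\sum_{x_1 \dots x_n \in V^n} \prod_{i=1}^{n} q(x_i \mid x_{i-1}) = 1,
\]

by marginalizing the conditionals from $x_n$ back to $x_1$. Summing over all finite lengths then gives

\[
\sum_{n=1}^{\infty} \sum_{x_1 \dots x_n \in V^n} p(x_1 \dots x_n) = \sum_{n=1}^{\infty} 1 = \infty > 1,
\]

so the estimated model is not a valid distribution over finite strings. (With STOP kept, the inner sum for each $n$ is instead the probability that a sentence has length exactly $n$, and the series is bounded by 1.)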
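For part 3, one commonly proposed direction (a hedged sketch under assumed definitions, not necessarily the expected answer) is to generate each word conditioned on its dependency head rather than on its two linear predecessors, so that long-distance relations such as subject-verb agreement become local head-dependent events. Below is a toy Python illustration of the MLE for such a head-conditioned model; the corpus format and the names parsed_corpus, head_prob, and sentence_prob are hypothetical, invented for this sketch.

from collections import defaultdict

# Hypothetical toy format: each parsed sentence is (words, heads), where
# heads[i] is the 1-based index of word i's dependency head (0 = root).
parsed_corpus = [
    (["the", "dog", "barks"], [2, 3, 0]),
]

# MLE counts for p(word | head): each word is generated conditioned on its
# dependency head instead of on the preceding words in linear order.
pair_counts = defaultdict(int)
head_counts = defaultdict(int)
for words, heads in parsed_corpus:
    for i, w in enumerate(words):
        head = "ROOT" if heads[i] == 0 else words[heads[i] - 1]
        pair_counts[(head, w)] += 1
        head_counts[head] += 1

def head_prob(word, head):
    # MLE estimate of p(word | head); 0.0 for heads unseen in training.
    return pair_counts[(head, word)] / head_counts[head] if head_counts[head] else 0.0

def sentence_prob(words, heads):
    # Probability of a parsed sentence as a product of head-conditioned factors.
    p = 1.0
    for i, w in enumerate(words):
        head = "ROOT" if heads[i] == 0 else words[heads[i] - 1]
        p *= head_prob(w, head)
    return p

print(sentence_prob(["the", "dog", "barks"], [2, 3, 0]))  # 1.0 on this toy corpus

In practice such head-conditioned factors would be smoothed and interpolated with an ordinary trigram model, which is what keeps perplexity close to the trigram baseline while still penalizing agreement violations that the trigram window cannot see.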
