Training Data
We'll be using the English news corpus (2018) as our training data. It contains around 10,000 sentences. File: eng_news_2018_10K-sentences.txt
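A minimal sketch of loading and splitting the corpus is shown here. It assumes each line of the file holds one sentence, possibly preceded by a numeric ID and a tab; that layout is an assumption, not something the question states. The names `preprocess`, `load_corpus`, and `train_test_split` are hypothetical helpers, and the lowercase-and-split tokenization is just one reasonable choice.

```python
import random
import re

def preprocess(line):
    """Drop an optional leading 'ID<TAB>' prefix, lowercase, and tokenize."""
    text = line.strip().split("\t", 1)[-1]
    # Words become tokens; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def load_corpus(path):
    """Read one sentence per line and return a list of token lists."""
    with open(path, encoding="utf-8") as f:
        return [tokens for tokens in map(preprocess, f) if tokens]

def train_test_split(sentences, test_fraction=0.1, seed=0):
    """Shuffle reproducibly, then hold out a fraction for testing."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

# Toy usage on in-memory lines instead of the real file.
toy_lines = ["%d\tSentence number %d." % (i, i) for i in range(10)]
toy_sentences = [preprocess(line) for line in toy_lines]
train, test = train_test_split(toy_sentences, test_fraction=0.2)
```

On the real data, `load_corpus("eng_news_2018_10K-sentences.txt")` would replace the toy lines.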
Problem: N-gram language model
You need to build two language models, a unigram model and a bigram model, both using Laplace smoothing. With each model, you will do the following tasks:
Display 10 generated sentences from this model.
Score the provided test sentences and display the average and standard deviation of the resulting probabilities:
once for the provided test set
once for the test set that you curate
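The reporting step can be sketched as follows, assuming the sentence probabilities have already been computed by some model. The function name `summarize_scores` and the example numbers are hypothetical, and the choice of the population standard deviation is an assumption, since the question does not say which variant is wanted.

```python
import math
import statistics

def summarize_scores(probabilities):
    """Return (mean, standard deviation) of a list of sentence probabilities."""
    mean = statistics.mean(probabilities)
    # Population standard deviation; statistics.stdev gives the sample version.
    std = statistics.pstdev(probabilities)
    return mean, std

# Made-up probabilities, for illustration only.
example_scores = [1e-12, 3e-12, 2e-12]
mean, std = summarize_scores(example_scores)
print("average:", mean, "standard deviation:", std)
```

Sentence probabilities are products of many small numbers, so they are tiny; summing log-probabilities instead is a common way to avoid underflow on long sentences.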
Part 1: Build an n-gram language model (6 marks)
Pre-processing of the data.
Split the data for training and testing.
You need to develop an n-gram model that could model any order n-gram, which we'll be using specifically to look at unigrams and bigrams. Specifically, you'll write code that builds this language model from the training data and provides functions that can take a sentence in (formatted the same as in the training data) and return the probability assigned to that sentence by your model.
Handling of unknown words and smoothing.
Evaluating the language model.
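One way the model described above could look is sketched below. It is not a reference solution. The class name `NGramModel`, the `<UNK>` token, and the `min_count` threshold for mapping rare words to `<UNK>` are all assumptions made for illustration. Sentences are padded with `<s>` and `</s>` markers, and in this sketch the markers are counted like ordinary tokens. Laplace smoothing adds one to every n-gram count and the vocabulary size V to every context count.

```python
from collections import Counter

UNK, BOS, EOS = "<UNK>", "<s>", "</s>"

def pad(tokens, n):
    """Add n-1 sentence markers on each side (at least one, so unigrams get them too)."""
    k = max(n - 1, 1)
    return [BOS] * k + list(tokens) + [EOS] * k

class NGramModel:
    def __init__(self, n, min_count=2):
        self.n = n
        self.min_count = min_count  # words rarer than this become UNK (assumed policy)

    def _map(self, word):
        return word if word in self.vocab else UNK

    def train(self, sentences):
        """sentences: list of token lists without sentence markers."""
        word_counts = Counter(w for s in sentences for w in s)
        self.vocab = {w for w, c in word_counts.items() if c >= self.min_count}
        self.vocab |= {UNK, BOS, EOS}
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of the first n-1 tokens of each n-gram
        for s in sentences:
            toks = pad([self._map(w) for w in s], self.n)
            for i in range(len(toks) - self.n + 1):
                gram = tuple(toks[i:i + self.n])
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def prob(self, gram):
        """Laplace-smoothed P(w | context) = (c(context, w) + 1) / (c(context) + V)."""
        V = len(self.vocab)
        return (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + V)

    def score(self, tokens):
        """Probability of a whole sentence as a product of n-gram probabilities."""
        toks = pad([self._map(w) for w in tokens], self.n)
        p = 1.0
        for i in range(len(toks) - self.n + 1):
            p *= self.prob(tuple(toks[i:i + self.n]))
        return p

# Toy usage; min_count=1 keeps every word of the tiny corpus in the vocabulary.
toy = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigram = NGramModel(1, min_count=1)
unigram.train(toy)
bigram = NGramModel(2, min_count=1)
bigram.train(toy)
print(bigram.score(["the", "bird", "sat"]))  # "bird" is unseen and scored as UNK
```

For a unigram model the context is the empty tuple, so its count is simply the total number of tokens. Because every n-gram increments both counters, the smoothed probabilities for any given context sum to one over the vocabulary.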
Part 2: Implement Sentence Generation (4 points)
In this part, you'll implement sentence generation for your language model. Start by generating the <s> token, then sample from the n-grams beginning with <s>. Stop generating words when you hit an </s> token.
Notes:
When generating sentences for unigrams, do not count the pseudo-word <s> as part of the unigram probability mass after you've chosen it as the beginning token in a sentence.
All unigram sentences that you generate should start with one <s> and end with one </s>.
For n-grams larger than 1, the sentences you generate should start with n-1 <s> tokens. They should end with n-1 </s> tokens.
Justification of the output obtained for all the above tasks is mandatory.
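The generation procedure can be sketched as follows for the unigram and bigram cases. This is an illustration under assumptions: the helper names are hypothetical, words are sampled in proportion to raw training counts rather than Laplace-smoothed probabilities (smoothed probabilities could be passed as weights instead), and `max_len` is an added safety cap that the question does not mention.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"

def bigram_counts(sentences):
    """Map each token to a Counter of the tokens that follow it."""
    counts = defaultdict(Counter)
    for s in sentences:
        toks = [BOS] + list(s) + [EOS]
        for prev, nxt in zip(toks, toks[1:]):
            counts[prev][nxt] += 1
    return counts

def generate_bigram(counts, rng, max_len=30):
    """Start at BOS and repeatedly sample the next word given the previous one."""
    sentence = [BOS]
    while sentence[-1] != EOS and len(sentence) < max_len:
        followers = counts[sentence[-1]]
        words = list(followers)
        weights = list(followers.values())
        sentence.append(rng.choices(words, weights=weights)[0])
    if sentence[-1] != EOS:
        sentence.append(EOS)  # close a sentence that hit the length cap
    return sentence

def generate_unigram(unigram_counts, rng, max_len=30):
    """Sample words independently; BOS is excluded from the probability mass."""
    words = [w for w in unigram_counts if w != BOS]
    weights = [unigram_counts[w] for w in words]
    sentence = [BOS]
    while sentence[-1] != EOS and len(sentence) < max_len:
        sentence.append(rng.choices(words, weights=weights)[0])
    if sentence[-1] != EOS:
        sentence.append(EOS)
    return sentence

# Toy usage with a fixed seed so runs are repeatable.
toy = [["the", "cat", "sat"], ["the", "dog", "sat"]]
bi = bigram_counts(toy)
uni = Counter(tok for s in toy for tok in [BOS] + s + [EOS])
rng = random.Random(0)
bigram_sentence = generate_bigram(bi, rng)
unigram_sentence = generate_unigram(uni, rng)
print(" ".join(bigram_sentence))
print(" ".join(unigram_sentence))
```

A bigram model needs only one <s> and one </s> because n-1 is 1. A higher-order version would start from n-1 <s> tokens and condition on the last n-1 tokens at each step.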
