Question: This section needs to be completed using Python 3.6+. You will also need the following packages:

    • pandas 

    • numpy

    • NLTK or SpaCy 

    • scikit-learn 

    • random 

 

Q1. Text Generation using N-grams: Code [15] 

 

1. Download the .txt file of the book "The Great Gatsby" from Project Gutenberg. Write a program that reads the book and preprocesses it by applying a sentence tokenizer first and then a word tokenizer. Make sure to remove all punctuation and stopwords.
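The preprocessing in step 1 could be sketched roughly as below. This is a minimal illustration, not a reference solution: the function name `preprocess`, the regex-based stand-in tokenizers, and the tiny `DEMO_STOPWORDS` set are all assumptions made so the sketch runs without any downloads. The tokenizers and stopword list are passed in as arguments, so the NLTK or SpaCy equivalents can be substituted.

```python
import re


def preprocess(text, sent_tokenize, word_tokenize, stopwords):
    """Split text into sentences, then words; drop punctuation and stopwords."""
    stop = {w.lower() for w in stopwords}
    sentences = []
    for sent in sent_tokenize(text):
        words = [
            tok
            for tok in word_tokenize(sent)
            # Keep a token only if it has a letter or digit (drops pure
            # punctuation) and is not a stopword (case-insensitive match).
            if any(ch.isalnum() for ch in tok) and tok.lower() not in stop
        ]
        if words:
            sentences.append(words)
    return sentences


# Stand-in tokenizers (assumptions, far cruder than NLTK's or SpaCy's).
def simple_sent_tokenize(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simple_word_tokenize(sentence):
    return re.findall(r"\w+|[^\w\s]", sentence)


DEMO_STOPWORDS = {"the", "a", "in", "my", "and", "i"}  # hypothetical tiny list

demo = preprocess(
    "I love NLP. The cat sat, quietly!",
    simple_sent_tokenize,
    simple_word_tokenize,
    DEMO_STOPWORDS,
)
```

With NLTK installed and its tokenizer and stopword data downloaded, `nltk.tokenize.sent_tokenize`, `nltk.tokenize.word_tokenize`, and `nltk.corpus.stopwords.words("english")` can be passed in instead. One thing to watch: NLTK's English stopword list contains 'he', so a case-insensitive filter like the one above would also remove 'He'.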

 

2. Add a start token '<s>' and an end token '</s>' to each sentence, then generate a dictionary containing the bigrams and their frequencies. The sentence 'I love nlp' will become '<s> I love nlp </s>', so the bigrams for this sentence will be '<s> I', 'I love', 'love nlp', and 'nlp </s>'.
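One possible way to build the bigram dictionary for step 2 is sketched below. The marker strings `<s>` and `</s>` are assumed here (they are the conventional choice for sentence boundaries), and the function name `count_bigrams` is hypothetical. Bigrams are stored as tuples in a `collections.Counter`, which behaves like a dictionary of frequencies.

```python
from collections import Counter

START, END = "<s>", "</s>"  # assumed boundary markers


def count_bigrams(sentences):
    """Count bigrams over tokenized sentences padded with START and END."""
    bigrams = Counter()
    for words in sentences:
        padded = [START] + list(words) + [END]
        # zip pairs each token with its successor: (w_{i-1}, w_i)
        bigrams.update(zip(padded, padded[1:]))
    return bigrams


bigrams = count_bigrams([["I", "love", "nlp"]])
```

For the example sentence this yields four bigrams, each with frequency 1.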

 

3. With the following formula, calculate the conditional probability of each word given the previous word. 

$$P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}, w_i)}{\mathrm{count}(w_{i-1})}$$
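A rough sketch of this calculation follows, assuming the bigram frequencies are stored as a mapping from `(previous_word, word)` tuples to counts. The function name and the toy counts are hypothetical. The count of the previous word is obtained by summing the counts of all bigrams that start with it, which matches its unigram count for every word that is followed by another token.

```python
from collections import Counter, defaultdict


def conditional_probabilities(bigram_counts):
    """Return probs[prev][word] = count(prev, word) / count(prev)."""
    # count(prev): total occurrences of prev as the first element of a bigram
    context_counts = Counter()
    for (prev, _word), n in bigram_counts.items():
        context_counts[prev] += n

    probs = defaultdict(dict)
    for (prev, word), n in bigram_counts.items():
        probs[prev][word] = n / context_counts[prev]
    return dict(probs)


# Toy counts for illustration only (not taken from the book).
toy_counts = Counter({
    ("<s>", "He"): 2,
    ("He", "smiled"): 1,
    ("He", "left"): 1,
    ("smiled", "</s>"): 1,
    ("left", "</s>"): 1,
})
probs = conditional_probabilities(toy_counts)
```

For each previous word, the probabilities of its possible successors sum to 1, which is a quick sanity check on the counts.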

 

4. Using 'He' as the first token, generate the next 5 words as follows: 

      • For the word $w_{i-1}$, get the probability of every other word $w_i$ given $w_{i-1}$, and make a list of the 10 words with the highest probability. 

      • Use the method random.choice on the generated list to pick one of these high-probability words at random. 

      • Continue the process until you have generated the next 5 words or encounter a '</s>' token. 
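The generation loop in step 4 might look something like the sketch below. It assumes the conditional probabilities are held in a nested dictionary `probs[prev][word]`, and that the end marker is the string `</s>`. The function name, its parameters, and the toy probabilities are all hypothetical. Whether the end token itself should be kept in the output is not specified, so this sketch simply stops without appending it.

```python
import random


def generate(probs, first_word="He", n_words=5, top_k=10,
             end_token="</s>", rng=random):
    """Generate up to n_words after first_word by sampling from the top_k successors."""
    sequence = [first_word]
    current = first_word
    for _ in range(n_words):
        followers = probs.get(current)
        if not followers:
            break  # current word was never seen as a context
        # The top_k most probable next words, highest probability first.
        candidates = sorted(followers, key=followers.get, reverse=True)[:top_k]
        current = rng.choice(candidates)  # uniform pick among the candidates
        if current == end_token:
            break  # assumption: stop here and leave the end token out
        sequence.append(current)
    return sequence


# Toy probabilities for illustration only (not taken from the book).
toy_probs = {
    "He": {"smiled": 0.5, "left": 0.5},
    "smiled": {"</s>": 1.0},
    "left": {"quietly": 1.0},
    "quietly": {"</s>": 1.0},
}
generated = generate(toy_probs, rng=random.Random(0))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. Note that `random.choice` picks uniformly from the list, so the probabilities only decide which words make the top 10, not how likely each one is to be drawn.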

 

5. With the perplexity metric as defined below, evaluate the performance of the model for the generated sequence obtained from the previous step. 

$$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})}}$$

To avoid underflow, use log space to calculate the perplexity metric. 

$$\log PP(W) = -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_{i-1})$$
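The log-space calculation could be sketched as follows. This assumes the same nested dictionary `probs[prev][word]` for the conditional probabilities, and it takes N to be the number of bigram transitions in the sequence, which is one reading of the formula. The function names and the toy numbers are hypothetical. Summing logarithms avoids multiplying many small probabilities together, and exponentiating at the end recovers the perplexity.

```python
import math


def log_perplexity(sequence, probs):
    """Return -(1/N) * sum(log P(w_i | w_{i-1})) over the sequence's bigrams."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        raise ValueError("need at least two tokens to form a bigram")
    total = 0.0
    for prev, word in pairs:
        p = probs.get(prev, {}).get(word, 0.0)
        if p == 0.0:
            return math.inf  # an unseen bigram gives infinite perplexity
        total += math.log(p)
    return -total / len(pairs)


def perplexity(sequence, probs):
    """Exponentiate the log perplexity to get PP(W)."""
    return math.exp(log_perplexity(sequence, probs))


# Toy probabilities for illustration only (not taken from the book).
toy_probs = {"He": {"smiled": 0.5}, "smiled": {"again": 0.25}}
pp = perplexity(["He", "smiled", "again"], toy_probs)
```

In the toy case the two transitions have probabilities 0.5 and 0.25, so the perplexity is the square root of 1/(0.5 × 0.25) = 8. A sequence generated from the model itself never hits the unseen-bigram branch, since every sampled word has a non-zero probability.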
