Question:
This section needs to be completed using Python 3.6+. You will also require the following packages:
• pandas
• numpy
• NLTK or SpaCy
• scikit-learn
• random
Q1. Text Generation using N-grams: Code [15]
1. Download the .txt file of the book "The Great Gatsby" from the Gutenberg Project. Write a program to read the book and preprocess it by first applying a sentence tokenizer and then a word tokenizer. Make sure to remove all punctuation and stopwords.
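A minimal preprocessing sketch with NLTK (the filename gatsby.txt and the helper name preprocess are assumptions; use whatever path you saved the Project Gutenberg file to):

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords

nltk.download("punkt")      # tokenizer models
nltk.download("stopwords")  # English stopword list

def preprocess(path):
    """Read the book, apply sentence then word tokenization,
    and drop punctuation and stopwords."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    stop_words = set(stopwords.words("english"))
    sentences = []
    for sent in sent_tokenize(text):
        # Keep alphabetic tokens only (drops punctuation and numbers).
        tokens = [w for w in word_tokenize(sent)
                  if w.isalpha() and w.lower() not in stop_words]
        if tokens:
            sentences.append(tokens)
    return sentences

sentences = preprocess("gatsby.txt")  # assumed local filename
```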
2. Add a '<s>' token at the start and a '</s>' token at the end of each sentence, then generate a dictionary containing the bigrams and their frequencies. The sentence 'I love nlp' will become '<s> I love nlp </s>', so the bigrams for this sentence will be '<s> I', 'I love', 'love nlp', 'nlp </s>'.
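One way to build the padded bigram frequencies, reusing `sentences` from the sketch above (the helper name bigram_counts is an assumption; unigram counts are kept as well because step 3 needs them):

```python
from collections import Counter

def bigram_counts(sentences):
    """Pad each sentence with <s>/</s>, then count bigrams and unigrams."""
    bigrams, unigrams = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))  # adjacent word pairs
    return bigrams, unigrams

bigrams, unigrams = bigram_counts(sentences)
```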
3. With the following formula, calculate the conditional probability of each word given the previous word:
$$P(w_i \mid w_{i-1}) = \frac{\text{count}(w_{i-1}, w_i)}{\text{count}(w_{i-1})}$$
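The formula maps directly onto the two counters from step 2; a sketch (cond_prob is a hypothetical helper name):

```python
def cond_prob(word, prev, bigrams, unigrams):
    """P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0  # unseen context word
    return bigrams[(prev, word)] / unigrams[prev]
```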
4. Using 'He' as the first token, generate the next 5 words as follows (a sketch is given after this list):
• For word $w_{i-1}$, get the probability of all other words $w_i$ given the word $w_{i-1}$, and make a list of the first 10 words with the highest probability.
• Use the method random.choice on the generated list to get a random word with high probability.
• Continue the process until you generate the next 5 words or encounter a '</s>' token.
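A sketch of the generation loop (generate and top_k are assumed names; top_k=10 matches the "first 10 words" rule). Sorting candidates by bigram count gives the same ranking as sorting by probability, since the denominator count(w_{i-1}) is identical for every candidate:

```python
import random

def generate(seed, bigrams, n_words=5, top_k=10):
    """Sample each next word uniformly from the top-k most probable successors."""
    sequence, current = [seed], seed
    for _ in range(n_words):
        # All words ever seen immediately after `current`, with their counts.
        candidates = [(w, c) for (p, w), c in bigrams.items() if p == current]
        if not candidates:
            break
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        current = random.choice([w for w, _ in candidates[:top_k]])
        if current == "</s>":  # stop at the end-of-sentence token
            break
        sequence.append(current)
    return sequence

generated = generate("He", bigrams)
print(" ".join(generated))
```

Note that 'He' only works as a seed if it survives preprocessing; NLTK's English stopword list contains 'he', so with a case-insensitive stopword filter you may need to exempt the seed word.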
5. With the perplexity metric as defined below, evaluate the performance of the model for the generated sequence obtained from the previous step.
$$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})}}$$
To avoid underflow, use log space to calculate the perplexity metric.
$$\log PP(W) = -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_{i-1})$$
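A log-space sketch over the generated sequence (perplexity is an assumed name; every bigram in a generated sequence was seen in training, so each probability is nonzero):

```python
import math

def perplexity(sequence, bigrams, unigrams):
    """PP(W) = exp(-(1/N) * sum_i log P(w_i | w_{i-1}))."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return float("inf")  # no bigrams to score
    log_sum = sum(math.log(bigrams[(prev, w)] / unigrams[prev])
                  for prev, w in pairs)
    return math.exp(-log_sum / len(pairs))

print(perplexity(generated, bigrams, unigrams))
```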