Question: In this question, you will learn to implement the Abstractive Summarization task using the encoder - decoder model. 1 . Load the dataset cnn -
In this question, you will learn to implement the Abstractive Summarization task using the encoderdecoder model.
Load the dataset cnndailymail Use version Use train set for training and test set for testing purposes. Use columns article which is the news article published in CNN and Daily Mail, and highlights which is the summary of the given news article.
Preprocess the article column and the highlights column with NLTK The preprocessing steps include converting the text to lower case, removing special characters and punctuations. Add and tokens at the start and the end of each row of the highlights text.
Tokenize the text in both columns and build a separate vocabulary for each column. Create the wordtoindex and indextoword dictionaries for article column and convert the text in article column to the index vector.
You should use an encoderdecoder architecture to generate the text. The encoder should be a RNN eg LSTM GRU, etc. and the decoder should be a fully connected dense layer. Note that the initial hidden states for the decoder will be the output hidden states of the encoder model. Also, the input sequence to the decoder will have the same length that of the target sequence starting with In the inference stage, the process will run till token is encountered.
Train the model with crossentropy loss. On the test set, generate the summary using beam search. Your model should be able to generate text of at least words.
Use the rouge module in PyTorch calculate and report the average ROUGE ROUGE and ROUGEL scores for the test set.
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
