Q4. Extractive Summarization: Code [15]
In this question, you will implement the extractive summarization using the PageRank algorithm.
1. Load the BBC News Summary dataset. Use only the business category for the rest of the task. In the dataset, each news article and its summary are each provided in a .txt file. Load the files and create a DataFrame with columns article and summary. Each summary is five sentences long.
2. Perform sentence tokenization on each row of the article column. Preprocess the text as in step 2 of question 1. You do not need to add start and end tokens for this question.
3. Download and load the GloVe embeddings. Use the Wikipedia 2014 + Gigaword 5 GloVe vectors.
4. Find the average embedding vector of each sentence in each row by word tokenizing the sentences and taking the mean of all word embeddings extracted from GloVe.
5. Construct a similarity matrix for each row by computing the pairwise cosine similarity of its sentences. Using the networkx library, create a graph from the similarity matrix and find the rank of each sentence.
6. Based on the rank, extract the top 5 sentences with the highest rank as the summary.
7. Calculate the average ROUGE-1, ROUGE-2 and ROUGE-L scores for the test set.
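
Steps 4 to 6 can be sketched on a single article as follows. This is a minimal illustration under stated assumptions, not a reference solution: the tiny `TOY_EMBEDDINGS` dictionary stands in for the real GloVe vectors, the regex word tokenizer is a placeholder for the preprocessing from question 1, and the helper names (`sentence_vector`, `cosine_similarity_matrix`, `summarize`) are hypothetical. Negative similarities are clipped to zero before building the graph, which is a design choice of this sketch rather than something the task prescribes.

```python
import re

import networkx as nx
import numpy as np

# Toy 3-dimensional vectors standing in for GloVe (assumption: real code
# would load the downloaded GloVe file into a dict of word -> np.ndarray).
TOY_EMBEDDINGS = {
    "profits": np.array([1.0, 0.0, 0.0]),
    "revenue": np.array([0.9, 0.1, 0.0]),
    "rose": np.array([0.8, 0.2, 0.0]),
    "weather": np.array([0.0, 0.0, 1.0]),
    "rainy": np.array([0.0, 0.1, 1.0]),
}


def sentence_vector(sentence, embeddings, dim):
    """Mean of the known word vectors; a zero vector if no word is known."""
    words = re.findall(r"[a-z]+", sentence.lower())  # placeholder tokenizer
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity of the rows, with a zeroed diagonal."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = vectors / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # drop self-similarity (no self-loops)
    return sim


def summarize(sentences, embeddings, dim, top_k=5):
    """Rank sentences with PageRank and return the top_k in document order."""
    vectors = np.vstack([sentence_vector(s, embeddings, dim) for s in sentences])
    sim = np.clip(cosine_similarity_matrix(vectors), 0.0, None)
    graph = nx.from_numpy_array(sim)  # entries become edge weights
    scores = nx.pagerank(graph)  # dict: sentence index -> score
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "Profits rose sharply.",
    "Profits and revenue rose.",
    "Revenue rose again.",
    "The weather was rainy.",
]
summary = summarize(sentences, TOY_EMBEDDINGS, dim=3, top_k=3)
print(summary)
```

For the actual task, `summarize` would be applied to the tokenized sentences of every row with `top_k=5`, and the joined result compared against the summary column when computing ROUGE. Returning the selected sentences in their original order is optional; the task only asks for the five highest-ranked sentences.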