Question: Q 4 . Extractive Summarization: Code [ 1 5 ] In this question, you will implement the extractive summarization using the PageRank algorithm. 1 .
Q Extractive Summarization: Code
In this question, you will implement the extractive summarization using the PageRank algorithm.
Load the dataset BBC News Summary Use the category business for the rest of the task. In the dataset, each news article and its summary is provided in a txt file. Load the files and create a DataFrame with columns article and summary. Each summary is five sentences long.
Perform sentence tokenization on each row of the text column. Preprocess the text similar to step of question You do not need to add start and end tokens for this question.
Download and load the GloVe embeddings Use Wikipedia Gigaword GloVe vectors.
Find the average embedding vector of each sentence in each row by word tokenizing the sentences and taking the mean of all word embeddings extracted from GloVe.
Construct a similarity matrix by finding pairwise cosine similarity of the sentences of each row. With the use of networkx library, create a graph using the similarity matrix and find the rank of each sentence
Based on the rank, extract the top sentences with the highest rank as the summary.
Calculate the average ROUGE ROUGE and ROUGEL scores for the test set.
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
