Question: Q4. Extractive Summarization: Code [15] In this question, you will implement extractive summarization using the PageRank algorithm.

1. Load the BBC News Summary dataset\({}^{6}\). Use the category business for the rest of the task. In the dataset, each news article and its summary are provided in a .txt file. Load the files and create a DataFrame with columns article and summary. Each summary is five sentences long.

2. Perform sentence tokenization on each row of the text column. Preprocess the text similar to step 2 of question 1. You do not need to add start and end tokens for this question.

3. Download and load the GloVe embeddings\({}^{7}\). Use the Wikipedia 2014 + Gigaword 5 GloVe vectors.

4. Find the average embedding vector of each sentence in each row by word-tokenizing the sentences and taking the mean of all word embeddings extracted from GloVe.

5. Construct a similarity matrix by computing the pairwise cosine similarity of the sentences in each row. Using the networkx\({}^{8}\) library, create a graph from the similarity matrix and find the rank of each sentence\({}^{9}\).

6. Based on the rank, extract the top 5 sentences with the highest rank as the summary.

7. Calculate the average ROUGE-1, ROUGE-2 and ROUGE-L scores for the test set.
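
Steps 1 and 3 could be sketched roughly as follows. This is a minimal sketch, not a reference solution: the helper names `load_pairs` and `load_glove` are hypothetical, the sketch assumes that each article and its summary share a file name in two parallel folders (the folder paths are left to the caller), and it assumes the GloVe file is the usual plain-text format with one word followed by its vector components per line.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def load_pairs(articles_dir, summaries_dir):
    """Build a DataFrame with columns `article` and `summary`.

    Assumption: an article and its summary share the same .txt file name.
    """
    rows = []
    for article_path in sorted(Path(articles_dir).glob("*.txt")):
        summary_path = Path(summaries_dir) / article_path.name
        if not summary_path.exists():
            continue  # skip articles without a matching summary file
        rows.append({
            # errors="replace" is an assumption to tolerate stray non-UTF-8 bytes
            "article": article_path.read_text(encoding="utf-8", errors="replace"),
            "summary": summary_path.read_text(encoding="utf-8", errors="replace"),
        })
    return pd.DataFrame(rows, columns=["article", "summary"])


def load_glove(path):
    """Read a GloVe text file into a dict mapping word -> float32 vector."""
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings
```

Called with the business article and summary folders of the dataset and the path of the downloaded vector file, these would give the DataFrame and the word-to-vector lookup used in the later steps.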

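Steps 4 to 6 could be sketched for a single article as below. The function names are hypothetical, and a few choices are assumptions not stated in the task: sentences with no known words get a zero vector, self-similarity is set to 0 so the graph has no self-loops, and negative similarities are clipped to 0 before running PageRank. Here `embeddings` stands for any word-to-vector mapping, such as one loaded from the GloVe file.

```python
import networkx as nx
import numpy as np


def sentence_vector(tokens, embeddings, dim):
    """Mean of the word vectors found in `embeddings` (zeros if none are found)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # assumption: no self-loops in the graph
    return sim


def top_sentences(tokenized_sentences, embeddings, dim, top_k=5):
    """Return the indices of the `top_k` sentences with the highest PageRank."""
    vectors = np.vstack(
        [sentence_vector(tokens, embeddings, dim) for tokens in tokenized_sentences]
    )
    sim = np.clip(cosine_similarity_matrix(vectors), 0.0, None)  # assumption
    graph = nx.from_numpy_array(sim)  # edge weights are the similarities
    scores = nx.pagerank(graph, weight="weight")
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

The returned indices are in rank order; joining the corresponding sentences gives the extracted summary. Whether to restore the original document order before joining is a design choice the task leaves open.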

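For step 7, the scores could be averaged as in the sketch below. It is a simplified, dependency-free illustration rather than an official ROUGE implementation: tokens are lowercased whitespace splits with no stemming, ROUGE-L uses one longest common subsequence over the whole summary, and the F1 variant is reported, which is an assumption because the task does not say whether precision, recall, or F1 is expected. All function names are hypothetical; a dedicated package such as rouge-score would be the more usual choice in practice.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, n_candidate, n_reference):
    """F1 from an overlap count and the candidate/reference sizes."""
    if overlap == 0 or n_candidate == 0 or n_reference == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 based on clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 over the whole text (simplification)."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


def average_rouge(pairs):
    """Average scores over (generated_summary, reference_summary) text pairs."""
    totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
    count = 0
    for generated, reference in pairs:
        cand, ref = generated.lower().split(), reference.lower().split()
        totals["rouge1"] += rouge_n(cand, ref, 1)
        totals["rouge2"] += rouge_n(cand, ref, 2)
        totals["rougeL"] += rouge_l(cand, ref)
        count += 1
    return {k: v / count for k, v in totals.items()} if count else totals
```

Passing an iterable of (extracted summary, reference summary) pairs for the test rows returns one averaged value per metric.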