Question: For generic text summarization, given the 5 documents below in the collection, listed in chronological order of mention, answer the following questions sequentially.
(a) Preprocess the training documents to remove the set of tokens below. Note: Treat lowercase and uppercase as the same, but retain all other words.
Stop words & punctuation: the, and, are, of, in
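A minimal sketch of the preprocessing in part (a), assuming Python. The function name `preprocess` and the sample sentence are hypothetical, since the question's documents are not reproduced here. The regular expression treats every non-alphanumeric character as punctuation, which is an assumption, not something the question states.

```python
import re

# Stop words listed in the question; matching is case-insensitive.
STOP_WORDS = {"the", "and", "are", "of", "in"}

def preprocess(sentence):
    """Lowercase the text, strip punctuation, and drop the listed stop words."""
    # Assumption: any run of letters or digits is one token.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Hypothetical sentence, not one of the question's documents.
sample = "The thieves are in the cave, and Ali Baba waits."
print(preprocess(sample))
```

All other words are kept, including repeats, so the token counts used later still include duplicates.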
(b) Compute the saliency score of every sentence as the sum of the weights of all the words in the preprocessed sentences above. Use only the new simplified formula below to compute word weights. A word is considered salient only if its calculated weight is greater than the threshold value.
Weight(word Wi) = (N / nk) * wk
Where,
N = Number of documents given in the collection
nk = Total number of words (including the duplicates) in the preprocessed document collection
wk = Frequency of word i in the preprocessed document collection
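The scoring in part (b) can be sketched as follows, under two loudly stated assumptions: the weight formula is read as Weight(Wi) = (N / nk) * wk, which is one reading of the symbols defined above, and only words whose weight exceeds the threshold contribute to a sentence's score. The function names, the sample documents, and the threshold of 0.4 are all hypothetical.

```python
from collections import Counter

def word_weights(docs_tokens):
    """docs_tokens: one list of preprocessed tokens per document."""
    n_docs = len(docs_tokens)                           # N
    total_words = sum(len(d) for d in docs_tokens)      # nk, duplicates included
    freq = Counter(t for d in docs_tokens for t in d)   # wk for each word
    # Assumed reading of the formula: (N / nk) * wk
    return {word: n_docs / total_words * count for word, count in freq.items()}

def saliency(tokens, weights, threshold):
    """Sum the weights of the words whose weight exceeds the threshold."""
    return sum(weights[t] for t in tokens if weights[t] > threshold)

# Hypothetical, already preprocessed documents (not the question's data).
docs = [
    ["ali", "baba", "cave"],
    ["forty", "thieves", "cave"],
    ["ali", "baba", "treasure"],
]
weights = word_weights(docs)
scores = [saliency(d, weights, threshold=0.4) for d in docs]
print(weights)
print(scores)
```

If the intended formula or threshold differs, only the dictionary comprehension in `word_weights` and the `threshold` argument need to change.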
(c) Extract the summary in reverse chronological order, using the top most informative documents from the result of part (b).
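One way to sketch the extraction in part (c): pick the highest-scoring documents, then list them newest first. The question does not state how many documents to keep, so `k` is a hypothetical parameter, as are the function name, the document labels, and the scores.

```python
def extract_summary(docs, scores, k):
    """Pick the k highest-scoring documents, then order them newest first.

    docs and scores are parallel lists in chronological order.
    """
    # Indices of the k best scores (ties keep the earlier document first).
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    # Reverse chronological order means the largest index comes first.
    return [docs[i] for i in sorted(top, reverse=True)]

# Hypothetical documents and saliency scores.
docs = ["doc1", "doc2", "doc3", "doc4"]
scores = [2.0, 0.5, 1.5, 1.0]
print(extract_summary(docs, scores, k=2))
```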
(d) Use ROUGE to evaluate the system summary obtained in part (b) with respect to the reference summary below.
Ali baba and the forty thieves is the treasure of fairy tales
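A minimal sketch of ROUGE-N for part (d), using clipped n-gram overlap between the system and reference summaries. The reference string is the one given above; the system summary is hypothetical, because the actual output depends on the question's documents. Tokenizing by lowercasing and splitting on whitespace is an assumption, as the question does not say whether the summaries are preprocessed before scoring.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(system, reference, n=1):
    """Return (precision, recall, f1) from clipped n-gram overlap."""
    sys_counts = ngrams(system.lower().split(), n)
    ref_counts = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((sys_counts & ref_counts).values())
    precision = overlap / max(sum(sys_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

reference = "Ali baba and the forty thieves is the treasure of fairy tales"
# Hypothetical system summary, for illustration only.
system = "Ali baba and the forty thieves found the treasure"

precision, recall, f1 = rouge_n(system, reference, n=1)
print(precision, recall, f1)
```

ROUGE is usually reported as recall, so the second value is the one most often quoted; passing `n=2` gives ROUGE-2 with the same function.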
