Question: For generic text summarization, consider the 5 documents in the collection below, listed in chronological order of mention, and answer the following questions in sequence.
a) Preprocess the training documents to remove the set of tokens below. Note: treat lowercase and uppercase as the same, but retain all other words.
Stop words and punctuation: {the, and, are, of, in, '.', ','}
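A minimal Python sketch of step a), assuming a simple regex tokenizer (the question does not specify one). The names `preprocess` and `STOP_TOKENS` are illustrative, not part of the question. Text is lowercased so that case is ignored, and only the listed tokens are dropped; duplicates and any other punctuation are kept.

```python
import re

# Stop words and punctuation listed in part a).
STOP_TOKENS = {"the", "and", "are", "of", "in", ".", ","}


def preprocess(text):
    """Lowercase the text, split it into tokens, and drop STOP_TOKENS.

    Words are runs of word characters; every other non-space character becomes
    its own token, so only '.' and ',' are removed as punctuation.
    Duplicate words are kept.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [t for t in tokens if t not in STOP_TOKENS]


print(preprocess("Ali Baba and the forty thieves are in the cave."))
# -> ['ali', 'baba', 'forty', 'thieves', 'cave']
```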
b) Compute the saliency score of every sentence as the sum of the weights of all words in the preprocessed sentence. Compute word weights using only the new simplified formula below. A word is considered salient only if its weight is greater than 2.
$$\text{Weight}(W_i) = \frac{N}{n_k \cdot w_k}$$
where:
$N$ = number of documents in the collection
$n_k$ = total number of words (including duplicates) in the preprocessed document collection
$w_k$ = frequency of word $W_i$ in the preprocessed document collection
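A sketch of step b) in Python, assuming each document is a single sentence and that a repeated word adds its weight each time it occurs. The three documents are hypothetical placeholders, since the original 5 documents are not reproduced on this page; `word_weights` and `saliency` are illustrative names. The `threshold` parameter applies the "weight greater than 2" salience rule.

```python
import re
from collections import Counter

# Stop words and punctuation from part a); matching is case-insensitive.
STOP_TOKENS = {"the", "and", "are", "of", "in", ".", ","}


def preprocess(text):
    """Lowercase, tokenize, and drop STOP_TOKENS (duplicates are kept)."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [t for t in tokens if t not in STOP_TOKENS]


def word_weights(docs):
    """Weight(W_i) = N / (n_k * w_k) for every word in the preprocessed collection."""
    processed = [preprocess(d) for d in docs]
    n = len(docs)                                        # N: number of documents
    freq = Counter(t for doc in processed for t in doc)  # w_k for each word
    n_k = sum(freq.values())                             # total words, duplicates included
    return {word: n / (n_k * w_k) for word, w_k in freq.items()}


def saliency(doc, weights, threshold=2.0):
    """Sum the weights of a document's words, counting repeated words each time.

    Only words whose weight is strictly greater than `threshold` contribute;
    pass threshold=None to sum the weights of all words.
    """
    return sum(
        weights[t]
        for t in preprocess(doc)
        if threshold is None or weights[t] > threshold
    )


# Hypothetical placeholder documents (not the question's 5 documents).
docs = [
    "Ali Baba found the cave.",
    "The forty thieves hid the treasure.",
    "Ali Baba took the treasure.",
]
weights = word_weights(docs)
for d in docs:
    print(saliency(d, weights), saliency(d, weights, threshold=None))
```

With these placeholders, $N = 3$ and $n_k = 12$, so every weight is at most 0.25 and every score under the default threshold is 0. More generally, because $w_k \ge 1$, a weight can exceed 2 only when $n_k \cdot w_k < N/2$, which is why the sketch also offers `threshold=None` to sum all weights.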
c) Extract a summary consisting of the top 2 most informative documents according to the results of part b), presented in reverse chronological order.
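Step c) amounts to a top-k selection followed by a re-sort on position. A Python sketch, assuming documents and scores arrive as parallel lists in chronological order of mention; the function name `extract_summary`, the documents, and the scores are all hypothetical, for illustration only.

```python
def extract_summary(docs, scores, k=2):
    """Return the k highest-scoring documents, latest document first.

    `docs` and `scores` are parallel lists in chronological order of mention
    (index 0 = earliest). Ties keep the earlier document, because Python's
    sort is stable even with reverse=True.
    """
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    # Reverse chronological order: higher index (later mention) comes first.
    return [docs[i] for i in sorted(top, reverse=True)]


# Hypothetical documents and saliency scores, for illustration only.
docs = [
    "Ali Baba found the cave.",
    "The forty thieves hid the treasure.",
    "Ali Baba took the treasure.",
]
scores = [0.75, 0.875, 0.625]
print(" ".join(extract_summary(docs, scores)))
# -> The forty thieves hid the treasure. Ali Baba found the cave.
```

Selecting by score first and only then ordering by position keeps the two concerns separate: ranking decides *which* documents enter the summary, while chronology decides only their order.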
d) Use ROUGE-1 to evaluate the system summary obtained above against the following reference summary:
Ali Baba and the forty thieves is the treasure of fairy tales
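ROUGE-1 counts overlapping unigrams between a system summary and a reference: recall is the overlap divided by the number of reference unigrams, and precision divides by the number of system unigrams. The Python sketch below assumes both summaries receive the same preprocessing as part a) and that overlap counts are clipped (a word counts at most as often as it appears in both). The system summary is a hypothetical placeholder, and `rouge_1` is an illustrative name.

```python
import re
from collections import Counter

# Assumption: the part a) stop tokens are also removed before scoring.
STOP_TOKENS = {"the", "and", "are", "of", "in", ".", ","}


def unigrams(text):
    """Lowercased unigram counts with STOP_TOKENS removed."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return Counter(t for t in tokens if t not in STOP_TOKENS)


def rouge_1(system, reference):
    sys_counts, ref_counts = unigrams(system), unigrams(reference)
    # Counter '&' keeps the minimum count of each shared word (clipped overlap).
    overlap = sum((sys_counts & ref_counts).values())
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(sys_counts.values()) if sys_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "Ali baba and the forty thieves is the treasure of fairy tales"
# Hypothetical system summary, for illustration only.
system = "The forty thieves hid the treasure. Ali Baba found the cave."
print(rouge_1(system, reference))
# -> {'recall': 0.625, 'precision': 0.625, 'f1': 0.625}
```

Here 5 of the 8 reference unigrams (ali, baba, forty, thieves, treasure) appear in the placeholder summary, giving a recall of 5/8.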
