Question:

This exercise is based on the course assignment. Consider the following document collection D = {D1, D2, D3} (given as one document per line):

D1: Silly Sally Sleepy Sally
D2: Seven Silly Sheep
D3: Silly Sheep Should Sleep Silly

Assume that the stopword list contains the word Should, and that words are stemmed (that is, converted to their root).

- Show the dictionary and the postings lists, in an easy-to-read format, for an (uncompressed) inverted index structure that implements Vector Space Ranked Retrieval. Include all the relevant statistics computed, with the raw tf-idf values shown explicitly as '(tf,idf)' for each document in the postings list. Assume that the raw term frequency factor is the count of the number of term occurrences in a document (rather than the normalized, log-dampened value) and that the inverse document frequency factor is the reciprocal of the fraction of documents that contain the term (rather than its logarithm).
- What are the relevance scores and the ranking of the documents for the query: Silly?
- Does the ranking change if we define the term frequency factor as the normalized fraction of the term occurrences in a document (rather than the raw count)?
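
The definitions in the question can be checked with a short Python sketch. It is only an illustration under stated assumptions, not the official solution: the stem table `STEMS` is hypothetical (it treats "sleepy" as sharing the root "sleep"; a real stemmer may differ), the query term is read as "Silly", normalized tf is taken as count divided by document length after stopword removal, and the score is a plain sum of tf * idf over the query terms with no cosine length normalization.

```python
from collections import Counter, defaultdict

DOCS = {
    "D1": "Silly Sally Sleepy Sally",
    "D2": "Seven Silly Sheep",
    "D3": "Silly Sheep Should Sleep Silly",
}
STOPWORDS = {"should"}
# Hypothetical stem table (assumption): only "sleepy" -> "sleep" changes.
STEMS = {"sleepy": "sleep"}


def tokenize(text):
    """Lowercase, drop stopwords, and map each word to its assumed stem."""
    words = [w.lower() for w in text.split()]
    return [STEMS.get(w, w) for w in words if w not in STOPWORDS]


def build_index(docs, normalize_tf=False):
    """Return (postings, idf): postings[term][doc_id] = tf, idf[term] = N / df."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for term, count in Counter(tokens).items():
            # Raw count, or the fraction of the document's tokens (assumption).
            postings[term][doc_id] = count / len(tokens) if normalize_tf else count
    n_docs = len(docs)
    # idf is the reciprocal of the fraction of documents containing the term.
    idf = {term: n_docs / len(plist) for term, plist in postings.items()}
    return postings, idf


def score(query, postings, idf):
    """Sum tf * idf over the query terms; sort by score, then by document id."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, tf in postings.get(term, {}).items():
            scores[doc_id] += tf * idf[term]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    postings, idf = build_index(DOCS)
    for term in sorted(postings):
        entries = ", ".join(f"{d}:({tf},{idf[term]})" for d, tf in postings[term].items())
        print(f"{term} (df={len(postings[term])}) -> {entries}")
    print(score("Silly", postings, idf))
    print(score("Silly", *build_index(DOCS, normalize_tf=True)))
```

Under these assumptions, the raw-count scores for the query put D3 first with D1 and D2 tied, while the normalized term frequency breaks that tie in favor of the shorter document D2.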
