Question: Please help with steps (g) Let us assume that a repository X has a vocabulary size of 200,000 words and comprises 2,000,000 documents, each of

(g) Let us assume that a repository X has a vocabulary size of 200,000 words and comprises 2,000,000 documents, each of which has a vocabulary of approx. 500 words. What type of index would you not recommend and why? [3 marks]

(h) The table below shows a binary term-document incidence matrix for a sample data collection:

[Binary term-document incidence matrix over the terms Gel, Animal, Spring, Car, Window and Flower and the documents D1 to D6; the 0/1 cell values are not recoverable from the extracted text.]

Answer the following questions, clearly showing your working:

(i) Determine, using the above matrix, the documents that are relevant to the query "Spring AND Flower". Write down the steps to arrive at your answer. [3 marks]

(ii) Determine, using the above matrix, the documents that are relevant to the query "Gel AND NOT Window". Write down the steps to arrive at your answer. [2 marks]

(iii) Briefly explain why Cosine similarity is not well suited to identifying the most relevant documents based on binary term-document incidence matrices. [2 marks]

(iv) Would the number of zero elements in the table increase or decrease if synonyms were accounted for? Justify your answer. [2 marks]
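The mechanics behind parts (g), (h)(i) and (h)(ii) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0/1 values in `INCIDENCE` are made up and are not the values from the question's table, and the helper names (`docs_for`, `and_query`, `and_not_query`, `matrix_density`) are hypothetical. The last function simply divides the number of non-zero cells by the total number of cells for the figures given in (g), assuming each document contributes about 500 distinct terms.

```python
# Hypothetical incidence matrix: these 0/1 values are invented for
# illustration and are NOT the values from the question's table.
DOCS = ["D1", "D2", "D3", "D4", "D5", "D6"]
INCIDENCE = {
    "Spring": [1, 0, 1, 1, 0, 1],
    "Flower": [1, 1, 0, 1, 0, 0],
    "Gel":    [0, 1, 1, 0, 1, 1],
    "Window": [0, 1, 0, 0, 1, 0],
}


def docs_for(bits):
    """Map a 0/1 vector back to the names of the matching documents."""
    return [doc for doc, bit in zip(DOCS, bits) if bit]


def and_query(term_a, term_b):
    """Documents containing both terms: bitwise AND of the two rows."""
    row_a, row_b = INCIDENCE[term_a], INCIDENCE[term_b]
    return docs_for([a & b for a, b in zip(row_a, row_b)])


def and_not_query(term_a, term_b):
    """Documents containing term_a but not term_b: AND with the complement."""
    row_a, row_b = INCIDENCE[term_a], INCIDENCE[term_b]
    return docs_for([a & (1 - b) for a, b in zip(row_a, row_b)])


def matrix_density(vocab_size, n_docs, terms_per_doc):
    """Fraction of non-zero cells in a full term-document matrix."""
    total_cells = vocab_size * n_docs
    nonzero_cells = n_docs * terms_per_doc
    return nonzero_cells / total_cells


print(and_query("Spring", "Flower"))
print(and_not_query("Gel", "Window"))
print(matrix_density(200_000, 2_000_000, 500))
```

With the figures from (g), the full matrix would have 200,000 × 2,000,000 = 4 × 10^11 cells, of which only about 10^9 would be non-zero. That sparsity is the kind of consideration part (g) appears to be probing.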
