Question: Please help with steps

(g) Let us assume that a repository X has a vocabulary size of 200,000 words and comprises 2,000,000 documents, each of which has a vocabulary of approx. 500 words. What type of index would you not recommend and why? [3 marks]

(h) The table below shows a binary term-document incidence matrix for a sample data collection:

[Table: binary incidence matrix over the terms Gel, Animal, Spring, Car, Window, Flower and the documents D1 to D6; the cell values are not recoverable from the source.]

Answer the following questions, clearly showing your working:

(i) Determine, using the above matrix, the documents that are relevant to the query "Spring AND Flower". Write down the steps to arrive at your answer. [3 marks]

(ii) Determine, using the above matrix, the documents that are relevant to the query "Gel AND NOT Window". Write down the steps to arrive at your answer. [2 marks]

(iii) Briefly explain why cosine similarity is not well suited to identifying the most relevant documents based on binary term-document incidence matrices. [2 marks]

(iv) Would the number of zero elements in the table increase or decrease if synonyms were accounted for? Justify your answer. [2 marks]
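The working for parts (g), (h)(i) and (h)(ii) can be sketched in a few lines of Python. This is only an illustration, not a model answer: the figures for (g) come straight from the question, but the 0/1 rows used for (h) are hypothetical placeholders, because the matrix values are not legible in this copy of the question. The helper names (`bit_and`, `bit_not`, `matching_docs`) are likewise made up for this sketch. Substitute the real rows from the table to get the actual answers.

```python
# Part (g): how sparse would a full term-document incidence matrix be?
vocab_size = 200_000        # distinct terms in repository X (from the question)
num_docs = 2_000_000        # documents in repository X (from the question)
terms_per_doc = 500         # approx. distinct terms per document (from the question)

total_cells = vocab_size * num_docs        # every term x every document
nonzero_cells = num_docs * terms_per_doc   # at most this many 1s
density = nonzero_cells / total_cells      # fraction of cells that are 1

print(f"cells: {total_cells:,}  ones: {nonzero_cells:,}  density: {density:.2%}")

# Part (h): Boolean retrieval on incidence rows.
# HYPOTHETICAL values: these rows are NOT the ones from the question's table.
docs = ["D1", "D2", "D3", "D4", "D5", "D6"]
incidence = {
    "Spring": [1, 0, 1, 1, 0, 1],
    "Flower": [0, 1, 1, 0, 0, 1],
    "Gel":    [1, 1, 0, 1, 0, 0],
    "Window": [1, 0, 0, 0, 1, 0],
}

def bit_and(a, b):
    """Element-wise AND of two 0/1 rows."""
    return [x & y for x, y in zip(a, b)]

def bit_not(a):
    """Element-wise complement of a 0/1 row."""
    return [1 - x for x in a]

def matching_docs(bits):
    """Names of the documents whose bit is 1."""
    return [d for d, b in zip(docs, bits) if b]

# (i) Spring AND Flower: AND the two term rows.
spring_and_flower = matching_docs(bit_and(incidence["Spring"], incidence["Flower"]))

# (ii) Gel AND NOT Window: complement the Window row, then AND with Gel.
gel_not_window = matching_docs(bit_and(incidence["Gel"], bit_not(incidence["Window"])))

print(spring_and_flower, gel_not_window)
```

With the question's numbers, the full matrix would have 4 x 10^11 cells of which at most 10^9 (0.25%) are 1, which is the usual argument against storing a dense incidence matrix and in favour of an inverted index. The Boolean part follows the steps the question asks for: write out each term's row, complement any negated term, AND the rows position by position, and read off the documents where the result is 1.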
