4. (20 points) Consider the following sample document collection:

D1 = (2, 4, 1, 9, 2, 0)
D2 = (1, 1, 2, 1, 0, 4)
D3 = (7, 2, 5, 0, 1, 0)
D4 = (0, 1, 2, 6, 1, 2)
D5 = (3, 0, 1, 4, 2, 1)
D6 = (1, 6, 0, 2, 6, 2)
D7 = (2, 6, 3, 2, 8, 1)

1) Use the following similarity expression to calculate the similarities between documents, where n is the number of terms per document:

$$\mathrm{SIM}(\mathrm{DOC}_k, \mathrm{DOC}_h) = \sum_{i=1}^{n} \mathrm{TERM}_{ik} \times \mathrm{TERM}_{ih}$$

2) Set the threshold to 45 to group the documents into clusters.

3) Calculate the centroid for each group using the following expression, where m is the number of documents in the group:

$$\mathrm{CTERM}_i = \frac{1}{m} \sum_{k=1}^{m} \mathrm{TERM}_{ik}$$

4) Match the given query Q = (1, 0, 5, 7, 4, 4) to find the document that is most similar to the query.
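The four steps can be sketched in Python as follows. This is a minimal illustration, not the expert answer. The question does not say which clustering technique to use, so the sketch assumes single-link grouping: two documents land in the same cluster if a chain of pairs with similarity at or above the threshold connects them. The names `sim`, `pairwise_similarities`, `single_link_clusters`, and `centroid` are hypothetical helpers introduced here.

```python
from itertools import combinations

DOCS = {
    "D1": (2, 4, 1, 9, 2, 0),
    "D2": (1, 1, 2, 1, 0, 4),
    "D3": (7, 2, 5, 0, 1, 0),
    "D4": (0, 1, 2, 6, 1, 2),
    "D5": (3, 0, 1, 4, 2, 1),
    "D6": (1, 6, 0, 2, 6, 2),
    "D7": (2, 6, 3, 2, 8, 1),
}
QUERY = (1, 0, 5, 7, 4, 4)
THRESHOLD = 45


def sim(a, b):
    """Step 1: sum of term-by-term products (an unnormalized dot product)."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_similarities(docs):
    """Similarity for every unordered pair of documents."""
    return {(a, b): sim(docs[a], docs[b]) for a, b in combinations(docs, 2)}


def single_link_clusters(docs, threshold):
    """Step 2, ASSUMING single-link clustering with a >= comparison.

    The question names neither the technique nor the comparison; a
    clique-based or star-based method would give different groups.
    """
    parent = {name: name for name in docs}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pairwise_similarities(docs).items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


def centroid(docs, members):
    """Step 3: per-term average over the m documents in one group."""
    m = len(members)
    n_terms = len(docs[members[0]])
    return tuple(sum(docs[d][i] for d in members) / m for i in range(n_terms))


if __name__ == "__main__":
    for pair, score in pairwise_similarities(DOCS).items():
        print(pair, score)
    for members in single_link_clusters(DOCS, THRESHOLD):
        print(members, centroid(DOCS, members))
    # Step 4: rank the documents by similarity to the query.
    for name in sorted(DOCS, key=lambda d: sim(QUERY, DOCS[d]), reverse=True):
        print(name, sim(QUERY, DOCS[name]))
```

Under these assumptions, D2 has no pair at or above 45 and stays in a group of its own, while the remaining six documents are chained into one cluster. Scored directly against the query, D1 ranks highest with a similarity of 78. If the course expects the query to be matched against the centroids first, the same `sim` function can be applied to the output of `centroid`.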
