
• [35] This exercise and the next one are about web similarity.

Let X be a search term (web query) with d(X) ≥ 3.

(a) Show that the common similarity (common semantics) NWD(X) cannot in general be derived from the pairwise NWDs of pairs of members of X.

(b) Show that classification using NWD(X) in general gives different results from classification using only the NWD(Y)'s with Y ⊂ X and d(Y) = 2.

(c) Show that the computational complexity of computing NWD(X) is far less than that of computing the NWD(Y)'s for every Y ⊂ X with d(Y) = 2.

(d) Show that classification using NWD(X) in general has far less computational complexity than classification using only the NWD(Y)'s with every Y ⊂ X and d(Y) = 2.

Comments. Source (also for the next exercise): [A.R. Cohen and P.M.B. Vitányi, arXiv:1502.05957 [cs.IR]].

Hint for Item (a): Let f(x) = f(y) = f(z) = N^{1/4} be the cardinalities of the sets of web pages containing occurrences of the term x, the term y, and the term z, respectively. The quantity N is the total number of web pages, f(x, y) = f(x, z) = f(y, z) = N^{1/8}, and f(x, y, z) = N^{1/16}. Here f(x, y) is the number of pages containing both terms x and y, and so on. Computing the NGDs gives NGD(x, y) = NGD(x, z) = NGD(y, z) = 1/6. Using either the minimum NGD, the maximum NGD, or the average NGD will always give the value 1/6. Using the NWD as in Equation 8.24 we find NWD({x, y, z}) = 1/8.

Hint for Item (b): The class selected is the one where the absolute difference in common similarity, with and without inserting the new item, is minimal. If more than one class is selected, we choose a class with maximal common similarity. The frequencies of x, y, z and of the pairs (x, y), (x, z), (y, z) are as above. For the terms u, v and the pairs (u, v), (u, z), (v, z) the frequencies are f(u) = f(v) = N^{1/4} and f(u, v) = f(u, z) = f(v, z) = N^{1/9}. Suppose we classify the term z into one of the classes A = {x, y} and B = {u, v} using a computation with the NGDs. Then class B will be selected. Namely, inserting z into class A induces new NGDs all having exactly the value 1/6 (as above). Since NGD(u, v) = NGD(u, z) = NGD(v, z) = 5/36, inserting z into class B = {u, v} gives the NGDs of all resulting pairs (u, v), (u, z), (v, z) the value 5/36. The choice being between classes A and B, we see that in neither class is the common similarity according to the NGDs changed. Therefore we select the class where all NGDs are least (that is, the one with the most common similarity), which is B = {u, v}.

Next we select according to the NWD. Assume f(u, v, z) = N^{1/10}. Then NWD({u, v, z}) = 1/4, so NWD({u, v, z}) − NWD({u, v}) (= NGD(u, v)) = 1/4 − 5/36 = 4/36. Since NWD({x, y, z}) − NWD({x, y}) (= NGD(x, y)) = 1/8 − 1/6 = −1/24, and selection according to the NWD chooses the least absolute difference, we select class A = {x, y}.

Hint for Items (c) and (d): count the NWD computations involved.
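The arithmetic in the hints can be checked mechanically. Below is a minimal sketch in plain Python that takes the NGD and NWD values stated in the hints as given (it does not recompute them from the underlying frequency formulas): it confirms that no summary (min, max, average) of the pairwise NGDs recovers NWD({x, y, z}), applies the least-absolute-difference selection rule of Item (b), and, for Items (c) and (d), contrasts one NWD(X) computation with the C(d(X), 2) pairwise computations.

```python
from fractions import Fraction as F
from math import comb

# Values taken directly from the hints: the three pairwise NGDs within
# {x, y, z} are all 1/6, while the multiset value NWD({x, y, z}) is 1/8.
pairwise_ngd = [F(1, 6)] * 3
nwd_xyz = F(1, 8)

# Item (a): min, max, and average of the pairwise NGDs all equal 1/6,
# so none of these summaries can yield NWD({x, y, z}) = 1/8.
summaries = {min(pairwise_ngd), max(pairwise_ngd),
             sum(pairwise_ngd) / len(pairwise_ngd)}
print(summaries == {F(1, 6)}, nwd_xyz in summaries)  # True False

# Item (b): choose the class with the least absolute change in common
# similarity when z is inserted (ties broken by maximal similarity).
delta_A = abs(F(1, 8) - F(1, 6))   # |NWD({x,y,z}) - NWD({x,y})| = 1/24
delta_B = abs(F(1, 4) - F(5, 36))  # |NWD({u,v,z}) - NWD({u,v})| = 4/36
print("A" if delta_A < delta_B else "B")  # A: z joins class {x, y}

# Items (c) and (d): one NWD(X) computation versus comb(d(X), 2)
# pairwise NWD(Y) computations, one per two-element subset Y of X.
for n in (3, 10, 100):
    print(n, 1, comb(n, 2))
```

The pairwise count C(d(X), 2) grows quadratically (3, 45, 4950 for d(X) = 3, 10, 100), which is the counting argument the hint for Items (c) and (d) asks for.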
