Question: Problem 2 (10 points) (Exercise 6.3.4, MMDS book)

Suppose we perform the PCY Algorithm to find frequent pairs, with market-basket data meeting the following specifications:

1. The support threshold is 10,000.
2. There are one million items, represented by the integers 0, 1, ..., 999999.
3. There are 250,000 frequent items, that is, items that occur 10,000 times or more.
4. There are one million pairs that occur 10,000 times or more.
5. There are P pairs that occur exactly once and consist of two frequent items.
6. No other pairs occur at all.
7. Integers are always represented by 4 bytes.
8. When we hash pairs, they distribute among buckets randomly, but as evenly as possible; i.e., you may assume that each bucket gets exactly its fair share of the P pairs that occur once.

Suppose there are S bytes of main memory. In order to run the PCY Algorithm successfully, the number of buckets must be sufficiently large that most buckets are not frequent. In addition, on the second pass, there must be enough room to count all the candidate pairs. As a function of S, what is the largest value of P for which we can successfully run the PCY Algorithm on this data?
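The two memory constraints in the question can be turned into a rough numeric bound. The Python sketch below is one possible reading, not an official solution, and it rests on assumptions the question does not state: on the first pass essentially all S bytes hold 4-byte bucket counts (the item counts are ignored); on the second pass each candidate pair is stored as a 12-byte triple (i, j, count); the bitmap and the frequent-item table are ignored; and each of the one million frequent pairs lands in its own bucket. The function name `max_singleton_pairs` and the constants are illustrative.

```python
SUPPORT = 10_000            # support threshold (spec 1)
FREQUENT_PAIRS = 1_000_000  # pairs occurring 10,000 times or more (spec 4)
INT_BYTES = 4               # bytes per integer (spec 7)
TRIPLE_BYTES = 3 * INT_BYTES  # assumption: candidates stored as (i, j, count)


def max_singleton_pairs(S):
    """Approximate largest P for S bytes of memory, under the stated assumptions."""
    # Pass 1 (assumption): all of memory is used for 4-byte bucket counts.
    buckets = S / INT_BYTES
    # Pass 2 (assumption): all of memory is used for 12-byte candidate triples.
    capacity = S / TRIPLE_BYTES
    if capacity <= FREQUENT_PAIRS:
        # Not even the frequent pairs themselves fit on the second pass.
        return 0.0
    # A bucket is frequent if it holds a frequent pair (at most one million such buckets).
    frequent_fraction = FREQUENT_PAIRS / buckets
    # Candidates = frequent pairs + singleton pairs hashed to frequent buckets:
    #   FREQUENT_PAIRS + P * frequent_fraction <= capacity
    p_pass2 = (capacity - FREQUENT_PAIRS) / frequent_fraction
    # Singleton pairs alone must not make a bucket frequent: P / buckets < SUPPORT.
    p_pass1 = SUPPORT * buckets
    return min(p_pass2, p_pass1)


if __name__ == "__main__":
    S = 1_200_000_000
    print(max_singleton_pairs(S))   # bound from the sketch
    print(S * S / 48_000_000)       # large-S approximation S^2 / 48,000,000
```

Under these assumptions the second-pass constraint reads 12(10^6 + 4 x 10^6 P / S) <= S, so for large S the bound is roughly P = S^2 / 48,000,000. Counting the bitmap or the item tables would lower it slightly.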
