Question: 5. Distinct Elements Analysis (9 points) YouTube wants to count the number of distinct views for a video, but doesn't want to store all the

5. Distinct Elements Analysis (9 points)
YouTube wants to count the number of distinct views for a video, but doesn't want to store all the user ID's.
3
We modelled the problem as follows: we see a stream of 8-byte integers (user ID's), 21, 22, ..., xn, where x; is the user ID of the i-th view to a video, but there are only n distinct elements (1
(a) [5 Points] Let U1,..., Um be m iid samples from the continuous Unif(0, 1) distribution, and let X = min{U1,..., Um}. We know from lecture (and the textbook) that E[X] m+1 Compute Var[X].
(b) [2 Points] If we were to solve the Distinct Elements problem naively using a Set, what would the big-Oh space complexity be in terms of N and/or n)? If a video had N = 2 billion views, with only n = 900 million of them being distinct views, how much storage would we need for this one video to keep track of the distinct users? Your answer should be in bytes.
(c) [2 Points] Determine how much space is saved from using the MultDistElts class if YouTube wants to use K = 10,000 hash functions compared to using the Set in part (b)? Assume for simplicity MultDistElts only stores K = 10, 000 floats val; (each float is 8 bytes, and corresponds to a certain DistElts class). What is the multiplicative savings factor (e.g., 10x, 2x, etc)?

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!