Question:

2. Approximating the Median in a Data Stream (8 points)
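Given a set $S \subset [n]$ of $m$ distinct values and a value $x$, we define
$$\operatorname{rank}_S(x) := |\{y \in S : y \le x\}|,$$
i.e., the number of values in $S$ that are less than or equal to $x$. We say $x$ is an $\epsilon$-approximate median if
$$(1/2 - \epsilon)\,m \le \operatorname{rank}_S(x) \le (1/2 + \epsilon)\,m.$$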
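The two definitions above, $\operatorname{rank}_S(x)$ as the count of values in $S$ that are at most $x$, and the test $(1/2 - \epsilon)m \le \operatorname{rank}_S(x) \le (1/2 + \epsilon)m$, can be written directly in Python. This is a minimal sketch that holds $S$ in memory as a list. The helper names `rank` and `is_eps_approx_median` are hypothetical.

```python
def rank(S, x):
    """rank_S(x): the number of values in S that are <= x."""
    return sum(1 for y in S if y <= x)


def is_eps_approx_median(S, x, eps):
    """True if (1/2 - eps) * m <= rank_S(x) <= (1/2 + eps) * m, where m = |S|."""
    m = len(S)
    r = rank(S, x)
    return (0.5 - eps) * m <= r <= (0.5 + eps) * m


if __name__ == "__main__":
    S = list(range(1, 101))  # m = 100 distinct values, so rank_S(x) = x here
    print(rank(S, 50))
    print(is_eps_approx_median(S, 55, 0.1))  # rank 55 lies in [40, 60]
    print(is_eps_approx_median(S, 70, 0.1))  # rank 70 exceeds 60
```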
1. (2 points) Consider the following algorithm for sampling an element from a stream $x_1, x_2, \ldots, x_m$,
where you may assume throughout this question that all values in the stream are distinct:
(a) Initialize $s \leftarrow x_1$.
(b) For $i = 1, 2, \ldots, m$: with probability $1/i$, update $s \leftarrow x_i$.
(c) Return $s$.
Prove that at the end of the stream, s is equally likely to be any of the elements in the
stream, i.e., s is chosen uniformly from the set of elements in the stream. Note that this
method doesn't need to know the value of $m$ in advance.
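A minimal Python sketch of steps (a)-(c) follows. It assumes the stream is any Python iterable, and the function name `sample_one` is hypothetical. Uniformity follows from a telescoping product. $x_i$ is the final value of $s$ exactly when it is picked at step $i$ and never replaced afterwards, and the coin flips are independent, so $\Pr[s = x_i] = \frac{1}{i}\prod_{j=i+1}^{m}\left(1 - \frac{1}{j}\right) = \frac{1}{i}\prod_{j=i+1}^{m}\frac{j-1}{j} = \frac{1}{i}\cdot\frac{i}{m} = \frac{1}{m}$.

```python
import random
from collections import Counter


def sample_one(stream, rng=random):
    """Return one element of `stream`, chosen uniformly at random.

    Follows steps (a)-(c): keep a single candidate s and, on seeing the
    i-th element (1-indexed), replace s with it with probability 1/i.
    The stream length m is never needed in advance.
    Returns None for an empty stream.
    """
    s = None
    for i, x in enumerate(stream, start=1):
        # For i = 1 the probability is 1, which plays the role of step (a).
        if rng.random() < 1.0 / i:
            s = x
    return s


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(sample_one(iter(range(10)), rng) for _ in range(50_000))
    # Each of the 10 elements should appear roughly 5,000 times.
    print(sorted(counts.items()))
```

Passing `iter(...)` in the usage example stresses that the sketch only ever makes one forward pass over the stream.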
2. (2 points) Consider sampling $r$ elements uniformly and independently at random (with replacement)
from the stream, and let $Z_t$ be the random variable corresponding to the number
of samples that are less than or equal to $z_t$, where $z_t$ is the $t$-th smallest element in the stream.
Compute the expectation and variance of $Z_t$.
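Here is a sketch of the computation. Exactly $t$ of the $m$ distinct stream values are at most $z_t$, so each sample independently satisfies this with probability $p = t/m$. Hence $Z_t \sim \mathrm{Binomial}(r, t/m)$, with $E[Z_t] = rt/m$ and $\mathrm{Var}[Z_t] = r\,\frac{t}{m}\left(1 - \frac{t}{m}\right)$. The Python below checks these closed forms against the exact binomial distribution using rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def z_moments_exact(r, t, m):
    """Mean and variance of Z_t computed directly from its distribution.

    Each of the r samples is, independently, one of the t elements
    <= z_t with probability p = t/m, so P[Z_t = k] = C(r,k) p^k (1-p)^(r-k).
    """
    p = Fraction(t, m)
    pmf = [comb(r, k) * p**k * (1 - p) ** (r - k) for k in range(r + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var


def z_moments_closed_form(r, t, m):
    """E[Z_t] = r t / m and Var[Z_t] = r (t/m)(1 - t/m)."""
    p = Fraction(t, m)
    return r * p, r * p * (1 - p)


if __name__ == "__main__":
    print(z_moments_exact(20, 3, 10))  # (Fraction(6, 1), Fraction(21, 5))
    print(z_moments_closed_form(20, 3, 10))
```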
3. (2 points) Consider an algorithm that samples $r$ elements uniformly and independently at
random (with replacement) from the data stream and returns the median of the sampled
elements. How large must $r$ be so that the output of this algorithm is an $\epsilon$-approximate
median with probability at least 99/100? You may assume that $\epsilon < 1/4$ and give your answer
in big-O notation. Hint: Consider the random variables $Z_{(1/2-\epsilon)m}$ and $Z_{(1/2+\epsilon)m}$.
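One standard route is sketched here under two simplifications: it ignores rounding of $(1/2 \pm \epsilon)m$, and it takes the sample median to be the $\lceil r/2 \rceil$-th smallest sample. Under these, the output fails to be an $\epsilon$-approximate median only if $Z_{(1/2-\epsilon)m} \ge r/2$ or $Z_{(1/2+\epsilon)m} < r/2$. Each event needs a binomial to deviate from its mean by about $\epsilon r$, so a Hoeffding bound caps each at $e^{-2\epsilon^2 r}$. Choosing $r \ge \ln(200)/(2\epsilon^2)$ then keeps the union below $1/100$, i.e. $r = O(1/\epsilon^2)$. Chebyshev with the variance from part 2 gives the same order with a larger constant. The Python simulation below is only an empirical sanity check of that choice of $r$. The function names and the Hoeffding-based constant are assumptions, not a reference answer.

```python
import math
import random


def sample_median(stream, r, rng):
    """Median of r samples drawn uniformly with replacement from `stream`.

    For simplicity the stream is held in a list here; in a one-pass setting
    each of the r samples could come from an independent copy of the
    sampler in part 1.
    """
    samples = sorted(rng.choice(stream) for _ in range(r))
    return samples[(r - 1) // 2]  # the ceil(r/2)-th smallest sample


def r_for(eps, delta=0.01):
    """Sample size from a Hoeffding + union bound: 2 exp(-2 eps^2 r) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps**2))


if __name__ == "__main__":
    rng = random.Random(0)
    m, eps = 10_000, 0.1
    stream = list(range(1, m + 1))  # the rank of value x is x itself
    rng.shuffle(stream)
    r = r_for(eps)
    trials = 300
    ok = sum(
        (0.5 - eps) * m <= sample_median(stream, r, rng) <= (0.5 + eps) * m
        for _ in range(trials)
    )
    print(r, ok / trials)  # success rate should be well above 0.99
```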
4. (2 points) Another way to achieve uniform sampling is, for each $i \in [m]$, to pick a
value $y_i$ uniformly at random from $[0,1]$. Then the stream element $x_i$, where $i = \arg\min_j y_j$, is uniformly distributed
over the set $\{x_1, x_2, \ldots, x_m\}$. However, suppose that at the end of the stream we are given a value
$s \in [m]$ and now need to return a random value from the set $\{x_s, x_{s+1}, \ldots, x_m\}$. It suffices
to return $x_i$, where $i = \arg\min_{s \le j \le m} y_j$. Describe an algorithm that uses $O(\log m)$ space in
expectation to output $\arg\min_{s \le j \le m} y_j$. The algorithm does not know $s$ while processing the
stream.
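One possible approach for part 4, not necessarily the intended solution, is a monotonic stack of suffix minima. Keep index $j$ only while $y_j$ is smaller than every $y_k$ seen after it. When $y_i$ arrives, pop every kept entry with $y \ge y_i$ and push $i$.

After the stream ends, the kept indices increase from bottom to top. The answer $\arg\min_{s \le j \le m} y_j$ is the first kept index that is at least $s$. The true minimizer $j^*$ beats everything after it, so it is never popped. Any kept index in $[s, j^*)$ would need a smaller $y$ than $y_{j^*}$, which is impossible.

Index $j$ survives to the end with probability $1/(m-j+1)$, so the expected number of stored entries is $H_m = \sum_{k=1}^{m} 1/k = O(\log m)$. The class name `SuffixArgMin` and its methods are hypothetical. The optional `y` argument exists only so that a test can supply known values.

```python
import bisect
import random


class SuffixArgMin:
    """Hypothetical one-pass structure for argmin_{s <= j <= m} y_j.

    Stores only the suffix minima: indices j whose y_j is smaller than
    every y_k that arrived after j. Expected size is H_m = O(log m).
    """

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.idx = []  # kept indices, increasing from bottom to top
        self.ys = []   # their y values, also increasing
        self.xs = []   # their stream elements
        self.count = 0  # number of elements processed so far

    def process(self, x, y=None):
        """Feed stream element x_i; draw y_i uniformly from [0, 1) unless given."""
        self.count += 1
        if y is None:
            y = self.rng.random()
        # Entries with y >= y_i can never again be a suffix minimum.
        while self.ys and self.ys[-1] >= y:
            self.idx.pop()
            self.ys.pop()
            self.xs.pop()
        self.idx.append(self.count)
        self.ys.append(y)
        self.xs.append(x)

    def query(self, s):
        """Return (j, x_j) with j = argmin_{s <= j <= m} y_j, for 1 <= s <= m."""
        k = bisect.bisect_left(self.idx, s)  # first kept index >= s
        return self.idx[k], self.xs[k]


if __name__ == "__main__":
    sampler = SuffixArgMin(random.Random(0))
    for x in "abcdefghij":
        sampler.process(x)
    print(len(sampler.idx), sampler.query(4))
```

The query never fails for $1 \le s \le m$, because the most recent index $m$ is always on the stack.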