Question: Task 2: Heavy Hitter Discovery (15 Points): Given a set of n values V={v1,v2,,vn} from n users, where the values are from a bounded domain

Task 2: Heavy Hitter Discovery (15 Points): Given a set of n values V={v1,v2,,vn} from n users, where the values are from a bounded domain D. Suppose each value vi is represented as a binary string with length m (e.g., when m=4,vi 's value is 7 , then vi=0111;vi 's value is 8 , then vi=1000 ). The naive approach of querying the frequency of each string requires 2m oracle queries and is infeasible when m is large. Now your goal is to design a LDP protocol to identify the top- k heavy hitter, i.e., the k most frequent values in V, such that it is computationally feasible to query the frequency oracle. - Straw man protocol (4 Points): A length- m value v is divided into g equal-size segments, each of length s=m/g. In this protocol, each user randomly chooses a segment to report, and the aggregator first queries the frequency of each length- s binary string in each of the g segments, and then identify the frequent patterns in each segment, where are denoted as C1,C2,,Cg. The candidate set C is the Cartesian product of {Ci} 's, i.e., C=C1C2Cg, where Cartesian product operation is defined as C1C2={cicj:ciC1 and cjC2}, and is the string concatenation operation. Finally, the aggregator queries frequencies of the strings in candidates C. Answer the following questions: Page 1 CS528 Data Security and Privacy Instructor: Binghui Wang Assignment 2 (Due: 02/19/2023) 1. 2 Points. What is the number of total frequency oracle queries using this protocol? 2. 2 Points. What is the size of the candidate set C for top- k hevay hitter discovery? - Segment pair protocol (4 Points.) This protocol improves upon the Straw man protocol. The key differences is that, instead of reporting only one segment from g segments, each user reports a pair of two randomly chosen segments. The detailed protocol is as follows: First, the aggregator identifies the frequent patterns in each of the g segments. Then, it queries, for each pair i,j of segments, the frequency for the values in CiCj and identifies the value pairs that are frequent in segments i,j. From the frequent value pairs for each pair of segments, the aggregator recovers candidates for frequent values for the whole domain, using the a priori principle that if a value vD is frequent, every pair of its segments must also be frequent. Answer the following questions: 1. 2 Points. What is the number of total frequency oracle queries using this protocol? 2. 2 Points. What is the expected number of user reports on each pair of segments? - Prefix Extending protocol (7 Points.) Assume you want to identify k=150 most frequent values using the Prefix Extending protocol. The input domain D is 15 bytes (i.e., 120 bits), and you want to limit the total number of frequency oracle queries to no more than 228. 1. 3 Points. How to design the Prefix Extending protocol? 2. 4 Points. Which frequency oracles can be used to achieve high accuracy
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
