![Question 1.3 [4 Points] - Computing Counts of Pairs As seen](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/09/66f4611116cb6_66466f4611088f0e.jpg)

Question 1.3 [4 Points] - Computing Counts of Pairs

As seen above, hash functions are useful whenever we need to (relatively) evenly distribute data. Another such application is dividing data across multiple machines to process in parallel.

Imagine you have created a music app. Users of your app can log in, start a session, and then play songs. The database which forms the backend to your app contains a table with tuples (user_id, session_id, song_id) which represent every time a user has played a song. You wish to see which pairs of songs are most frequently listened to together in the same session.

You are given the following information:

- There are 10 million users
- There are 1 thousand songs
- There are 4 billion sessions
- The user IDs are 3 bytes
- The song IDs are 2 bytes
- The session IDs are 4 bytes
- The avg. number of unique songs played per session is 5

Assume all your data is stored on hard disks and use the values for disk seek/scan times from the top of the assignment. Assume that data is stored sequentially on disk.

Calculate the total time required to compute the counts for each distinct pair. Do not count pairs with the same song (i.e. Song A - Song A), and assume that when counting we do not count different orderings of the same songs as distinct pairs (i.e. if Song A and Song B are played in the same session, we only count the pair Song A - Song B and do not count the pair Song B - Song A). Assume that it takes 1 hr to sort all pairs using the external merge sort algorithm.

Please show some work/reasoning.
Step by Step Solution
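One possible way to set up the arithmetic is sketched below in Python. It is not the official solution. It assumes a pipeline of four phases: scan the table, write out every unordered song pair per session, sort the pairs (1 hr, as given), and scan the sorted pairs once to count them. It also assumes the table holds one row per unique (session, song) combination, that rows of the same session are adjacent on disk, that writes run at the same rate as scans, and that each phase costs a single seek. The disk seek and scan values from the top of the assignment are not reproduced here, so the hypothetical function `estimate_seconds` takes them as parameters.

```python
from math import comb

# Figures given in the question.
SESSIONS = 4_000_000_000
SONGS_PER_SESSION = 5            # average unique songs per session
USER_ID_BYTES = 3
SESSION_ID_BYTES = 4
SONG_ID_BYTES = 2
SORT_SECONDS = 3600              # external merge sort of all pairs: 1 hr (given)


def estimate_seconds(scan_bytes_per_s, seek_s):
    """Estimate total time under the assumptions stated in the lead-in.

    scan_bytes_per_s and seek_s must come from the assignment's disk table;
    they are parameters here because that table is not part of this page.
    """
    # Input table: one (user_id, session_id, song_id) row per unique play.
    tuple_bytes = USER_ID_BYTES + SESSION_ID_BYTES + SONG_ID_BYTES   # 9 bytes
    table_bytes = SESSIONS * SONGS_PER_SESSION * tuple_bytes         # 180e9 bytes

    # Unordered pairs without repetition: C(5, 2) = 10 per session.
    pair_records = SESSIONS * comb(SONGS_PER_SESSION, 2)             # 40e9 pairs
    pair_bytes = pair_records * 2 * SONG_ID_BYTES                    # 160e9 bytes

    read_table = seek_s + table_bytes / scan_bytes_per_s   # phase 1: scan input
    write_pairs = seek_s + pair_bytes / scan_bytes_per_s   # phase 2: emit pairs
    count_pass = seek_s + pair_bytes / scan_bytes_per_s    # phase 4: scan sorted pairs
    return read_table + write_pairs + SORT_SECONDS + count_pass


# Upper bound on distinct pairs: C(1000, 2) = 499,500, so the final
# table of counts is small compared with the intermediate pair file.
MAX_DISTINCT_PAIRS = comb(1000, 2)
```

For instance, with placeholder values of 100 MB/s (10^8 bytes/s) for scanning and 10 ms per seek, which are illustrative and not taken from the assignment, `estimate_seconds(1e8, 0.01)` comes to roughly 8,600 s, or about 2.4 hours. The seek terms are negligible next to the scan terms, so the answer is dominated by the three sequential passes and the 1 hr sort.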
