Question: Question 6: Spark (10%) Write a function get_top_emails(lst, N) that applies further transformations to the dataset returned by the proc_headers(lst) function

Question 6: Spark (10%) Write a function get_top_emails(lst, N) that applies further transformations to the dataset returned by the proc_headers(lst) function from Question 5 to identify the N Email addresses that are most popular in terms of the number of Emails that were sent to them. The output must be a list of N tuples (n, E), where n is the number of Email transmissions having E as their recipient address. The list must be sorted in the descending lexicographical order, that is, (n1, E1) > (n2, E2) if and only if either n1 > n2, or n1 = n2 and E1 > E2.

Hint: Use a map/reduceByKey pattern, as in the word count example, to pair Email addresses with their popularity counts, the sortBy transformation to sort them in the descending lexicographical order, and the take() action to extract the top-N records.

Note that calling proc_headers() first is only needed to prevent any errors in the implementation of Question 5 from propagating to the solution of this question: this way, we will be able to use the model implementation of proc_headers() for testing. A more efficient solution would avoid materializing the results of proc_headers() in the driver, and would instead directly extend the processing steps of Question 5 with further operations. Make sure you understand why it is important!
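The descending lexicographical order defined in the question coincides with Python's built-in tuple comparison, which compares the first components and falls back to the second only on a tie. The sketch below is a plain-Python reference (no Spark) that may be handy for cross-checking a Spark solution on small inputs. It assumes that proc_headers() yields (sender, recipient) pairs, as the Question 5 statement suggests; the function name top_emails_reference and the sample addresses are hypothetical.

```python
from collections import Counter


def top_emails_reference(pairs, N):
    """Driver-side reference, for cross-checking only.

    pairs: iterable of (sender, recipient) tuples (assumed shape of the
           proc_headers() output).
    Returns up to N tuples (n, E) in descending lexicographical order.
    """
    # Count how many transmissions each recipient address received.
    counts = Counter(recipient for _sender, recipient in pairs)
    # Flip to (count, address) so tuple comparison orders by count first,
    # then by address; reverse=True gives the descending order.
    ranked = sorted(((n, e) for e, n in counts.items()), reverse=True)
    return ranked[:N]


# Hypothetical sample data, not taken from the assignment.
sample_pairs = [
    ("a@example.com", "x@example.com"),
    ("b@example.com", "x@example.com"),
    ("a@example.com", "y@example.com"),
    ("c@example.com", "z@example.com"),
]
top2 = top_emails_reference(sample_pairs, 2)
```

Here x@example.com receives two Emails, while y@example.com and z@example.com receive one each, so the tie between the latter two is broken by the address, with the larger string first.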
    def get_top_emails(lst, N):
        """
        lst: a list of tuples (FROM, TO, CC, BCC) representing Email headers
        N: a positive integer
        Returns a list of N tuples (n, E), sorted in the descending
        lexicographical order, representing the top N most popular Email
        destinations as described in the question.
        """
        rdd = sc.parallelize(proc_headers(lst))
        # Insert your code after this line

You can use the following code to test your implementation of get_top_emails():

    print('\n'.join(str(t) for t in get_top_emails([header1, header2, header3], 3)))

The output produced by the line above when executed with the model implementation of get_top_emails() was as follows:

    (3, 'tom.kearney@enron.com')
    (2, 'mike.mcconnell@enron.com')
    (2, 'george.mcclellan@enron.com')
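One possible way to complete the skeleton, following the hint, is sketched below. It is not the model implementation. It assumes that the RDD built from proc_headers() holds (sender, recipient) pairs, and the helper name top_emails_from_rdd is hypothetical. The helper takes the RDD as an argument, so the same chain of transformations could also be appended directly to the Question 5 pipeline instead of re-parallelizing a list that was first collected in the driver.

```python
def top_emails_from_rdd(rdd, N):
    """Sketch of the map/reduceByKey/sortBy/take pipeline from the hint.

    rdd: an RDD of (sender, recipient) pairs (assumed shape), for example
         sc.parallelize(proc_headers(lst)) from the skeleton.
    Returns up to N tuples (n, E) in descending lexicographical order.
    """
    return (
        rdd
        .map(lambda pair: (pair[1], 1))         # (recipient, 1)
        .reduceByKey(lambda a, b: a + b)        # (recipient, count)
        .map(lambda kv: (kv[1], kv[0]))         # (count, recipient)
        .sortBy(lambda t: t, ascending=False)   # descending tuple order
        .take(N)                                # only N records reach the driver
    )


def get_top_emails_sketch(sc, lst, N, proc_headers):
    """Hypothetical wrapper mirroring the skeleton; sc and proc_headers are
    passed in explicitly here only to keep the sketch self-contained."""
    rdd = sc.parallelize(proc_headers(lst))
    return top_emails_from_rdd(rdd, N)
```

Because the counting and sorting run on the executors and take(N) returns only N records, the driver never has to hold the full list of transmissions, which is the point the note in the question is making. As an alternative to sortBy followed by take, the RDD action top(N) also returns the N largest elements in descending order.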

Students Have Also Explored These Related Databases Questions!

Question 5: Spark (40%) Write a function proc_headers(lst) that takes a list lst of Email headers, and returns a list of tuples (E1, E2) for every Email transmission from E1 to E2. Each header in...
