Question: 1. What would be the output of this code 2. Detail what changes would you need to make to the above code for it to

1. What would be the output of this code
2. Detail what changes would you need to make to the above code for it to instead save to HDFS the number of albums released in each decade (in words, and code)
You are given a dataset on the Greatest Albums of All Time. It is in a CSV (comma separated value) format and consists of the following fields: AlbumRanking , ReleaseYear AlbumName ArtistName, Genre > An example of this data is as follows: Rock 9 1 , 1967 2 , 1966 3 , 1966 4 ,1965 5 ,1 1971 Sgt. Pepper 's Lonely Hearts CB , The Beatles, Pet Sounds The Beach Boys , Rock Revolver , The Beatles , Rock Highway 61 Revisited Bob Dylan , Folk What 's Going On , Marvin Gaye , Funk 9 9 Suppose you execute the following Python Spark job on the dataset: names = lines = sc.textFile("hdfs://inputPath") glines lines.filter (lambda 1 : len ( 1. split ("," ))== 5) glines.map(lambda 1 : 1.split("," ) [3]) occurrences = names.map (lambda x : (x ,1)).reduceByKey(lambda a,b : a+b) results = occurrences.takeOrdered(10, key lambda x: -x [1])
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
