Question: Using RDDs, write code to answer the following questions (Q1 - Q5) using the given .csv files.
Q1: For the time range between : and :, find the most
used servers. The results should be given in descending order of servers.
Tips: For this query you will need to filter out the records that have null values so that they
are not taken into account in the calculation. Also, you will need to process the date with an
appropriate Python library.
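
A minimal sketch of how this query could be approached with the RDD API is given here. The file schema is not shown in the question, so the column positions (TIME_COL, SERVER_COL) and the timestamp format (TIME_FORMAT) are assumptions to be adjusted to the real header, and the time range is left as parameters because its bounds are not given above. The sketch also assumes that "most used" means the highest number of records per server and that the time range refers to the time of day.

```python
import csv
from datetime import datetime, time

# Hypothetical layout: adjust these to the real CSV header.
TIME_COL = 0                         # assumed position of the timestamp column
SERVER_COL = 1                       # assumed position of the server column
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"   # assumed timestamp format


def parse_line(line):
    """Split one CSV line, honouring quoted fields."""
    return next(csv.reader([line]))


def extract(fields):
    """Return (server, time_of_day), or None for null or malformed records."""
    try:
        stamp = fields[TIME_COL].strip()
        server = fields[SERVER_COL].strip()
        if not stamp or not server or server.lower() == "null":
            return None
        return server, datetime.strptime(stamp, TIME_FORMAT).time()
    except (IndexError, ValueError):
        # Short rows, the header line and unparsable dates all end up here.
        return None


def most_used_servers(sc, path, start, end):
    """Count records per server with start <= time of day <= end.

    `sc` is an existing SparkContext; `start` and `end` are datetime.time
    values. The result is a list of (server, count), largest count first.
    """
    rows = sc.textFile(path).map(parse_line).map(extract)
    return (rows.filter(lambda r: r is not None)
                .filter(lambda r: start <= r[1] <= end)
                .map(lambda r: (r[0], 1))
                .reduceByKey(lambda a, b: a + b)
                .sortBy(lambda kv: kv[1], ascending=False)
                .collect())
```

With a running SparkContext `sc`, the function would be called as most_used_servers(sc, path, start, end), passing the path to the CSV file and the two datetime.time bounds of the range. Because the header line fails date parsing, it is dropped together with the null records and needs no separate handling.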
Q: For the target URL xxx in the warc.csv file, find the
content length of the metadata as well as the size of the HTML DOM (number of characters).
Tips: For this query you should filter by URL. Remember to restart
the Spark cluster before each measurement to avoid hot caches, or clear the cache
with the command spark.catalog.clearCache().
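
The following sketch shows one way the URL filter and the timing could look with RDDs. The layout of warc.csv is not given in the question, so the column positions (URL_COL, TYPE_COL, LENGTH_COL, CONTENT_COL) are hypothetical, as is the idea that each row carries a record type such as "metadata" or "response". It also assumes that every record fits on one line, since textFile splits the input on newlines.

```python
import csv
import time

# Hypothetical layout of warc.csv: adjust to the real header.
URL_COL = 0       # assumed position of the target URL
TYPE_COL = 1      # assumed position of the record type
LENGTH_COL = 2    # assumed position of the content length
CONTENT_COL = 3   # assumed position of the HTML content


def parse_line(line):
    """Split one CSV line, honouring quoted fields."""
    return next(csv.reader([line]))


def summarize(fields):
    """Map a row to (record_type, content_length, number_of_characters)."""
    length = fields[LENGTH_COL].strip()
    return (fields[TYPE_COL],
            int(length) if length.isdigit() else None,
            len(fields[CONTENT_COL]))


def lookup_url(sc, path, target_url):
    """Collect the summary of every row whose URL equals target_url."""
    rows = sc.textFile(path).map(parse_line)
    return (rows.filter(lambda f: len(f) > CONTENT_COL
                        and f[URL_COL] == target_url)
                .map(summarize)
                .collect())


def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start
```

A measurement could then be taken with timed(lambda: lookup_url(sc, "warc.csv", target_url)), clearing the cache or restarting the cluster between runs as the tips suggest. If the HTML fields are very large, the csv module's default field size limit may need to be raised with csv.field_size_limit.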
