Question: Problem 4. Rob designs two algorithms for solving the Word Counting problem. The two algorithms are shown in the following table.
| Algorithm A | Algorithm B |
| --- | --- |
| `book = sc.textFile("/home/rob/data/peterpan.txt")`<br>`book.count()`<br>`book.first()`<br>`wordCount = book.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda x, y: x + y)`<br>`wordCount.collect()` | `book = sc.textFile("/home/rob/data/peterpan.txt").persist()`<br>`book.count()`<br>`book.first()`<br>`wordCount = book.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda x, y: x + y)`<br>`wordCount.collect()` |
The only difference between Algorithms A and B is that `.persist()` is added at the end of the first line in Algorithm B. Which one (Algorithm A or B) runs faster, and why?
Answer:
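One way to reason about this: Spark evaluates RDDs lazily, so each action (`count()`, `first()`, `collect()`) re-runs the lineage back to `textFile` unless the RDD has been persisted. The toy model below is plain Python, not the Spark API. `ToyRDD`, `load_lines`, `run`, and the `reads` counter are hypothetical names, and the two sample lines are made up. It only counts how often the source would be re-read under each algorithm.

```python
# Toy model of lazy evaluation with optional persistence (NOT the Spark API).

class ToyRDD:
    def __init__(self, compute):
        self._compute = compute   # function that rebuilds the data from its source
        self._persist = False
        self._cached = None

    def persist(self):
        # Mark the dataset so its first materialization is kept for reuse.
        self._persist = True
        return self

    def _materialize(self):
        if self._persist and self._cached is not None:
            return self._cached          # reuse the stored result
        data = self._compute()           # otherwise recompute from the source
        if self._persist:
            self._cached = data
        return data

    def flat_map(self, f):
        # Lazy: the parent is only materialized when an action runs.
        return ToyRDD(lambda: [y for x in self._materialize() for y in f(x)])

    # Actions: each one forces the lineage to be evaluated.
    def count(self):
        return len(self._materialize())

    def first(self):
        return self._materialize()[0]

    def collect(self):
        return list(self._materialize())


def run(persist):
    reads = 0

    def load_lines():
        nonlocal reads
        reads += 1                       # one simulated read of the input file
        return ["the boy flew", "the boy laughed"]

    book = ToyRDD(load_lines)
    if persist:
        book.persist()
    book.count()
    book.first()
    book.flat_map(lambda line: line.split(" ")).collect()
    return reads


reads_a = run(persist=False)   # models Algorithm A
reads_b = run(persist=True)    # models Algorithm B
print(reads_a, reads_b)
```

In this simplified model, Algorithm A reads the source once per action, while Algorithm B reads it once and reuses the stored lines. Real Spark differs in detail (for example, `first()` may scan only part of the data), so the sketch shows the direction of the effect rather than its size.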
Instead of persist(), we can also use cache(). What is the difference between persist() and cache()?
Answer:
In Algorithm A, how many RDDs are there? State the type of each RDD (standard string RDD or key-value pair RDD), and explain what the elements of each RDD represent.
Answer:
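As a sketch of what each RDD in Algorithm A holds, the plain-Python snippet below mimics the four stages (the input RDD plus the results of `flatMap`, `map`, and `reduceByKey`) on two made-up lines. It uses ordinary lists and a dict rather than Spark, so the names `lines`, `words`, `pairs`, `totals`, and `counts` are illustrative assumptions.

```python
from collections import defaultdict

# Stage 1, like `book`: a string RDD with one element per line of the file.
lines = ["the boy flew", "the boy laughed"]

# Stage 2, like flatMap(split): a string RDD with one element per word.
words = [word for line in lines for word in line.split(" ")]

# Stage 3, like map(word -> (word, 1)): a key-value pair RDD,
# key = word, value = 1 for a single occurrence.
pairs = [(word, 1) for word in words]

# Stage 4, like reduceByKey(add): a key-value pair RDD,
# key = distinct word, value = total number of occurrences.
totals = defaultdict(int)
for word, one in pairs:
    totals[word] += one
counts = sorted(totals.items())  # sorted only to make the output deterministic

print(words)
print(pairs)
print(counts)
```

Under this reading, the first two stages correspond to standard string RDDs and the last two to key-value pair RDDs. Only the first (`book`) and the last (`wordCount`) are bound to variable names in Algorithm A.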
