Question: Problem 4. Rob designs two algorithms for solving the Word Counting problem. The two algorithms are shown below.

Algorithm A

    book = sc.textFile("/home/rob/data/peterpan.txt")
    book.count()
    book.first()
    wordCount = book.flatMap(lambda line: line.split(" ")) \
                    .map(lambda word: (word, 1)) \
                    .reduceByKey(lambda x, y: x + y)
    wordCount.collect()

Algorithm B

    book = sc.textFile("/home/rob/data/peterpan.txt").persist()
    book.count()
    book.first()
    wordCount = book.flatMap(lambda line: line.split(" ")) \
                    .map(lambda word: (word, 1)) \
                    .reduceByKey(lambda x, y: x + y)
    wordCount.collect()

The only difference between Algorithm A and B is that we add .persist() at the end of the first line in Algorithm B. Which one (Algorithm A or B) runs faster and why?

Answer:
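
One way to reason about this question is with a toy model of lazy evaluation. The sketch below is plain Python, not Spark: ToyRDD, load_lines, run_actions, and the reads counter are hypothetical names made up for illustration. It assumes that every action re-runs the lineage from the source unless the dataset was persisted, which is how RDDs generally behave. It simplifies in two ways: real Spark's first() may scan only part of the file, and the final collect() in the problem runs on wordCount (derived from book) rather than on book itself.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD; not part of Spark."""

    def __init__(self, loader):
        self._loader = loader   # function that "reads the file"
        self._persist = False   # set by persist()
        self._cached = None     # materialized data, once stored
        self.reads = 0          # how many times the source was read

    def persist(self):
        # Only marks the dataset; nothing is read until an action runs.
        self._persist = True
        return self

    def _compute(self):
        if self._cached is not None:
            return self._cached          # reuse stored data
        self.reads += 1                  # recompute from the source
        data = self._loader()
        if self._persist:
            self._cached = data
        return data

    def count(self):
        return len(self._compute())

    def first(self):
        return self._compute()[0]

    def collect(self):
        return list(self._compute())


def load_lines():
    # Made-up sample text standing in for the book file.
    return ["all children grow up", "except one"]


def run_actions(rdd):
    # The same three actions used in both algorithms.
    rdd.count()
    rdd.first()
    rdd.collect()
    return rdd.reads


reads_a = run_actions(ToyRDD(load_lines))            # like Algorithm A
reads_b = run_actions(ToyRDD(load_lines).persist())  # like Algorithm B
print(reads_a, reads_b)  # prints: 3 1
```

In this model the unpersisted dataset is read from the source once per action, while the persisted one is read once and then reused. That is the usual intuition for why Algorithm B would be expected to run faster, provided the persisted data fits in the available storage.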

Instead of persist(), we can also use cache(). What is the difference between persist() and cache()?

Answer:
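
For the RDD API, cache() is generally documented as shorthand for persist() with the default storage level (MEMORY_ONLY), whereas persist() can also take an explicit storage level. The toy class below mirrors that relationship in plain Python. StorageDemo and the string level names are illustrative stand-ins, not Spark's actual classes, and the default shown is an assumption that applies to RDDs specifically.

```python
class StorageDemo:
    """Hypothetical model of how an RDD records its storage level."""

    DEFAULT_LEVEL = "MEMORY_ONLY"  # assumed default for RDDs

    def __init__(self):
        self.storage_level = None  # None means "not persisted"

    def persist(self, storage_level=DEFAULT_LEVEL):
        # The caller may choose where the data is kept.
        self.storage_level = storage_level
        return self

    def cache(self):
        # No argument: always the default level.
        return self.persist(self.DEFAULT_LEVEL)


cached = StorageDemo().cache()
persisted_default = StorageDemo().persist()
persisted_custom = StorageDemo().persist("MEMORY_AND_DISK")

print(cached.storage_level == persisted_default.storage_level)  # True
print(persisted_custom.storage_level)  # MEMORY_AND_DISK
```

Under these assumptions, the only practical difference is that persist() lets the caller pick a storage level, while cache() does not.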

In Algorithm A, how many RDDs are there? State the type of each one: is it a standard string RDD or a key-value pair RDD? Also explain what the elements of each RDD represent.

Answer:
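
Reading the chained calls as one dataset per transformation gives four stages: the lines of the book, the words, the (word, 1) pairs, and the (word, total) pairs. The sketch below reproduces those stages in plain Python so the element type at each step is visible. It is not Spark code; the sample lines are made up, and the list and dict operations are only stand-ins for textFile, flatMap, map, and reduceByKey.

```python
# Stage 1 (like `book`): a collection of strings, one per line of text.
lines = ["the boy who", "would not grow up", "the boy"]

# Stage 2 (like flatMap): still strings, but now one per word.
words = [word for line in lines for word in line.split(" ")]

# Stage 3 (like map): key-value pairs, each word paired with the count 1.
pairs = [(word, 1) for word in words]

# Stage 4 (like reduceByKey): key-value pairs, one per distinct word,
# where the value is the sum of the 1s for that word.
totals = {}
for word, n in pairs:
    totals[word] = totals.get(word, 0) + n
word_count = sorted(totals.items())

print(lines[0])    # a line of text
print(words[0])    # a single word
print(pairs[0])    # a (word, 1) pair
print(word_count)  # (word, total) pairs
```

In this reading, the first two stages hold plain strings and the last two hold key-value pairs, which matches the distinction the question asks about.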
