Question: We should avoid using groupByKey because it ...
A. Always reads the data from HDFS and causes large data transfers.
B. Shuffles all the key-value pairs across the network and generates a lot of unnecessary data transfer. It may also cause memory problems, because when the values are grouped by key, all the data associated with a single key has to be collected on one worker node.
C. Causes a lot of communication with the master node, which is costly.
D. Generates lots of tiny jobs compared to other transformation operations.
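
The shuffle and memory behavior described in option B is the reason usually given for preferring reduceByKey over groupByKey when the values are only going to be aggregated. The sketch below is not Spark code. It is a small pure-Python simulation, under simplified assumptions, that counts how many records would cross the shuffle boundary in each style. The helper names (partition_for, group_by_key_style, reduce_by_key_style) and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def partition_for(key, num_partitions):
    # Toy deterministic partitioner (assumption: keys are short strings).
    return sum(ord(c) for c in key) % num_partitions


def shuffle(records, num_partitions):
    # Route each (key, value) record to its target partition.
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_partitions)].append((key, value))
    return buckets


def group_by_key_style(partitions, num_partitions):
    # Every record is shuffled as-is; grouping happens only after the shuffle.
    shuffled = [rec for part in partitions for rec in part]
    grouped = defaultdict(list)
    for bucket in shuffle(shuffled, num_partitions).values():
        for key, value in bucket:
            # All values for one key are held in memory together.
            grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def reduce_by_key_style(partitions, num_partitions):
    # Combine inside each input partition first, then shuffle
    # at most one record per key per partition.
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for bucket in shuffle(shuffled, num_partitions).values():
        for key, value in bucket:
            totals[key] += value
    return dict(totals), len(shuffled)


# Two input partitions of (word, 1) pairs, as in a word count.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1), ("c", 1)],
]

grouped_totals, grouped_moved = group_by_key_style(partitions, 2)
reduced_totals, reduced_moved = reduce_by_key_style(partitions, 2)

print(grouped_totals == reduced_totals)  # True: both give the same sums
print(grouped_moved, reduced_moved)      # 8 5: records crossing the shuffle
```

Both styles produce the same per-key totals, but the group-then-sum style moves all 8 records while the combine-first style moves 5. With many repeated keys per partition the gap grows, and the group-then-sum style also has to hold every value for a key in memory at once.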
