Question: Problem 2: Working with a Numerical RDD
In the first code cell for this problem, we will create a randomly generated RDD and will then calculate some
descriptive statistics for this RDD.
Complete the following steps in a single code cell:
Use the following line of code to create an RDD containing … million elements randomly selected
from the interval …:
    randomrdd = RandomRDDs.uniformRDD(sc, size=…, seed=…)
Use built-in RDD methods to calculate the sum, mean, standard deviation, minimum, and maximum
of the values stored in randomrdd. Display the results in the format shown below, replacing the
xxxx strings with the appropriate values. Add spacing to ensure that the numerical values are left-aligned.
Sum: xxxx
Mean: xxxx
Std Dev: xxxx
Minimum: xxxx
Maximum: xxxx
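A minimal sketch of how this first cell could be organized, assuming a notebook in which randomrdd has already been created as described above. The helper names describe and format_report are hypothetical and are not part of the assignment; sum(), mean(), stdev(), min(), and max() are the built-in RDD actions. Note that stdev() returns the population standard deviation, while sampleStdev() returns the sample version.

```python
def describe(rdd):
    """Gather the five statistics with built-in RDD actions."""
    return [
        ("Sum:", rdd.sum()),
        ("Mean:", rdd.mean()),
        ("Std Dev:", rdd.stdev()),  # population standard deviation
        ("Minimum:", rdd.min()),
        ("Maximum:", rdd.max()),
    ]


def format_report(rows):
    """Pad every label to a common width so the values line up on the left."""
    width = max(len(label) for label, _ in rows) + 1
    return "\n".join(f"{label:<{width}}{value}" for label, value in rows)
```

In the notebook, print(format_report(describe(randomrdd))) would then display the five lines with the values starting in the same column. Each action above triggers its own pass over the data; randomrdd.stats() is an alternative that computes the same quantities in a single pass.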
In the second cell for Problem 2, we will explore how the RDD we created has been partitioned.
Complete the following steps in a single code cell:
Use the getNumPartitions() RDD method to determine the number of partitions used for
randomrdd.
Use glom() and map() along with the built-in Python function len() to create a list that contains
the number of elements contained within each of the partitions of randomrdd. Try to complete this
task with a single line of code by chaining together RDD transformations, followed by collect().
Print the results in the format shown below, replacing the xxxx strings with the appropriate values.
Number of Partitions: xxxx
Size of Partitions:
xxxx xxxx xxxx xxxx
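The partition steps above can be sketched as follows, again assuming randomrdd already exists in the notebook. The function names partition_sizes and partition_report are hypothetical, and printing the sizes separated by spaces is only one reading of the expected output format.

```python
def partition_sizes(rdd):
    """Return a list with the number of elements in each partition."""
    # glom() turns every partition into a single list of its elements;
    # map(len) replaces each of those lists with its length.
    return rdd.glom().map(len).collect()


def partition_report(rdd):
    """Build the report text: partition count, then the size of each partition."""
    sizes = partition_sizes(rdd)
    return (
        f"Number of Partitions: {rdd.getNumPartitions()}\n"
        "Size of Partitions:\n"
        + " ".join(str(n) for n in sizes)
    )
```

Calling print(partition_report(randomrdd)) would produce the three-part output. The chained expression in partition_sizes is the single-line form the problem asks for: two transformations followed by the collect() action.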
