Problem 2: Working with a Numerical RDD
In the first code cell for this problem, we will create a randomly generated RDD and will then calculate some
descriptive statistics for this RDD.
Complete the following steps in a single code cell:
1. Use the following line of code to create an RDD containing 1.2 million elements randomly selected
from the interval [0,1]:
random_rdd = RandomRDDs.uniformRDD(sc, size=1200000, seed=1)
2. Use built-in RDD methods to calculate the sum, mean, standard deviation, minimum, and maximum
of the values stored in random_rdd. Display the results in the format shown below, replacing the
xxxx strings with the appropriate values. Add spacing to ensure that the numerical values are left-aligned.
Sum:      xxxx
Mean:     xxxx
Std Dev:  xxxx
Minimum:  xxxx
Maximum:  xxxx
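One possible way to approach the first cell is sketched below. It is not an official solution. The helper names summarize, format_report, and run_stats_cell are hypothetical, and the sketch assumes that stdev() (the population standard deviation) is the intended measure; sampleStdev() would be the alternative if the sample standard deviation is wanted. The label width of 10 is also just an assumption chosen so that the values line up.

```python
def summarize(rdd):
    """Return (label, value) pairs computed with built-in RDD actions."""
    return [
        ("Sum:", rdd.sum()),
        ("Mean:", rdd.mean()),
        ("Std Dev:", rdd.stdev()),   # population std dev (assumption)
        ("Minimum:", rdd.min()),
        ("Maximum:", rdd.max()),
    ]


def format_report(pairs, width=10):
    """Pad every label to the same width so the values start in one column."""
    return "\n".join(f"{label:<{width}}{value}" for label, value in pairs)


def run_stats_cell():
    """Hypothetical driver; assumes PySpark is installed locally."""
    from pyspark import SparkContext
    from pyspark.mllib.random import RandomRDDs

    # In a course notebook `sc` usually exists already; getOrCreate() reuses it.
    sc = SparkContext.getOrCreate()
    random_rdd = RandomRDDs.uniformRDD(sc, size=1200000, seed=1)
    print(format_report(summarize(random_rdd)))
```

In a notebook where sc and random_rdd are already defined, calling print(format_report(summarize(random_rdd))) would be enough. Each of the five actions triggers its own pass over the data; random_rdd.stats() returns all of these values from a single pass, if that is acceptable for the assignment.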
In the second cell in Problem 2, we will explore how the RDD we created has been partitioned.
Complete the following steps in a single code cell:
1. Use the getNumPartitions() RDD method to determine the number of partitions used for
random_rdd.
2. Use glom() and map() along with the built-in Python function len() to create a list that contains
the number of elements contained within each of the partitions of random_rdd. Try to complete this
task with a single line of code by chaining together RDD transformations, followed by collect().
3. Print the results in the format shown below, replacing the xxxx strings with the appropriate values.
Number of Partitions: xxxx
Size of Partitions:
[xxxx, xxxx, xxxx, xxxx]
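The second cell could be sketched along the following lines. Again, this is only an illustration under stated assumptions: the helper names partition_sizes, format_partition_report, and run_partition_cell are hypothetical, and the number of partitions actually reported depends on the Spark configuration (for example, the number of cores available to the context), so no specific values are claimed here.

```python
def partition_sizes(rdd):
    """Count the elements in each partition with one chained expression."""
    # glom() turns each partition into a list; map(len) replaces each list
    # with its length; collect() brings the lengths back to the driver.
    return rdd.glom().map(len).collect()


def format_partition_report(num_partitions, sizes):
    """Build the three-line report in the format requested by the problem."""
    return (
        f"Number of Partitions: {num_partitions}\n"
        f"Size of Partitions:\n"
        f"{sizes}"
    )


def run_partition_cell():
    """Hypothetical driver; assumes PySpark is installed locally."""
    from pyspark import SparkContext
    from pyspark.mllib.random import RandomRDDs

    sc = SparkContext.getOrCreate()
    random_rdd = RandomRDDs.uniformRDD(sc, size=1200000, seed=1)
    print(format_partition_report(random_rdd.getNumPartitions(),
                                  partition_sizes(random_rdd)))
```

Mapping len over the glommed partitions means only one integer per partition is sent back to the driver, rather than all 1.2 million values.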
