Question: Problem 6: Grouped Means
A diamond's cut is a categorical feature describing how well-proportioned the dimensions of the diamond are.
This feature has five possible levels. These levels are, in increasing order of quality, Fair, Good, Very Good,
Premium, and Ideal.
We will now use pair RDD tools to calculate the count, average price, and average carat size for diamonds with
each of the five levels of cut. Note that for any tuple within the diamonds RDD:
The carat size for the associated diamond is stored at index of the tuple.
The cut level for the associated diamond is stored at index of the tuple.
The price for the associated diamond is stored at index of the tuple.
Complete the following steps in a single code cell:
Create a list named cut_summary by performing the transformations and action described below. Try
to perform all of the steps with a single multiline statement by chaining together the methods.
Transform each observation into a tuple of the form (cut, (carat, price, 1)). Note
that the first element of this tuple indicates the cut level, which we will be grouping by, while
the second element is another tuple containing the other information in which we
are interested.
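A minimal sketch of this first transformation, written as a plain Python function so it can be tried without a cluster. The field positions `CARAT_IDX`, `CUT_IDX`, and `PRICE_IDX` are assumptions (they follow the column order of the ggplot2 diamonds data, where carat is first, cut second, and price seventh); replace them with the indices given in the problem. The function name `to_cut_pair` is hypothetical.

```python
# Assumed field positions (ggplot2 diamonds column order:
# carat, cut, color, clarity, depth, table, price, x, y, z).
# Replace these with the indices stated in the assignment.
CARAT_IDX = 0
CUT_IDX = 1
PRICE_IDX = 6


def to_cut_pair(row):
    """Map one diamond tuple to (cut, (carat, price, 1)).

    The trailing 1 lets a later elementwise sum count the rows per cut.
    """
    return (row[CUT_IDX], (row[CARAT_IDX], row[PRICE_IDX], 1))


# An illustrative row laid out in the assumed column order.
sample = (0.23, "Ideal", "E", "SI2", 61.5, 55.0, 326, 3.95, 3.98, 2.43)
print(to_cut_pair(sample))  # ('Ideal', (0.23, 326, 1))
```

In PySpark the same function would be passed as `diamonds.map(to_cut_pair)`; a `lambda` works equally well, but a named function is easier to test on its own.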
Use reduceByKey to perform an elementwise sum of the tuples (carat, price, 1)
for each separate value of the key, which is represented by the cut value. This will produce an
RDD with elements of the form (cut, (sum_of_carat, sum_of_price, count)).
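The combining function handed to `reduceByKey` only needs to add two value tuples position by position. The sketch below defines such a function (`add_tuples`, a hypothetical name) and, because it is meant to run without Spark, pairs it with `reduce_by_key_local`, a hypothetical in-memory stand-in that folds values per key the way `reduceByKey` does. The sample pairs are made up.

```python
def add_tuples(a, b):
    """Elementwise sum of two (carat, price, count) tuples."""
    return tuple(x + y for x, y in zip(a, b))


def reduce_by_key_local(pairs, func):
    """In-memory stand-in for RDD.reduceByKey (not part of PySpark).

    Folds together all values that share a key using the binary function func.
    """
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


# Made-up (cut, (carat, price, 1)) pairs.
pairs = [
    ("Ideal", (0.5, 1000, 1)),
    ("Premium", (0.25, 400, 1)),
    ("Ideal", (0.75, 2000, 1)),
]
print(reduce_by_key_local(pairs, add_tuples))
# [('Ideal', (1.25, 3000, 2)), ('Premium', (0.25, 400, 1))]
```

Because `add_tuples` is associative and commutative (up to floating-point rounding), Spark is free to combine partial sums within and across partitions in any order, which is what `reduceByKey` expects. With Spark itself, this step is simply `.reduceByKey(add_tuples)`.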
Use map to transform the tuples in the previous RDD into ones of the following form:
(cut, count, mean_carat_size, mean_price). Round the two means to decimal
places.
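One way to write this final mapping is a function that unpacks the summed tuple and divides by the count. The number of decimal places is not given above, so the `ndigits=2` default is only a placeholder, and `summarize` is a hypothetical name. A named function is used rather than a `lambda` because Python 3 lambdas cannot unpack nested tuples in their parameter list.

```python
def summarize(item, ndigits=2):
    """Turn (cut, (sum_of_carat, sum_of_price, count)) into
    (cut, count, mean_carat_size, mean_price), rounding both means.

    ndigits=2 is a placeholder; use the precision the assignment asks for.
    """
    cut, (carat_sum, price_sum, count) = item
    return (
        cut,
        count,
        round(carat_sum / count, ndigits),
        round(price_sum / count, ndigits),
    )


print(summarize(("Fair", (2.0, 10000, 3))))  # ('Fair', 3, 0.67, 3333.33)
```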
Call the collect method to return the results as the desired list.
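Putting the pieces together, the sketch below mirrors the single chained statement the problem asks for. Since it has to run without a Spark installation, `LocalRDD` is a hypothetical in-memory class that imitates only `map`, `reduceByKey`, and `collect`; it is not PySpark. The field indices are assumptions based on the ggplot2 column order, the rows are made up (fields the example ignores are `None`), and two decimal places is a placeholder.

```python
# Assumed field positions; adjust to the indices given in the assignment.
CARAT_IDX, CUT_IDX, PRICE_IDX = 0, 1, 6


class LocalRDD:
    """Hypothetical in-memory stand-in for an RDD: map, reduceByKey, collect only."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, func):
        return LocalRDD(func(x) for x in self._items)

    def reduceByKey(self, func):
        # Fold all values that share a key with the binary function func.
        merged = {}
        for key, value in self._items:
            merged[key] = func(merged[key], value) if key in merged else value
        return LocalRDD(merged.items())

    def collect(self):
        return list(self._items)


def summarize(item, ndigits=2):
    """(cut, (carat_sum, price_sum, count)) -> (cut, count, mean carat, mean price)."""
    cut, (carat_sum, price_sum, count) = item
    return (cut, count, round(carat_sum / count, ndigits), round(price_sum / count, ndigits))


# Made-up rows; fields the example ignores are None.
diamonds = LocalRDD([
    (0.5, "Ideal", None, None, None, None, 1000),
    (1.0, "Ideal", None, None, None, None, 2000),
    (0.25, "Premium", None, None, None, None, 400),
    (1.0, "Fair", None, None, None, None, 3000),
    (1.0, "Fair", None, None, None, None, 4000),
    (0.5, "Fair", None, None, None, None, 3500),
])

cut_summary = (
    diamonds
    .map(lambda r: (r[CUT_IDX], (r[CARAT_IDX], r[PRICE_IDX], 1)))
    .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]))
    .map(summarize)
    .collect()
)
print(cut_summary)
# [('Ideal', 2, 0.75, 1500.0), ('Premium', 1, 0.25, 400.0), ('Fair', 3, 0.83, 3500.0)]
```

Against a real `SparkContext`, the same chain should work unchanged on the problem's `diamonds` RDD, since PySpark's `RDD.map`, `RDD.reduceByKey`, and `RDD.collect` carry these names. With Spark, the order of the collected tuples is not guaranteed, because `reduceByKey` shuffles data between partitions.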
To better display the results, use cut_summary to create a Pandas DataFrame named cut_df. Set
the following names for the columns of the DataFrame: Cut, Count, MeanCarat, MeanPrice.
Display cut_df without using the print function.
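A short sketch of the display step, assuming `cut_summary` holds `(cut, count, mean carat, mean price)` tuples like the made-up ones below. The reordering is optional and goes beyond what the problem requires: it builds an ordered `pd.Categorical` from the quality ranking listed at the top of the problem, so rows appear from Fair to Ideal rather than in whatever order `collect` returned them.

```python
import pandas as pd

# Made-up result of the collect() step.
cut_summary = [
    ("Ideal", 2, 0.75, 1500.0),
    ("Premium", 1, 0.25, 400.0),
    ("Fair", 3, 0.83, 3500.0),
]

cut_df = pd.DataFrame(cut_summary, columns=["Cut", "Count", "MeanCarat", "MeanPrice"])

# Optional: sort rows by cut quality, lowest to highest.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
cut_df["Cut"] = pd.Categorical(cut_df["Cut"], categories=CUT_ORDER, ordered=True)
cut_df = cut_df.sort_values("Cut").reset_index(drop=True)

# In a Jupyter cell, leaving the DataFrame as the last expression renders it as a table.
cut_df
```

Leaving `cut_df` as the cell's final expression is what satisfies the "without using the print function" requirement in a notebook; IPython's `display(cut_df)` is an alternative there.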
