Problem 5: Calculating r-Squared
We will continue the analysis started in Problem 4 by calculating the r-squared score for our predictions. The
first step in this process is to calculate the mean of the observed values.
Complete the following steps in a single code cell:
Use the map() transformation along with a lambda function to select the first element of each tuple
in the pairs RDD. Call the mean() method of the resulting RDD, storing the result in a variable
named mean.
Print mean.
Note that this calculation might take a couple of minutes to complete.
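A minimal sketch of this step, assuming each tuple in pairs holds the observed value first, as the instructions state. To stay runnable outside Spark, it uses a small hypothetical list (sample_pairs) and Python's built-in map; the commented lines show what the corresponding RDD cell would plausibly look like.

```python
# Hypothetical stand-in for a few (observed, predicted) tuples from the pairs RDD.
sample_pairs = [(3.0, 2.5), (5.0, 5.5), (7.0, 6.0), (9.0, 9.0)]

# Same shape as the Spark cell: map each tuple to its first element, then average.
observed = list(map(lambda x: x[0], sample_pairs))
sample_mean = sum(observed) / len(observed)
print(sample_mean)

# On the actual RDD from Problem 4, the corresponding cell would be:
#   mean = pairs.map(lambda x: x[0]).mean()
#   print(mean)
```

Note that storing the result in a variable named mean is what the problem asks for; the sample_ prefix here only keeps the toy values separate from the real ones.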
We will now calculate the sum of the squared deviations between each observed value and the mean of the observed values. This
quantity is sometimes referred to as SST, or "total sum of squared deviations".
Complete the following steps in a single code cell:
Use the map() transformation along with a lambda function to calculate the square of the difference
between each observed value in pairs and mean. Call the sum() method of the resulting RDD,
storing the result in a variable named SST.
Print SST.
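The SST step can be sketched the same way. The list sample_pairs and the names sample_mean and sample_SST are hypothetical toy stand-ins, not values from the actual data file; the commented lines show the likely RDD form, where the already-computed mean is simply referenced inside the lambda.

```python
# Hypothetical stand-in for a few (observed, predicted) tuples from the pairs RDD.
sample_pairs = [(3.0, 2.5), (5.0, 5.5), (7.0, 6.0), (9.0, 9.0)]

# Mean of the observed (first) elements, computed as in the previous step.
sample_mean = sum(x[0] for x in sample_pairs) / len(sample_pairs)

# Map each tuple to the squared deviation of its observed value from the mean,
# then add the squared deviations together.
sample_SST = sum(map(lambda x: (x[0] - sample_mean) ** 2, sample_pairs))
print(sample_SST)

# On the actual RDD, the corresponding cell would be:
#   SST = pairs.map(lambda x: (x[0] - mean) ** 2).sum()
#   print(SST)
```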
We will now calculate the r-squared score for the predictions. The formula for this value is given as follows:
$$r^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$
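For reference, the two sums can be written out term by term. This assumes each tuple in pairs is (observed value $y_i$, predicted value $\hat{y}_i$), which is how the SSE calculation from Problem 4 treats them, and that $\bar{y}$ is the mean of the observed values; the symbols are notation introduced here, not taken from the assignment.

```latex
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{SST} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2,
\qquad
1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
  = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

A score close to 1 means the squared prediction errors are small relative to the spread of the observed values around their mean.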
Complete the following steps in a single code cell:
Use SSE and SST to calculate r-squared, storing the result in a variable named r2.
Print r2.
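Putting the pieces together, here is a small end-to-end check on toy data. The list sample_pairs and the helper r_squared are hypothetical names used only for illustration; in the notebook, SSE comes from Problem 4 and SST from the previous cell, so the final cell reduces to the two commented lines at the bottom.

```python
def r_squared(sse, sst):
    """Return 1 - SSE/SST for sums of squares that were already computed."""
    return 1 - sse / sst

# Hypothetical stand-in for a few (observed, predicted) tuples from the pairs RDD.
sample_pairs = [(3.0, 2.5), (5.0, 5.5), (7.0, 6.0), (9.0, 9.0)]

# Mean of the observed values, then the two sums of squares.
sample_mean = sum(x[0] for x in sample_pairs) / len(sample_pairs)
sample_SST = sum((x[0] - sample_mean) ** 2 for x in sample_pairs)
sample_SSE = sum((x[0] - x[1]) ** 2 for x in sample_pairs)

sample_r2 = r_squared(sample_SSE, sample_SST)
print(sample_r2)  # roughly 0.925 for these toy values

# In the notebook, with SSE and SST already defined, the cell would be:
#   r2 = 1 - SSE / SST
#   print(r2)
```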
Prior code from Problem 4:
# Read the data file into an RDD
pairs_raw = spark.sparkContext.textFile("/FileStore/tables/pairs_data.txt")
# Count the number of elements
num_elements = pairs_raw.count()
print(f"Number of elements: {num_elements}")
# Display the first 5 elements (as strings)
print("First 5 elements:")
for element in pairs_raw.take(5):
    print(element)
# Function to process each line
def process_line(row):
    # Split the line on whitespace and convert the tokens to floats
    return tuple(map(float, row.split()))
# Apply process_line function and store in pairs RDD
pairs = pairs_raw.map(process_line)
# Display the first 5 elements (as tuples)
print("\nFirst 5 elements after processing:")
for element in pairs.take(5):
    print(element)
# Calculate SSE using lambda function and sum
SSE = pairs.map(lambda x: (x[0] - x[1]) ** 2).sum()
print(f"\nSum of Squared Errors (SSE): {SSE}")