Question:

Problem 4: Training Predictions
We will now use the model to generate predictions for the training set.
Use the model created in Problem 3 to generate predictions for the training set. Store the results in a
DataFrame named train_pred. Display the first 10 rows of the probability, prediction, and stroke
columns. Set the option truncate=False.
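A minimal sketch of this step, assuming the fitted model from Problem 3 is named `model` and the training DataFrame is named `train` (both names are assumptions): `train_pred = model.transform(train)` followed by `train_pred.select('probability', 'prediction', 'stroke').show(10, truncate=False)`. To read the resulting columns, it helps to see how a two-class probability vector becomes a prediction. The plain-Python sketch below assumes the model predicts class 1 exactly when P(class 1) exceeds a 0.5 threshold, which is the default behavior of Spark's binary `LogisticRegression`; other model types or custom thresholds may behave differently. The rows are made-up values, not real model output.

```python
# Hypothetical illustration: assumes a binary classifier with a 0.5 threshold.
def predict_from_probability(probability, threshold=0.5):
    """Map a two-class probability vector [P(0), P(1)] to a 0.0/1.0 label."""
    return 1.0 if probability[1] > threshold else 0.0


# Made-up (probability, stroke) pairs shaped like the displayed columns.
rows = [
    ([0.93, 0.07], 0),
    ([0.38, 0.62], 0),
    ([0.55, 0.45], 1),
]

for probability, stroke in rows:
    prediction = predict_from_probability(probability)
    print(probability, prediction, stroke)
```

The second and third rows are misclassified: the model's prediction disagrees with the stroke label.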
We will now view a few rows for which the model generated incorrect predictions.
Use filtering operations to find all rows where the value in the prediction column differs from the value in the
stroke column. Display the first 10 rows of the probability, prediction, and stroke columns for the
filtered DataFrame. Set the option truncate=False.
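In PySpark the filter can be written as `train_pred.filter(train_pred.prediction != train_pred.stroke)`, followed by the same `select(...).show(10, truncate=False)` call. The filtering logic is easy to check by hand; the plain-Python sketch below uses hypothetical rows shaped like the three displayed columns and keeps only those whose prediction disagrees with the label.

```python
# Hypothetical rows mirroring the probability, prediction, and stroke columns.
train_rows = [
    {"probability": [0.93, 0.07], "prediction": 0.0, "stroke": 0},
    {"probability": [0.38, 0.62], "prediction": 1.0, "stroke": 0},
    {"probability": [0.55, 0.45], "prediction": 0.0, "stroke": 1},
    {"probability": [0.20, 0.80], "prediction": 1.0, "stroke": 1},
]


def misclassified(rows):
    """Keep only rows whose prediction differs from the true stroke label."""
    # Python compares 1.0 and 1 as equal, so no explicit cast is needed here.
    return [r for r in rows if r["prediction"] != r["stroke"]]


wrong = misclassified(train_rows)
for r in wrong[:10]:  # analogous to .show(10)
    print(r["probability"], r["prediction"], r["stroke"])
```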
To give you practice interpreting the probability estimates generated by the model, you will now identify the
records among these 10 for which the model is most and least confident in its incorrect prediction.
Of the 10 records displayed in the code cell above, identify the following:
- The record for which the model has assigned the highest probability to an incorrect answer.
- The record for which the model has assigned the lowest probability to an incorrect answer.
Provide your answer in a markdown cell with a bullet-pointed list as shown below. Round the probabilities to
4 decimal places.
- The highest probability observed for an incorrect answer is xxxxx.
- The lowest probability observed for an incorrect answer is xxxxx.
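One way to read these two quantities: for a misclassified row, the probability the model assigned to its incorrect answer is the entry of the probability vector at the index of the predicted class. The sketch below applies that reading to hypothetical misclassified rows (not real model output) and prints the two answer lines rounded to 4 decimal places.

```python
# Hypothetical misclassified rows (prediction != stroke).
wrong = [
    {"probability": [0.38, 0.62], "prediction": 1.0, "stroke": 0},
    {"probability": [0.55, 0.45], "prediction": 0.0, "stroke": 1},
    {"probability": [0.27, 0.73], "prediction": 1.0, "stroke": 0},
]


def incorrect_answer_probability(row):
    """Probability the model gave to its (wrong) predicted class."""
    return row["probability"][int(row["prediction"])]


probs = [incorrect_answer_probability(r) for r in wrong]
print(f"- The highest probability observed for an incorrect answer is {max(probs):.4f}.")
print(f"- The lowest probability observed for an incorrect answer is {min(probs):.4f}.")
```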
Problem 5: Classification Metrics
Next, we will use the training predictions to calculate several classification metrics for the model. Note that
these results are computed on the training data rather than on out-of-sample data, so they are likely to be overly optimistic.
Use the code below to create a pair RDD of prediction/label pairs:
pred_and_labels = train_pred.rdd.map(lambda x:(x['prediction'],float(x['stroke'])))
Then use this pair RDD to create an instance of the MulticlassMetrics class named metrics. Print the
accuracy attribute of this object.
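For reference, the PySpark calls are `from pyspark.mllib.evaluation import MulticlassMetrics`, `metrics = MulticlassMetrics(pred_and_labels)`, and `print(metrics.accuracy)`. Accuracy is the fraction of (prediction, label) pairs in which the two values agree; the stdlib sketch below computes it on a few hypothetical pairs, which can help sanity-check the Spark result on a small sample.

```python
def accuracy(pairs):
    """Fraction of (prediction, label) pairs where prediction equals label."""
    pairs = list(pairs)
    correct = sum(1 for pred, label in pairs if pred == label)
    return correct / len(pairs)


# Hypothetical (prediction, label) pairs, same shape as pred_and_labels.
pairs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
print(round(accuracy(pairs), 4))  # 3 of 5 pairs agree -> 0.6
```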
We will now display the confusion matrix for the training data.
Extract the confusion matrix from the metrics object, storing the result in a variable. Then display the
confusion matrix as a Pandas DataFrame with the columns and rows named according to the label values
(which are 0 and 1).
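`metrics.confusionMatrix()` returns a Spark `DenseMatrix`; calling `.toArray()` on it gives a NumPy array that can be wrapped as `pd.DataFrame(cm, index=[0, 1], columns=[0, 1])`. In Spark's layout the rows hold the actual labels and the columns hold the predicted labels, both in ascending label order. The stdlib sketch below builds the same layout from hypothetical (prediction, label) pairs so the orientation is easy to verify.

```python
def confusion_matrix(pairs, labels=(0.0, 1.0)):
    """Rows are actual labels, columns are predicted labels (ascending order)."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0 for _ in labels] for _ in labels]
    for pred, actual in pairs:
        matrix[index[actual]][index[pred]] += 1
    return matrix


# Hypothetical (prediction, label) pairs, same shape as pred_and_labels.
pairs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
cm = confusion_matrix(pairs)

print("actual \\ predicted   0   1")
for lab, row in zip((0, 1), cm):
    print(f"{lab:>18} {row[0]:>3} {row[1]:>3}")
```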
We will now display the precision and recall for both label classes.
Use the metrics object to calculate the precision and recall for both label values (0 and 1). Display the
results in the format shown below. Round the displayed values to 4 decimal places.
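The PySpark calls are `metrics.precision(label)` and `metrics.recall(label)` for each `label` in `(0.0, 1.0)`. With the given label treated as the positive class, precision = TP / (TP + FP) and recall = TP / (TP + FN). The stdlib sketch below applies these definitions to hypothetical pairs; the print format is an assumed placeholder, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def precision_recall(pairs, label):
    """Per-class precision TP/(TP+FP) and recall TP/(TP+FN) for one label."""
    tp = sum(1 for p, y in pairs if p == label and y == label)
    fp = sum(1 for p, y in pairs if p == label and y != label)
    fn = sum(1 for p, y in pairs if p != label and y == label)
    # Convention used in this sketch: 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical (prediction, label) pairs, same shape as pred_and_labels.
pairs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
for label in (0.0, 1.0):
    p, r = precision_recall(pairs, label)
    print(f"Label {int(label)}: precision = {p:.4f}, recall = {r:.4f}")
```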