Question: Problem 4: Training Predictions

We will now use the model to generate predictions for the training set.

Use the model created in Problem 3 to generate predictions for the training set. Store the results in a DataFrame named train_pred. Display the first 10 rows of the probability, prediction, and stroke columns. Set the option truncate=False.
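A minimal sketch of this step, assuming the Problem 3 model is a fitted PySpark ML model (or pipeline model) with the usual `transform` method. The names `model`, `train_df`, and `make_training_predictions` are placeholders, since the original does not show what the model and training DataFrame are called.

```python
def make_training_predictions(model, train_df, n_rows=10):
    """Apply a fitted PySpark ML model to the training DataFrame and preview it.

    `model` and `train_df` are hypothetical names; substitute the objects
    created in the earlier problems.
    """
    # For a probabilistic classifier, transform() typically appends the
    # probability and prediction columns to the input DataFrame.
    train_pred = model.transform(train_df)
    # truncate=False prints the full probability vectors instead of cutting them off.
    train_pred.select('probability', 'prediction', 'stroke').show(n_rows, truncate=False)
    return train_pred
```

In a notebook this would be called as `train_pred = make_training_predictions(model, train_df)`.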

We will now view a few rows for which the model generated incorrect predictions.

Use filtering operations to find all rows where the prediction column has a different value from the stroke column. Display the first 10 rows of the probability, prediction, and stroke columns for the filtered DataFrame. Set the option truncate=False.
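One way to express this filter, sketched under the assumption that `train_pred` is a PySpark DataFrame containing `prediction` and `stroke` columns. The function name `show_misclassified` is hypothetical. The filter is written as a SQL expression string, which PySpark's `DataFrame.filter` accepts; an equivalent column expression would be `train_pred['prediction'] != train_pred['stroke']`.

```python
def show_misclassified(train_pred, n_rows=10):
    """Keep only rows where the predicted label differs from the true label."""
    # The SQL expression compares the two columns row by row.
    wrong = train_pred.filter('prediction != stroke')
    # Preview the same three columns as before, without truncating the vectors.
    wrong.select('probability', 'prediction', 'stroke').show(n_rows, truncate=False)
    return wrong
```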

To give you practice interpreting the probability estimates generated by the model, you will now be asked to identify for which of these 10 records the model is most and least confident in its prediction.

Of the 10 records displayed in the code cell above, identify the following:

The record for which the model has assigned the highest probability to an incorrect answer.

The record for which the model has assigned the lowest probability to an incorrect answer.

Provide your answer in a markdown cell with a bullet-pointed list as shown below. Round the probabilities to 4 decimal places.

- The highest probability observed for an incorrect answer is xxxxx.
- The lowest probability observed for an incorrect answer is xxxxx.
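The reading of the probability column can be sketched in plain Python. This assumes the probability vector is ordered by class, so that entry 0 is the estimated probability of class 0 and entry 1 that of class 1. For a misclassified record, the predicted class is the incorrect answer, so the value of interest is the vector entry at the predicted index. The rows below are made up for illustration and are not output from the stroke model; `wrong_answer_confidence` is a hypothetical helper name.

```python
def wrong_answer_confidence(rows):
    """Return (highest, lowest) probability assigned to an incorrect answer.

    rows: iterable of (probability_vector, prediction, label) tuples.
    """
    # Keep only misclassified rows and read off the probability of the
    # (incorrect) predicted class.
    probs = [prob[int(pred)] for prob, pred, label in rows if pred != label]
    return round(max(probs), 4), round(min(probs), 4)

# Illustrative rows only (assumed values, not real results).
sample_rows = [
    ([0.81234, 0.18766], 0.0, 1),
    ([0.45011, 0.54989], 1.0, 0),
    ([0.50672, 0.49328], 0.0, 1),
]
highest, lowest = wrong_answer_confidence(sample_rows)
print(highest, lowest)
```

A value close to 0.5 means the model was nearly undecided on that record, while a value close to 1 means it was confidently wrong.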

Problem 5: Classification Metrics

Next, we will use the training predictions to calculate several classification metrics for the model. Note that these results use the training data rather than out-of-sample data, so they will likely be overly optimistic.

Use the code below to create a pair RDD of prediction/label pairs:

    pred_and_labels = train_pred.rdd.map(lambda x: (x['prediction'], float(x['stroke'])))

Then use this pair RDD to create an instance of the MulticlassMetrics class named metrics. Print the accuracy attribute of this object.
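To make clear what the accuracy attribute reports, here is a plain-Python sketch of the same quantity computed from prediction/label pairs: the fraction of pairs in which the two values agree. It is not a substitute for `MulticlassMetrics`, and the pairs are made up for illustration.

```python
def accuracy(pred_and_labels):
    """Fraction of (prediction, label) pairs in which the two values agree."""
    pairs = list(pred_and_labels)
    correct = sum(1 for pred, label in pairs if pred == label)
    return correct / len(pairs)

# Made-up pairs for illustration only; not results from the stroke model.
sample_pairs = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
print(round(accuracy(sample_pairs), 4))
```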

We will now display the confusion matrix for the training data.

Extract the confusion matrix from the metrics object, storing the result in a variable. Then display the confusion matrix as a Pandas DataFrame with the columns and rows named according to the label values (which are 0 and 1).
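The layout of the confusion matrix can be sketched in plain Python. The sketch follows the convention that rows are actual labels and columns are predicted labels, both in ascending label order. The helper name `confusion_counts` and the sample pairs are assumptions for illustration.

```python
def confusion_counts(pred_and_labels, labels=(0, 1)):
    """Return a nested list where entry [i][j] counts records whose actual
    label is labels[i] and whose predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for pred, label in pred_and_labels:
        # Row = actual label, column = predicted label.
        matrix[index[int(label)]][index[int(pred)]] += 1
    return matrix

# Made-up pairs for illustration only; not results from the stroke model.
sample_pairs = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
cm = confusion_counts(sample_pairs)
for row in cm:
    print(row)
```

With Spark itself, `metrics.confusionMatrix().toArray()` yields a NumPy array that can be passed to `pd.DataFrame(..., index=[0, 1], columns=[0, 1])`, assuming pandas is imported as `pd`.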

We will now display the precision and recall for both label classes.

Use the metrics object to calculate the precision and recall for both label values (0 and 1). Display the results in the format shown below. Round the displayed values to 4 decimal places.
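As a reference for what the two metrics mean per label, here is a plain-Python sketch. Precision for a label is the share of records predicted as that label that truly have it; recall is the share of records with that label that were predicted correctly. The helper name and sample pairs are assumptions for illustration. With the metrics object, the corresponding calls are `metrics.precision(label)` and `metrics.recall(label)`.

```python
def precision_recall(pred_and_labels, label):
    """Return (precision, recall) for one label, rounded to 4 decimal places."""
    pairs = list(pred_and_labels)
    true_pos = sum(1 for p, y in pairs if p == label and y == label)
    predicted = sum(1 for p, _ in pairs if p == label)  # predicted as `label`
    actual = sum(1 for _, y in pairs if y == label)     # truly `label`
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return round(precision, 4), round(recall, 4)

# Made-up pairs for illustration only; not results from the stroke model.
sample_pairs = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0), (1.0, 0.0)]
for label in (0.0, 1.0):
    precision, recall = precision_recall(sample_pairs, label)
    print(int(label), precision, recall)
```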

