Question: Q6. When y=1, what is the gradient of the loss function w.r.t. $W_{11}$?

Q6. When $y=1$, what is the gradient of the loss function w.r.t. $W_{11}$? Write your answer to three decimal places.
Note: Please use the computation graph method. One can calculate the gradient directly using the chain rule, but if the computation graph is not used at all, it will not be scored properly. Try to fill in the red boxes above. This question does not need coding, and the answer can be obtained analytically.
Hint: You may use the property $\frac{\partial \sigma(z)}{\partial z} = \sigma(1-\sigma)$.
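For $y=1$ the backward pass through the output node takes a particularly clean form. The sketch below works out the two local derivatives that the computation graph chains together; the remaining factors (and hence the numeric answer) depend on the input and weight values given in the figure, which is not reproduced here.

```latex
% For y = 1 only the first term of the loss survives:
L = -\ln(a), \qquad a = \sigma(z)
% Backward through the logarithm:
\frac{\partial L}{\partial a} = -\frac{1}{a}
% Backward through the sigmoid, using the hinted property:
\frac{\partial L}{\partial z}
  = \frac{\partial L}{\partial a} \cdot \sigma(z)\bigl(1-\sigma(z)\bigr)
  = -\frac{1}{a} \cdot a(1-a)
  = a - 1
```

From here, $\partial L / \partial W_{11}$ follows by multiplying $a-1$ by the local derivatives read off the remaining nodes of the graph.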
Submitted answer: $\frac{\partial L}{\partial W_{11}} = 0.122$ (marked correct).
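To make the computation-graph bookkeeping concrete, here is a minimal Python sketch of a forward and backward pass through a small sigmoid network with this cross-entropy loss. All numeric values ($x$ and the initial weights) are hypothetical placeholders, since the real ones appear only in the figure, and the figure's architecture may wire the weights differently; only the mechanics of filling the red boxes are illustrated.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical placeholder values; the real input and weights are in
# the figure, which is not reproduced here.
x, W11, W21, y = 1.0, 0.5, 1.0, 1.0

# Forward pass: one line per node of the computation graph.
z1 = W11 * x            # hidden pre-activation
h = sigmoid(z1)         # hidden activation
z2 = W21 * h            # output pre-activation
a = sigmoid(z2)         # output probability
L = -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

# Backward pass: each local derivative fills one "red box".
dL_da = -y / a + (1.0 - y) / (1.0 - a)   # = -1/a when y = 1
dL_dz2 = dL_da * a * (1.0 - a)           # sigmoid property; simplifies to a - y
dL_dW21 = dL_dz2 * h
dL_dh = dL_dz2 * W21
dL_dz1 = dL_dh * h * (1.0 - h)           # sigmoid property again
dL_dW11 = dL_dz1 * x

print(f"dL/dW11 = {dL_dW11:.3f}, dL/dW21 = {dL_dW21:.3f}")
```

With the figure's actual values substituted, $\partial L / \partial W_{11}$ should come out to the 0.122 reported above.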
Q7. Consider a neural network shown below. (6 points)
Suppose we have a cross-entropy loss function for binary classification:
$L = -[y\ln(a) + (1-y)\ln(1-a)]$, where $a$ is the probability output by the output-layer activation function. We've built a computation graph of the network as shown below. The blue letters below are intermediate-variable labels to help you connect the network architecture graph above with the computation graph. With the same condition ($y=1$) and the learning rate $\eta = \frac{1}{2}$, what is the updated weight $W_{21}^{(\mathrm{new})}$? Write your answer to three decimal places.
Note: Please use the computation graph method. One can calculate the gradients directly using the chain rule, but if the computation graph is not used at all, it will not be scored properly. Try to fill in the red boxes in the computation graph. This question does not need coding, and the answer can be obtained analytically.
Hint: You may use the property $\frac{\partial \sigma(z)}{\partial z} = \sigma(1-\sigma)$.
Calculate the new weight from the old weight and the learning rate as follows:
$W_{21} \leftarrow W_{21} - \eta\,\frac{\partial L}{\partial W_{21}}$
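As a minimal sketch of this update rule with $\eta = \frac{1}{2}$ (the old weight and the gradient below are hypothetical placeholders; the real ones come from the figure and from the backward pass, respectively):

```python
# Gradient-descent update for W21 with eta = 1/2, as given.
eta = 0.5

# Hypothetical placeholder values; the real old weight is in the
# figure and the real gradient comes from the backward pass.
W21_old = 1.0
dL_dW21 = 0.250

W21_new = W21_old - eta * dL_dW21
print(f"W21(new) = {W21_new:.3f}")   # -> 0.875 with these placeholders
```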
Submitted answer: 1.062 (marked incorrect).
Follow-up question: What is $W_{21} \leftarrow W_{21} - \eta\,\frac{\partial L}{\partial W_{21}}$?
I need the answer rounded to 3 decimal places.
Q6: the answer is 0.125.
I need the answer to Q7.