Ex 5.4: Activation and weight scaling. Consider the two hidden unit network shown in
Figure 5.62, which uses ReLU activation functions and has no additive bias parameters. Your
task is to find a set of weights that will fit the function
$y = |x_1 + 1.1\,x_2|$.
Can you guess a set of weights that will fit this function?
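One standard construction uses the identity $|z| = \mathrm{ReLU}(z) + \mathrm{ReLU}(-z)$: give the two hidden units the weights $(1, 1.1)$ and $(-1, -1.1)$ and let the output unit sum their activations. A minimal sketch verifying this guess numerically (the array layout `W`, `v` is our own notation, not the figure's):

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

# |z| = relu(z) + relu(-z) with z = x1 + 1.1*x2:
# hidden unit 1 gets weights (1, 1.1), hidden unit 2 gets (-1, -1.1),
# and the output unit simply sums the two hidden activations.
W = np.array([[1.0, 1.1],
              [-1.0, -1.1]])   # input -> hidden weights (2x2)
v = np.array([1.0, 1.0])       # hidden -> output weights

for x1 in (-1, 0, 1):
    for x2 in (-1, 0, 1):
        x = np.array([x1, x2])
        y_hat = v @ relu(W @ x)
        assert np.isclose(y_hat, abs(x1 + 1.1 * x2))
print("guessed weights reproduce y = |x1 + 1.1 x2| on all nine points")
```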
Starting with the weights shown in column (b), compute the activations for the hidden and final units, as well as the regression loss, for the nine input values $(x_1, x_2) \in \{-1, 0, 1\} \times \{-1, 0, 1\}$.
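The column (b) values appear only in Figure 5.62 and are not reproduced in this text, so the sketch below uses stand-in starting weights `W0`, `v0`; substitute the figure's values to obtain the actual activations and loss:

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

def forward(W, v, x):
    """Return hidden pre-activations, hidden activations, and the output."""
    s = W @ x
    h = relu(s)
    return s, h, v @ h

# Stand-in starting weights -- substitute the actual column (b) values here.
W0 = np.array([[0.5, 0.5],
               [-0.5, -0.5]])
v0 = np.array([0.5, 0.5])

total = 0.0
for x1 in (-1, 0, 1):
    for x2 in (-1, 0, 1):
        x = np.array([x1, x2])
        y = abs(x1 + 1.1 * x2)            # regression target
        s, h, y_hat = forward(W0, v0, x)
        total += (y_hat - y) ** 2
        print(f"x=({x1:2d},{x2:2d})  h1={h[0]:.2f}  h2={h[1]:.2f}  "
              f"y_hat={y_hat:.2f}  y={y:.2f}")
print("summed squared loss:", total)
```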
Now compute the gradients of the squared loss with respect to all six weights using the backpropagation chain rule equations (5.65–5.68), and sum them up across the training samples to get a final gradient.
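A hand-rolled backward pass for this two-layer architecture, assuming the per-sample loss $(\hat{y} - y)^2$ and taking the ReLU derivative to be 0 at $s = 0$ (again with stand-in weights in place of the figure's column (b) values):

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

def grads(W, v, x, y):
    """Gradients of the per-sample loss (y_hat - y)^2 w.r.t. W and v."""
    s = W @ x                       # hidden pre-activations
    h = relu(s)                     # hidden activations
    y_hat = v @ h
    dy = 2.0 * (y_hat - y)          # dL/dy_hat
    dv = dy * h                     # dL/dv_j = dL/dy_hat * h_j
    ds = dy * v * (s > 0)           # chain rule through the ReLU gate
    dW = np.outer(ds, x)            # dL/dW_jk = dL/ds_j * x_k
    return dW, dv

W0 = np.array([[0.5, 0.5],          # stand-in column (b) weights
               [-0.5, -0.5]])
v0 = np.array([0.5, 0.5])

# Sum the per-sample gradients over the nine training points.
dW_sum = np.zeros((2, 2))
dv_sum = np.zeros(2)
for x1 in (-1, 0, 1):
    for x2 in (-1, 0, 1):
        dW, dv = grads(W0, v0, np.array([x1, x2]), abs(x1 + 1.1 * x2))
        dW_sum += dW
        dv_sum += dv
print("summed dL/dW:\n", dW_sum)
print("summed dL/dv:", dv_sum)
```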
What step size should you take in the gradient direction, and what would your updated squared loss become?
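One empirical way to answer this is a small line search along the negative summed gradient; this self-contained sketch (still with stand-in weights) probes a few candidate step sizes:

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

grid = [np.array([x1, x2]) for x1 in (-1, 0, 1) for x2 in (-1, 0, 1)]

def target(x):
    return abs(x[0] + 1.1 * x[1])

def total_loss(W, v):
    return sum((v @ relu(W @ x) - target(x)) ** 2 for x in grid)

def summed_grads(W, v):
    dW, dv = np.zeros_like(W), np.zeros_like(v)
    for x in grid:
        s = W @ x
        h = relu(s)
        dy = 2.0 * (v @ h - target(x))
        dv += dy * h
        dW += np.outer(dy * v * (s > 0), x)
    return dW, dv

W0 = np.array([[0.5, 0.5],          # stand-in column (b) weights
               [-0.5, -0.5]])
v0 = np.array([0.5, 0.5])
dW, dv = summed_grads(W0, v0)

# Probe a few candidate step sizes along the negative summed gradient.
for lr in (0.001, 0.01, 0.05, 0.1, 0.2):
    new = total_loss(W0 - lr * dW, v0 - lr * dv)
    print(f"step {lr:5.3f}: loss {total_loss(W0, v0):.4f} -> {new:.4f}")
```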
Repeat this exercise for the initial weights in column (c) of Figure 5.62.
Given this new set of weights, how much worse is your error decrease, and how many iterations would you expect it to take to achieve a reasonable solution?
Would batch normalization help in this case?
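To get a feel for how a poorly scaled initialization slows plain gradient descent, the sketch below counts iterations until the loss drops below a threshold; both initializations are stand-ins that only mimic the flavor of columns (b) and (c), since the figure's values are not reproduced here:

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

grid = [np.array([x1, x2]) for x1 in (-1, 0, 1) for x2 in (-1, 0, 1)]

def target(x):
    return abs(x[0] + 1.1 * x[1])

def total_loss(W, v):
    return sum((v @ relu(W @ x) - target(x)) ** 2 for x in grid)

def summed_grads(W, v):
    dW, dv = np.zeros_like(W), np.zeros_like(v)
    for x in grid:
        s = W @ x
        h = relu(s)
        dy = 2.0 * (v @ h - target(x))
        dv += dy * h
        dW += np.outer(dy * v * (s > 0), x)
    return dW, dv

def iterations_to(W, v, tol=1e-2, lr=0.01, max_iters=20000):
    """Plain gradient descent; returns iterations until loss < tol (or the cap)."""
    for it in range(max_iters):
        if total_loss(W, v) < tol:
            return it
        dW, dv = summed_grads(W, v)
        W, v = W - lr * dW, v - lr * dv
    return max_iters

# Stand-in initializations: roughly balanced vs. badly scaled (large first
# layer, tiny second layer), mimicking columns (b) and (c) of Figure 5.62.
print("balanced    :", iterations_to(np.array([[0.5, 0.5], [-0.5, -0.5]]),
                                     np.array([0.5, 0.5])))
print("badly scaled:", iterations_to(np.array([[5.0, 5.0], [-5.0, -5.0]]),
                                     np.array([0.05, 0.05])))
```

When one layer's weights are much larger than the other's, the two layers receive gradients of very different magnitudes, so no single step size suits both; normalizing the hidden activations, as batch normalization does, is one standard way to reduce this sensitivity to weight scaling.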
Figure 5.62 Simple two hidden unit network with a ReLU activation function and no bias parameters for regressing the function $y = |x_1 + 1.1\,x_2|$: (a) can you guess a set of weights that would fit this function?; (b) a reasonable set of starting weights; (c) a poorly scaled set of weights.

Figure 5.63 Function optimization: (a) the contour plot of $f(x, y) = x^2 + 20y^2$, with the function being minimized at $(0, 0)$; (b) ideal gradient descent optimization that quickly converges towards the minimum at $x = 0$, $y = 0$.
Lipton et al. (2021) contain myriad graded exercises with code samples to develop your understanding of deep neural networks. If you have the time, try to work through most of these.