Ex 5.4: Activation and weight scaling. Consider the two hidden unit network shown in
Figure 5.62, which uses ReLU activation functions and has no additive bias parameters. Your
task is to find a set of weights that will fit the function
$y = |x_1 + 1.1\,x_2|$.
Can you guess a set of weights that will fit this function?
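One standard construction uses the identity $|z| = \mathrm{ReLU}(z) + \mathrm{ReLU}(-z)$: give the two hidden units the weights $(1, 1.1)$ and $(-1, -1.1)$ and let the output unit sum their activations. A minimal sketch verifying this guess numerically (the array layout `W`, `v` is our own notation, not the figure's):

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

# |z| = relu(z) + relu(-z) with z = x1 + 1.1*x2:
# hidden unit 1 gets weights (1, 1.1), hidden unit 2 gets (-1, -1.1),
# and the output unit simply sums the two hidden activations.
W = np.array([[1.0, 1.1],
              [-1.0, -1.1]])   # input -> hidden weights (2x2)
v = np.array([1.0, 1.0])       # hidden -> output weights

for x1 in (-1, 0, 1):
    for x2 in (-1, 0, 1):
        x = np.array([x1, x2])
        y_hat = v @ relu(W @ x)
        assert np.isclose(y_hat, abs(x1 + 1.1 * x2))
print("guessed weights reproduce y = |x1 + 1.1 x2| on all nine points")
```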
Starting with the weights shown in column (b), compute the activations for the hidden and final units, as well as the regression loss, for the nine input values $(x_1, x_2) \in \{-1, 0, 1\} \times \{-1, 0, 1\}$.
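The column (b) values appear only in Figure 5.62 and are not reproduced in this text, so the sketch below uses stand-in starting weights `W0`, `v0`; substitute the figure's values to obtain the actual activations and loss:

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

def forward(W, v, x):
    """Return hidden pre-activations, hidden activations, and the output."""
    s = W @ x
    h = relu(s)
    return s, h, v @ h

# Stand-in starting weights -- substitute the actual column (b) values here.
W0 = np.array([[0.5, 0.5],
               [-0.5, -0.5]])
v0 = np.array([0.5, 0.5])

total = 0.0
for x1 in (-1, 0, 1):
    for x2 in (-1, 0, 1):
        x = np.array([x1, x2])
        y = abs(x1 + 1.1 * x2)            # regression target
        s, h, y_hat = forward(W0, v0, x)
        total += (y_hat - y) ** 2
        print(f"x=({x1:2d},{x2:2d})  h1={h[0]:.2f}  h2={h[1]:.2f}  "
              f"y_hat={y_hat:.2f}  y={y:.2f}")
print("summed squared loss:", total)
```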
Now compute the gradients of the squared loss with respect to all six weights using the backpropagation chain rule equations (5.65–5.68), and sum them up across the training samples to get a final gradient.
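A hand-rolled backward pass for this two-layer architecture, assuming the per-sample loss $(\hat{y} - y)^2$ and taking the ReLU derivative to be 0 at $s = 0$ (again with stand-in weights in place of the figure's column (b) values):

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

def grads(W, v, x, y):
    """Gradients of the per-sample loss (y_hat - y)^2 w.r.t. W and v."""
    s = W @ x                       # hidden pre-activations
    h = relu(s)                     # hidden activations
    y_hat = v @ h
    dy = 2.0 * (y_hat - y)          # dL/dy_hat
    dv = dy * h                     # dL/dv_j = dL/dy_hat * h_j
    ds = dy * v * (s > 0)           # chain rule through the ReLU gate
    dW = np.outer(ds, x)            # dL/dW_jk = dL/ds_j * x_k
    return dW, dv

W0 = np.array([[0.5, 0.5],          # stand-in column (b) weights
               [-0.5, -0.5]])
v0 = np.array([0.5, 0.5])

# Sum the per-sample gradients over the nine training points.
dW_sum = np.zeros((2, 2))
dv_sum = np.zeros(2)
for x1 in (-1, 0, 1):
    for x2 in (-1, 0, 1):
        dW, dv = grads(W0, v0, np.array([x1, x2]), abs(x1 + 1.1 * x2))
        dW_sum += dW
        dv_sum += dv
print("summed dL/dW:\n", dW_sum)
print("summed dL/dv:", dv_sum)
```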
What step size should you take in the gradient direction, and what would your updated squared loss become?
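One empirical way to answer this is a small line search along the negative summed gradient; this self-contained sketch (still with stand-in weights) probes a few candidate step sizes:

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

grid = [np.array([x1, x2]) for x1 in (-1, 0, 1) for x2 in (-1, 0, 1)]

def target(x):
    return abs(x[0] + 1.1 * x[1])

def total_loss(W, v):
    return sum((v @ relu(W @ x) - target(x)) ** 2 for x in grid)

def summed_grads(W, v):
    dW, dv = np.zeros_like(W), np.zeros_like(v)
    for x in grid:
        s = W @ x
        h = relu(s)
        dy = 2.0 * (v @ h - target(x))
        dv += dy * h
        dW += np.outer(dy * v * (s > 0), x)
    return dW, dv

W0 = np.array([[0.5, 0.5],          # stand-in column (b) weights
               [-0.5, -0.5]])
v0 = np.array([0.5, 0.5])
dW, dv = summed_grads(W0, v0)

# Probe a few candidate step sizes along the negative summed gradient.
for lr in (0.001, 0.01, 0.05, 0.1, 0.2):
    new = total_loss(W0 - lr * dW, v0 - lr * dv)
    print(f"step {lr:5.3f}: loss {total_loss(W0, v0):.4f} -> {new:.4f}")
```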
Repeat this exercise for the initial weights in column (c) of Figure 5.62.
Given this new set of weights, how much worse is your error decrease, and how many iterations would you expect it to take to achieve a reasonable solution?
Would batch normalization help in this case?
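To get a feel for how a poorly scaled initialization slows plain gradient descent, the sketch below counts iterations until the loss drops below a threshold; both initializations are stand-ins that only mimic the flavor of columns (b) and (c), since the figure's values are not reproduced here:

```python
import numpy as np

def relu(s):
    return np.maximum(s, 0.0)

grid = [np.array([x1, x2]) for x1 in (-1, 0, 1) for x2 in (-1, 0, 1)]

def target(x):
    return abs(x[0] + 1.1 * x[1])

def total_loss(W, v):
    return sum((v @ relu(W @ x) - target(x)) ** 2 for x in grid)

def summed_grads(W, v):
    dW, dv = np.zeros_like(W), np.zeros_like(v)
    for x in grid:
        s = W @ x
        h = relu(s)
        dy = 2.0 * (v @ h - target(x))
        dv += dy * h
        dW += np.outer(dy * v * (s > 0), x)
    return dW, dv

def iterations_to(W, v, tol=1e-2, lr=0.01, max_iters=20000):
    """Plain gradient descent; returns iterations until loss < tol (or the cap)."""
    for it in range(max_iters):
        if total_loss(W, v) < tol:
            return it
        dW, dv = summed_grads(W, v)
        W, v = W - lr * dW, v - lr * dv
    return max_iters

# Stand-in initializations: roughly balanced vs. badly scaled (large first
# layer, tiny second layer), mimicking columns (b) and (c) of Figure 5.62.
print("balanced    :", iterations_to(np.array([[0.5, 0.5], [-0.5, -0.5]]),
                                     np.array([0.5, 0.5])))
print("badly scaled:", iterations_to(np.array([[5.0, 5.0], [-5.0, -5.0]]),
                                     np.array([0.05, 0.05])))
```

When one layer's weights are much larger than the other's, the two layers receive gradients of very different magnitudes, so no single step size suits both; normalizing the hidden activations, as batch normalization does, is one standard way to reduce this sensitivity to weight scaling.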
Figure 5.62 Simple two hidden unit network with a ReLU activation function and no bias parameters for regressing the function $y = |x_1 + 1.1\,x_2|$: (a) can you guess a set of weights that would fit this function?; (b) a reasonable set of starting weights; (c) a poorly scaled set of weights.

Figure 5.63 Function optimization: (a) the contour plot of $f(x, y) = x^2 + 20y^2$, with the function being minimized at $(0, 0)$; (b) ideal gradient descent optimization that quickly converges towards the minimum at $x = 0$, $y = 0$.
Lipton et al. (2021) contain myriad graded exercises with code samples to develop your understanding of deep neural networks. If you have the time, try to work through most of these.