Question: Ex 5.4: Activation and weight scaling. Consider the two-hidden-unit network shown in Figure 5.62, which uses ReLU activation functions and has no additive bias parameters. Your task is to find a set of weights that will fit the given function.

1. Can you guess a set of weights that will fit this function?
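The target function and Figure 5.62 are not reproduced in this extract, so the sketch below assumes the target is f(x) = |x|, which a pair of bias-free ReLU units can represent exactly as relu(x) + relu(-x) = |x|. All weight and input values here are hypothetical stand-ins, not the book's.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical guess: hidden weights (1, -1), output weights (1, 1).
w = np.array([1.0, -1.0])             # input -> hidden
v = np.array([1.0, 1.0])              # hidden -> output

x = np.linspace(-4.0, 4.0, 9)         # nine sample inputs (assumed grid)
y_hat = relu(np.outer(x, w)) @ v      # v1*relu(w1*x) + v2*relu(w2*x)
assert np.allclose(y_hat, np.abs(x))  # reproduces |x| exactly
```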
2. Starting with the weights shown in column (b), compute the activations for the hidden and final units, as well as the regression loss, for the nine input values.
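A minimal forward-pass sketch, again assuming the target f(x) = |x| and a nine-point input grid. The book's figure has six weights, but since it is not reproduced here, this sketch uses a reduced 1 → 2 → 1 network (four weights) with placeholder column (b) values; substitute the actual values from Figure 5.62.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

w_b = np.array([0.5, -0.5])      # placeholder column (b) hidden weights
v_b = np.array([1.0, 1.0])       # placeholder column (b) output weights

x = np.linspace(-4.0, 4.0, 9)    # the nine input values (assumed grid)
y = np.abs(x)                    # assumed target f(x) = |x|

h = relu(np.outer(x, w_b))       # hidden-unit activations, shape (9, 2)
y_hat = h @ v_b                  # final-unit activations, shape (9,)
loss = np.sum((y_hat - y) ** 2)  # summed squared regression loss
print(h, y_hat, loss, sep="\n")
```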
3. Now compute the gradients of the squared loss with respect to all six weights using the backpropagation chain rule equations, and sum them across the training samples to get a final gradient.
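A sketch of the summed chain-rule gradients for the same reduced network and placeholder weights (the book's specific backpropagation equations are not reproduced in this extract):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

w = np.array([0.5, -0.5])        # placeholder column (b) hidden weights
v = np.array([1.0, 1.0])         # placeholder column (b) output weights
x = np.linspace(-4.0, 4.0, 9)
y = np.abs(x)                    # assumed target f(x) = |x|

a = np.outer(x, w)               # hidden pre-activations, shape (9, 2)
h = relu(a)                      # hidden activations
e = h @ v - y                    # per-sample residuals

# Chain rule, summed over the nine samples:
#   dL/dv_j = sum_i 2 e_i h_ij
#   dL/dw_j = sum_i 2 e_i v_j [a_ij > 0] x_i
grad_v = 2.0 * e @ h
grad_w = 2.0 * (e[:, None] * (a > 0) * v).T @ x
print(grad_w, grad_v)
```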
4. What step size should you take in the gradient direction, and what would your updated squared loss become?
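Continuing the same hypothetical setup, one way to pick a step size is to try a few candidates along the negative gradient and compare the updated loss:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def loss_and_grads(w, v, x, y):
    a = np.outer(x, w)
    e = relu(a) @ v - y
    grad_w = 2.0 * (e[:, None] * (a > 0) * v).T @ x
    grad_v = 2.0 * e @ relu(a)
    return np.sum(e ** 2), grad_w, grad_v

x = np.linspace(-4.0, 4.0, 9)
y = np.abs(x)                                       # assumed target
w, v = np.array([0.5, -0.5]), np.array([1.0, 1.0])  # placeholder start

loss0, gw, gv = loss_and_grads(w, v, x, y)
for eta in (0.001, 0.01, 0.1):                      # candidate step sizes
    loss1, _, _ = loss_and_grads(w - eta * gw, v - eta * gv, x, y)
    print(f"eta={eta}: loss {loss0:.3f} -> {loss1:.3f}")
```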
5. Repeat this exercise for the initial weights in column (c) of Figure 5.62. Given this new set of weights, how much worse is your error decrease, and how many iterations would you expect it to take to achieve a reasonable solution?
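A common construction for this kind of exercise is for column (c) to be a rescaled version of column (b), e.g. hidden weights multiplied by 10 and output weights divided by 10, which leaves the network function unchanged but unbalances the gradients. That rescaling is an assumption here, since the figure is not shown; the sketch illustrates its effect:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def grads(w, v, x, y):
    a = np.outer(x, w)
    e = relu(a) @ v - y
    return (2.0 * (e[:, None] * (a > 0) * v).T @ x,  # dL/dw
            2.0 * e @ relu(a))                       # dL/dv

x = np.linspace(-4.0, 4.0, 9)
y = np.abs(x)                                        # assumed target
w, v = np.array([0.5, -0.5]), np.array([1.0, 1.0])   # placeholder column (b)

for label, (wi, vi) in {"(b)": (w, v), "(c) rescaled": (10 * w, v / 10)}.items():
    gw, gv = grads(wi, vi, x, y)
    print(label, "|dL/dw| =", np.linalg.norm(gw), "|dL/dv| =", np.linalg.norm(gv))
```

Under this assumed rescaling, the two layers' gradient magnitudes differ by roughly a factor of 100, so no single step size suits both layers and gradient descent needs far more iterations to reach a comparable loss.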
[Figure: Function optimization: (a) contour plot of the function being minimized; (b) ideal gradient descent optimization that quickly converges towards the minimum.]
6. Would batch normalization help in this case?
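As a hedged illustration of what batch normalization would do in this setting: standardizing each hidden unit's activations over the batch removes their dependence on the scale of the hidden weights, which is exactly the imbalance introduced by the assumed rescaling above.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def batchnorm(h, eps=1e-5):
    # Per-unit normalization over the batch (learnable scale/shift omitted).
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

x = np.linspace(-4.0, 4.0, 9)
for scale in (1.0, 10.0):                   # column (b) vs. rescaled (c)
    h = relu(np.outer(x, scale * np.array([0.5, -0.5])))
    print(scale, batchnorm(h).std(axis=0))  # ~1 regardless of weight scale
```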
