Question: We are using gradient descent to learn the parameters of a simple neural network for binary classification: f(x) = σ(w₁x + w₀), where x, w₀, w₁ ∈ ℝ and σ is the sigmoid function. We are more likely to encounter the problem of vanishing gradients if we initialize the parameters (w₀, w₁) to very large values.
Choice 1 of 2: True
Choice 2 of 2: False
Step by Step Solution
There are 3 steps involved in it.
Answer: True.
Step 1: The derivative of the sigmoid is σ′(z) = σ(z)(1 − σ(z)). It is largest at z = 0, where it equals 1/4, and it decays toward 0 as |z| grows, because σ(z) saturates at 0 or 1.
Step 2: If (w₀, w₁) are initialized to very large values, the pre-activation z = w₁x + w₀ has large magnitude for typical inputs x, so σ(z) sits in a saturated region where σ′(z) ≈ 0.
Step 3: Gradient-descent updates for w₀ and w₁ are proportional to σ′(z) (e.g., with squared-error loss, ∂L/∂w₁ = (f(x) − y) · σ′(z) · x), so the updates become vanishingly small and learning stalls. Hence the statement is True.
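The saturation effect is easy to check numerically. A minimal sketch (the input value x = 1.0 and the specific weight magnitudes are illustrative assumptions, not from the question):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

x = 1.0  # illustrative input (assumption)

# Small initialization: pre-activation near 0, gradient near its maximum of 0.25
small = sigmoid_grad(0.1 * x + 0.1)

# Large initialization: sigmoid saturates, gradient collapses toward 0
large = sigmoid_grad(100.0 * x + 100.0)

print(small)  # ~0.2475
print(large)  # underflows to exactly 0.0 in float64
```

With large weights the pre-activation z = 200 puts σ(z) so close to 1 that the factor (1 − σ(z)) underflows, so the parameter gradients are numerically zero and gradient descent cannot move, which is exactly the vanishing-gradient problem the question asks about.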
