Question: We are using gradient descent to learn the parameters of a simple neural network for binary classification: f(x) = σ(w₁x + w₀), where x, w₀, w₁ ∈ ℝ and σ is the sigmoid function. We are more likely to encounter the problem of vanishing gradients if we initialize the parameters (w₀, w₁) to very large values.
Choice 1 of 2: True
Choice 2 of 2: False
Step by Step Solution
There are 3 steps involved in it.
Answer: True.
Step 1: The derivative of the sigmoid is σ′(z) = σ(z)(1 − σ(z)). It is largest at z = 0, where it equals 1/4, and it decays toward 0 as |z| grows, because σ(z) saturates at 0 or 1.
Step 2: If (w₀, w₁) are initialized to very large values, the pre-activation z = w₁x + w₀ has large magnitude for typical inputs x, so σ(z) sits in a saturated region where σ′(z) ≈ 0.
Step 3: Gradient-descent updates for w₀ and w₁ are proportional to σ′(z) (e.g., with squared-error loss, ∂L/∂w₁ = (f(x) − y) · σ′(z) · x), so the updates become vanishingly small and learning stalls. Hence the statement is True.
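The saturation effect is easy to check numerically. A minimal sketch (the input value x = 1.0 and the specific weight magnitudes are illustrative assumptions, not from the question):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

x = 1.0  # illustrative input (assumption)

# Small initialization: pre-activation near 0, gradient near its maximum of 0.25
small = sigmoid_grad(0.1 * x + 0.1)

# Large initialization: sigmoid saturates, gradient collapses toward 0
large = sigmoid_grad(100.0 * x + 100.0)

print(small)  # ~0.2475
print(large)  # underflows to exactly 0.0 in float64
```

With large weights the pre-activation z = 200 puts σ(z) so close to 1 that the factor (1 − σ(z)) underflows, so the parameter gradients are numerically zero and gradient descent cannot move, which is exactly the vanishing-gradient problem the question asks about.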
