Question: 1. (Gradient descent) When the error on a training example $d$ is defined as (see p. 101 in the textbook)

$$E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2,$$

then the weights for the output units need to be updated by (see p. 103, formula (4.27))

$$\Delta w_{ji} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji}.$$

One method for preventing the neural network's weights from overfitting is to add a regularization term to the error that increases with the magnitude of the weight vector. This causes the gradient descent search to seek weight vectors with small magnitudes, thereby reducing the risk of overfitting. One way to do this is to redefine $E$ from p. 101 of the textbook as

$$E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2 + \gamma \sum_{j,i} w_{ji}^2.$$

(a) Give the new formula for $\Delta w_{ji}$: $\Delta w_{ji} = \;?$

(b) Explain how you obtained this: derive the corresponding gradient descent rule for the output units. In other words, say what the new formula for $\Delta w_{ji}$ is, and show how you derived it.
Step by Step Solution
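The full worked solution is not reproduced on this page. What follows is a sketch of the derivation, assuming the regularized error takes the weight-decay form given in the question, $E_d(\vec{w}) = \frac{1}{2}\sum_{k \in outputs}(t_k - o_k)^2 + \gamma \sum_{j,i} w_{ji}^2$, with sigmoid output units $o_j = \sigma(net_j)$ where $net_j = \sum_i w_{ji} x_{ji}$.

Differentiating the two terms of $E_d$ separately with respect to $w_{ji}$:

$$\frac{\partial E_d}{\partial w_{ji}} = -(t_j - o_j)\, o_j (1 - o_j)\, x_{ji} + 2\gamma\, w_{ji}.$$

The first term is the standard result from p. 103: only output $j$ depends on $w_{ji}$, and $\sigma'(net_j) = o_j(1 - o_j)$. The second term arises because only the single summand $w_{ji}^2$ in the penalty involves $w_{ji}$. The gradient descent rule $\Delta w_{ji} = -\eta\, \partial E_d / \partial w_{ji}$ then gives

$$\Delta w_{ji} = \eta\, (t_j - o_j)\, o_j (1 - o_j)\, x_{ji} - 2\eta\gamma\, w_{ji},$$

i.e., the original update (4.27) plus a weight-decay term that shrinks every weight toward zero on each step.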

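As a numerical sanity check, here is a minimal Python sketch (variable names are hypothetical, not from the textbook) that compares the derived update rule for a single sigmoid output unit against a finite-difference gradient of the regularized error:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(w, x, t, gamma):
    # Regularized per-example error for one sigmoid output unit:
    # E_d(w) = 1/2 (t - o)^2 + gamma * sum_i w_i^2
    o = sigmoid(w @ x)
    return 0.5 * (t - o) ** 2 + gamma * np.sum(w ** 2)

def delta_w(w, x, t, gamma, eta):
    # Derived rule: eta (t - o) o (1 - o) x - 2 eta gamma w
    o = sigmoid(w @ x)
    return eta * (t - o) * o * (1 - o) * x - 2 * eta * gamma * w

rng = np.random.default_rng(0)
w, x = rng.normal(size=3), rng.normal(size=3)
t, gamma, eta, eps = 1.0, 0.01, 1.0, 1e-6

# Central finite differences: the update should equal -eta * grad E_d.
grad = np.array([(error(w + eps * np.eye(3)[i], x, t, gamma)
                  - error(w - eps * np.eye(3)[i], x, t, gamma)) / (2 * eps)
                 for i in range(3)])
print(np.allclose(delta_w(w, x, t, gamma, eta), -eta * grad))  # expect: True

If the printed result is True, the analytic update matches the numerical gradient, which supports the derivation above.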