Question 6 [2 pts]: To reduce the risk of neural network overfitting, one solution is to add a
penalty term for the weight magnitude. For example, add a term to the squared error E that
increases with the magnitude of the weight vector. This causes the gradient descent search to
seek weight vectors with small magnitudes, thereby reducing the risk of overfitting. Given a
single-layer neural network with M output nodes (i.e., no hidden layer), assume the squared
error E is defined by
\[
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \, \sum_{j \in \text{output nodes}} \left[ d_j(n) - o_j(n) \right]^2 + \gamma \sum_{i,j} w_{i,j}^2
\]
where N denotes the total number of training instances, d_j(n) denotes the desired output for
the nth instance at the jth output node, o_j(n) is the actual output observed for the nth instance
at the jth output node, w_{i,j} is the ith weight of the jth output node, and \gamma is a constant
controlling the strength of the penalty. Assume output node j uses the sigmoid activation function

\[
\sigma(v_j) = \frac{1}{1 + e^{-a v_j}},
\]

where v_j is the net input to node j.
Calculate the partial derivative of E(\mathbf{w}) with respect to the weight w_{i,j}. [1 pt]
Derive the weight-update rule for the ith weight of output node j. [Hint: use gradient
descent] [1 pt]
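A sketch of the first part follows, assuming the standard single-layer setup in which the net input to node j is v_j(n) = \sum_i w_{i,j} x_i(n), with x_i(n) the ith input of the nth instance (this notation is not given in the original). Since only output node j depends on w_{i,j}, the sum over output nodes collapses to the j term, and the sigmoid derivative \sigma'(v) = a\,\sigma(v)\,[1 - \sigma(v)] gives

\[
\frac{\partial E}{\partial w_{i,j}}
= -\sum_{n=1}^{N} \left[ d_j(n) - o_j(n) \right] \frac{\partial o_j(n)}{\partial w_{i,j}} + 2\gamma w_{i,j}
= -a \sum_{n=1}^{N} \left[ d_j(n) - o_j(n) \right] o_j(n) \left[ 1 - o_j(n) \right] x_i(n) + 2\gamma w_{i,j}.
\]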
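For the second part, applying gradient descent with a learning rate \eta (a symbol assumed here, not given in the original) yields the update rule

\[
w_{i,j} \leftarrow w_{i,j} - \eta \frac{\partial E}{\partial w_{i,j}}
= (1 - 2\eta\gamma)\, w_{i,j} + \eta a \sum_{n=1}^{N} \left[ d_j(n) - o_j(n) \right] o_j(n) \left[ 1 - o_j(n) \right] x_i(n).
\]

The factor (1 - 2\eta\gamma) shrinks every weight toward zero on each update (weight decay), which is exactly the overfitting-reduction mechanism described in the question.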