Question: Please note: answer only parts (a) and (b).

3. a) Derive the update rule for the weights in the output layer of a neural network using the gradient descent rule. Assume that the sigmoid function is used as the activation function, the quadratic loss as the error function, and L1 regularisation is applied.
b) Assume the network's error function is E0. How is it modified when L2 regularisation is applied? Describe how this type of regularisation works and how it differs from L1 regularisation.
c) Assume that you wish to train a classifier on a large dataset. How would you estimate its generalisation performance and optimise its parameters? Describe briefly the procedure that you would follow.
d) Compute the classification rate for the given confusion matrix. Do you think the classification rate is a suitable performance measure in this case? Explain your reasoning and the alternatives.
Confusion matrix (columns: Class 1 - Predicted, Class 2 - Predicted, Class 3 - Predicted): Class 1 - Actual 1000, Class 2 - Actual 20, Class 3 - Actual; Predicted 100 0 10; Predicted 50 10 0 10
The four parts carry, respectively, 40%, 20%, 20%, 20% of the marks.
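For parts (a) and (b), the update rule can be sketched and checked numerically. This is a minimal illustration under stated assumptions, not a verified model answer. It assumes a single training example, output o_k = sigmoid(sum_j w_jk h_j), quadratic loss E0 = 1/2 sum_k (t_k - o_k)^2, an L1 penalty lam * sum |w| and an L2 penalty (lam / 2) * sum w^2. The names W, h, t, eta and lam are hypothetical and do not come from the question. Under these assumptions the chain rule gives dE/dw_jk = -(t_k - o_k) o_k (1 - o_k) h_j + lam * sign(w_jk) for L1, and the same first term plus lam * w_jk for L2.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, h):
    """Outputs o_k = sigmoid(sum_j W[j][k] * h[j]); W is indexed [hidden j][output k]."""
    n_out = len(W[0])
    return [sigmoid(sum(W[j][k] * h[j] for j in range(len(h)))) for k in range(n_out)]


def loss(W, h, t, lam, penalty):
    """Quadratic loss E0 plus an L1 or L2 penalty on the output-layer weights."""
    o = forward(W, h)
    e0 = 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))
    if penalty == "l1":
        reg = lam * sum(abs(w) for row in W for w in row)
    else:  # "l2", using the (lam / 2) * w^2 convention assumed above
        reg = 0.5 * lam * sum(w * w for row in W for w in row)
    return e0 + reg


def sign(w):
    return (w > 0) - (w < 0)


def gradient(W, h, t, lam, penalty):
    """Analytic dE/dw_jk from the chain rule through the loss and the sigmoid."""
    o = forward(W, h)
    # dE0/dnet_k = -(t_k - o_k) * o_k * (1 - o_k), since sigmoid' = o * (1 - o)
    delta = [-(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, o)]
    grad = []
    for j, row in enumerate(W):
        g_row = []
        for k, w in enumerate(row):
            # L1 contributes a constant-magnitude term, L2 a term proportional to w
            reg = lam * sign(w) if penalty == "l1" else lam * w
            g_row.append(delta[k] * h[j] + reg)
        grad.append(g_row)
    return grad


def update(W, h, t, eta, lam, penalty):
    """One gradient descent step: w <- w - eta * dE/dw."""
    g = gradient(W, h, t, lam, penalty)
    return [[w - eta * gw for w, gw in zip(row, g_row)] for row, g_row in zip(W, g)]
```

Written out, the L2 step is w <- (1 - eta * lam) * w - eta * dE0/dw, so each weight shrinks in proportion to its own size and large weights are penalised most. The L1 step subtracts the fixed amount eta * lam * sign(w) regardless of magnitude, which tends to push small weights to exactly zero and gives sparser solutions.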
