Question: In class, we introduced the weight decay regularizer, which adds a loss term $\lambda \|w\|_2^2$, where $w$ is a vector comprising all of the parameters of the model and $\lambda$ is a scalar hyperparameter. An alternative is L1 regularization, which adds the loss term $\lambda \|w\|_1$. Let the model's original, unregularized loss be $L$, so that the regularized loss is $L + \lambda \|w\|_1$.

1. What is the gradient of the L1 regularization term with respect to the parameters $w$? For simplicity, you can assume the model has only one parameter, so that $w$ is a scalar.

2. What is the parameter update under normal gradient descent for $L + \lambda |w|$? Since you don't know $L$, you can include a $\frac{\partial L}{\partial w}$ term in your answer.
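A minimal sketch of the two quantities the question asks about, in the scalar case. The gradient of $\lambda |w|$ is $\lambda \,\mathrm{sign}(w)$ wherever $w \neq 0$; at $w = 0$ the term is not differentiable, and the code below uses the common subgradient convention $\mathrm{sign}(0) = 0$. The function names `l1_gradient` and `sgd_step` are illustrative, not from the original question.

```python
import numpy as np

def l1_gradient(w, lam):
    # Gradient of lam * |w| for w != 0 is lam * sign(w).
    # At w = 0 the absolute value is non-differentiable; np.sign(0) = 0
    # selects the zero subgradient, a common convention.
    return lam * np.sign(w)

def sgd_step(w, grad_L, lam, lr):
    # One gradient descent update on the regularized loss L + lam * |w|:
    #   w <- w - lr * (dL/dw + lam * sign(w))
    # grad_L stands in for the unknown dL/dw term.
    return w - lr * (grad_L + l1_gradient(w, lam))
```

For example, with `w = 0.5`, `grad_L = 0.2`, `lam = 0.1`, and `lr = 0.1`, the update subtracts `0.1 * (0.2 + 0.1) = 0.03`, giving `0.47`. Note how the L1 penalty pushes the parameter toward zero by a fixed amount `lr * lam` regardless of the magnitude of `w`, unlike weight decay, whose pull shrinks proportionally with `w`.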
