Question: 1. (40 points) The weight decay regularizer is also called the L2 regularizer, since $w^T w$ is the square of the L2 norm of the weight vector $w$, $\|w\|_2 = \sqrt{\sum_{i=1}^{d} w_i^2}$. Another common regularizer is called the L1 regularizer, since the L1 norm $\|w\|_1 = \sum_{i=1}^{d} |w_i|$ is used as the regularizer.
Below are the definitions of the two regularizations:

L1 regularization: $E_{aug}(w) = E_{in}(w) + \lambda \|w\|_1$
L2 regularization: $E_{aug}(w) = E_{in}(w) + \lambda\, w^T w$
(a) (… points out of 40 points) Answer LFD Problem ….
(b) (… points out of 40 points) Similar to Problem …, derive the update rule of gradient descent for minimizing the augmented error with the L1 regularizer.
Note that the gradient of the L1 norm is not well-defined at $0$. To address this issue, we can utilize the subgradient idea, defined as follows:

$$\frac{\partial |w_i|}{\partial w_i} = \begin{cases} 1 & \text{if } w_i > 0 \\ \text{any value in } [-1, 1] & \text{if } w_i = 0 \\ -1 & \text{if } w_i < 0 \end{cases}$$
When applying these regularizations to linear regression, they are called Ridge Regression (L2 regularizer) and Lasso Regression (L1 regularizer), respectively.
To simplify the discussion, we let $\frac{\partial |w_i|}{\partial w_i} = 0$ when $w_i = 0$. Please write down the update rule of gradient descent for L1 regularization. You can define a sign function that returns $+1$, $0$, or $-1$ when the input is positive, zero, or negative, respectively.
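With the simplified subgradient, the sign function makes the rule compact. As a reference sketch only (assuming a learning rate $\eta$ and regularization strength $\lambda$, with $\mathrm{sign}(\cdot)$ applied elementwise), one way the update can be written is

$$w(t+1) \leftarrow w(t) - \eta \nabla E_{in}(w(t)) - \eta \lambda\, \mathrm{sign}(w(t)).$$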
Truncated gradient (for part (c)): In Lasso regression (linear regression with L1 regularization), one nice property is that it tends to learn a weight vector with many $0$s. However, if we perform gradient descent on the augmented error with L1 regularization, it won't lead to this nice property, partly due to the not-well-defined behavior of the subgradient. In this homework, you will implement truncated gradient, an approach that tries to maintain the nice property of L1 regularization, as described below.
Let $w(t+1) \leftarrow w(t) - \eta \nabla E_{in}(w(t))$ be the update rule of gradient descent without regularization. The update rule for L1 regularization that you derived should be of the form

$$w(t+1) \leftarrow w(t) - \eta \nabla E_{in}(w(t)) + (\text{additional term}).$$

The additional term represents the effect of L1 regularization compared with no regularization. Truncated gradient works as follows: at each step $t$, you first perform the update and obtain $w(t+1)$. Then for each dimension $i$, if $w_i(t+1)$ and $w_i(t)$ have different signs, we set $w_i(t+1)$ to $0$; i.e., we truncate the update if the additional term makes the new weight change sign.
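To make the truncation step concrete, here is a minimal NumPy sketch of a single L1 step with truncated gradient. The names `grad_Ein`, `eta`, and `lam` are placeholders, not part of the assignment; note that `np.sign` returns $0$ at $0$, matching the simplified subgradient above.

```python
import numpy as np

def l1_truncated_step(w, grad_Ein, eta, lam):
    """One truncated-gradient step on E_aug(w) = E_in(w) + lam * ||w||_1.

    w        -- current weights w(t)
    grad_Ein -- gradient of E_in evaluated at w(t)
    eta      -- learning rate (placeholder)
    lam      -- L1 regularization strength (placeholder name)
    """
    # Plain subgradient update: the additional term is -eta * lam * sign(w(t)).
    w_new = w - eta * grad_Ein - eta * lam * np.sign(w)
    # Truncate: any coordinate whose sign flipped is set to 0.
    flipped = np.sign(w_new) * np.sign(w) < 0
    w_new[flipped] = 0.0
    return w_new
```

The truncation only fires when a coordinate actually changes sign, so a weight that is already $0$ stays at $0$ unless $\nabla E_{in}$ itself pushes it away.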
(c) (… points out of 40 points) Update your implementation of logistic regression from the earlier HW to include the L1 and L2 regularizers (use truncated gradient for the L1 regularizer and regular gradient descent for the L2 regularizer). Conduct the following experiment and include the results in your report. Also submit the updated Python implementation (feel free to update the function headers and/or define new functions).

You will work with the digits dataset, classifying whether a digit belongs to one class (labeled as $+1$) or the other (labeled as $-1$). Please download the preprocessed data on Canvas; check the label format and make sure you are working with the $\pm 1$ labels.
Examine different values of $\lambda$ for both L1 and L2 regularizations.
Train your models on the training set. For each trained model, report the classification error on the test set and the number of $0$s in your learned weight vector. Describe your observations and the property of the L1 regularizer when coupled with truncated gradient.
For the other parameters, please use the following. Normalize the features. Set the learning rate to …. The maximum number of iterations is …. Terminate learning if the magnitude of every element of the gradient of $E_{in}$ is less than …. When calculating classification error, classify the data using a cutoff probability of ….
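For part (c), a minimal sketch of the training and evaluation loop under these settings is given below, assuming features are already normalized and labels are $\pm 1$. All names (`train`, `logistic_grad`, `X`, `y`, ...) and the default values of `eta`, `max_iters`, `tol`, and `cutoff` are hypothetical placeholders; substitute the values specified in the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad(w, X, y):
    """Gradient of E_in for logistic regression with +/-1 labels."""
    return -np.mean((y * X.T) * sigmoid(-y * (X @ w)), axis=1)

def train(X, y, lam, reg="l1", eta=0.01, max_iters=10000, tol=1e-6):
    """Gradient descent on the augmented error; hyperparameters are placeholders."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        g = logistic_grad(w, X, y)
        if np.max(np.abs(g)) < tol:   # every |gradient| element small -> stop
            break
        if reg == "l2":
            # Regular gradient descent on E_in(w) + lam * w^T w.
            w = w - eta * (g + 2.0 * lam * w)
        else:
            # L1 with truncated gradient: truncate coordinates that flip sign.
            w_new = w - eta * g - eta * lam * np.sign(w)
            w_new[np.sign(w_new) * np.sign(w) < 0] = 0.0
            w = w_new
    return w

def test_error(w, X, y, cutoff=0.5):
    """0/1 classification error using an assumed cutoff probability."""
    pred = np.where(sigmoid(X @ w) >= cutoff, 1, -1)
    return np.mean(pred != y)
```

For each $\lambda$, reporting `test_error(w, X_test, y_test)` together with `np.sum(w == 0)` yields the two quantities the experiment asks for.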