1. (40 points) The weight decay regularizer is also called the L2 regularizer, since $w^T w$ is the square of the 2-norm of the weight vector, $\|w\|_2 = \sqrt{\sum_{i=0}^{d} w_i^2}$. Another common regularizer is called the L1 regularizer, since the 1-norm ($\|w\|_1 = \sum_{i=0}^{d} |w_i|$) is used as the regularizer.
Below are the definitions of the two regularizations¹:

L1 regularization: $E_{\text{aug}}(w) = E_{\text{in}}(w) + \lambda \|w\|_1$

L2 regularization: $E_{\text{aug}}(w) = E_{\text{in}}(w) + \lambda w^T w$
(a) (10 points out of 40 points) Answer LFD Problem 4.8.
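For reference, the gradient of the L2-augmented error is $\nabla E_{\text{in}}(w) + 2\lambda w$, so with learning rate $\eta$ the update in Problem 4.8 should come out to the familiar weight-decay form:

$$w(t+1) \leftarrow (1 - 2\eta\lambda)\, w(t) - \eta \nabla E_{\text{in}}(w(t))$$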
(b) (10 points out of 40 points) Similar to Problem 4.8, derive the update rule of gradient descent for minimizing the augmented error with the L1 regularizer.
Note that the gradient of the 1-norm is not well-defined at 0. To address this issue, we can utilize the subgradient idea, defined as follows:

$$\frac{\partial}{\partial w_i} \|w\|_1 = \begin{cases} +1 & \text{if } w_i > 0 \\ \text{any value in } [-1, 1] & \text{if } w_i = 0 \\ -1 & \text{if } w_i < 0 \end{cases}$$
¹When applying these regularizations to linear regression, they are called Ridge Regression (L2 regularizer) and Lasso Regression (L1 regularizer), respectively.
To simplify the discussion, we let $\frac{\partial}{\partial w_i} \|w\|_1 = 0$ when $w_i = 0$. Please write down the update rule of gradient descent for L1 regularization. (You can define a $\mathrm{sign}()$ function that returns $+1$, $0$, $-1$ when the input is positive, zero, or negative.)
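With that convention, the update rule you derive should take the shape below (a sketch, assuming learning rate $\eta$ and $\mathrm{sign}()$ applied element-wise):

$$w(t+1) \leftarrow w(t) - \eta \nabla E_{\text{in}}(w(t)) - \eta\lambda\, \mathrm{sign}(w(t))$$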
Truncated gradient (for part (c)): In Lasso regression (linear regression with L1 regularization), one nice property is that it tends to learn a weight vector with many 0s. However, if we perform gradient descent on the augmented error with L1 regularization, it won't lead to this nice property, partly due to the not-well-defined behavior of the subgradient. In this homework, you will implement truncated gradient [1], an approach that tries to maintain this nice property of L1 regularization, as described below.
Let $\tilde{w}(t+1) \leftarrow w(t) - \eta \nabla E_{\text{in}}(w(t))$ be the update rule of gradient descent without regularization. The update rule for L1 regularization that you derived should be in the form of

$$w(t+1) \leftarrow \tilde{w}(t+1) + \text{additional term}$$

The additional term represents the effect of L1 regularization compared with no regularization. Truncated gradient works as follows: at each step $t$, you first perform the update and obtain $w(t+1)$. Then for each dimension $i$, if $\tilde{w}_i(t+1)$ and $w_i(t+1)$ have different signs, or when $\tilde{w}_i(t+1) = 0$, we set $w_i(t+1)$ to 0 (i.e., we truncate the update if the additional term makes the new weight change sign).
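A minimal sketch of one truncated-gradient step, assuming NumPy and a gradient function grad_Ein from your HW2 code (the names here are illustrative, not required):

    import numpy as np

    def truncated_gradient_step(w, grad_Ein, eta, lam):
        """One gradient-descent step with L1 regularization, then truncation."""
        w_tilde = w - eta * grad_Ein(w)            # update without regularization
        w_new = w_tilde - eta * lam * np.sign(w)   # add the L1 "additional term"
        # Truncate: zero out any coordinate where the additional term flipped
        # the sign of the unregularized update (np.sign(0) == 0, so coordinates
        # with w_tilde == 0 are also reset to 0).
        w_new[np.sign(w_tilde) != np.sign(w_new)] = 0.0
        return w_new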
(c) (20 points out of 40 points) Update your implementation of logistic regression in HW2 to include the L1 and L2 regularizers (use truncated gradient for the L1 regularizer and regular gradient descent for the L2 regularizer). Conduct the following experiment and include the results in your report. Also submit the updated Python implementation (feel free to update the function headers and/or define new functions).
You will work with the digits dataset, classifying whether a digit belongs to $\{1, 6, 9\}$ (labeled as $-1$) or $\{0, 7, 8\}$ (labeled as $+1$). Please download the pre-processed data (check the label format and make sure you are working with the $+1/-1$ labels) on Canvas. Examine different $\lambda = 0, 0.0001, 0.001, 0.005, 0.01, 0.05, 0.1$ for both L1 and L2 regularizations.
Train your models on the training set. For each trained model, report (1) the
classification error on the test set and (2) the number of 0s in your learned weight vector.
Describe your observations and the property of the L1 regularizer (when coupled
with truncated gradient).
For the other parameters, please use the following. Normalize the features. Set the learning rate $\eta = 0.01$. The maximum number of iterations is $10^4$. Terminate learning if the magnitude of every element of the gradient (of $E_{\text{in}}$) is less than $10^{-6}$. When calculating the classification error, classify the data using a cutoff probability of 0.5.
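One way the experiment loop could look, reusing the step above. This is a sketch: logistic_grad and the data arguments are placeholders for your own HW2 code, and you still need to load and normalize the Canvas data yourself.

    import numpy as np

    def logistic_grad(X, y, w):
        """Gradient of E_in(w) = mean(log(1 + exp(-y * Xw))) for +1/-1 labels."""
        s = 1.0 / (1.0 + np.exp(y * (X @ w)))    # sigmoid(-y * Xw)
        return X.T @ (-y * s) / len(y)

    def train(X, y, lam, reg, eta=0.01, max_iter=10**4, tol=1e-6):
        w = np.zeros(X.shape[1])
        for _ in range(max_iter):
            g = logistic_grad(X, y, w)
            if np.max(np.abs(g)) < tol:           # every |g_i| below 1e-6
                break
            if reg == "L2":                       # plain gradient descent on E_aug
                w = w - eta * (g + 2.0 * lam * w)
            else:                                 # L1 with truncated gradient
                w_tilde = w - eta * g
                w_new = w_tilde - eta * lam * np.sign(w)
                w_new[np.sign(w_tilde) != np.sign(w_new)] = 0.0
                w = w_new
        return w

    def run_experiment(X_train, y_train, X_test, y_test):
        """Inputs are assumed to hold normalized features and +1/-1 labels."""
        for reg in ("L1", "L2"):
            for lam in (0, 0.0001, 0.001, 0.005, 0.01, 0.05, 0.1):
                w = train(X_train, y_train, lam, reg)
                prob = 1.0 / (1.0 + np.exp(-(X_test @ w)))  # P(y = +1 | x)
                pred = np.where(prob >= 0.5, 1, -1)         # cutoff at 0.5
                err = np.mean(pred != y_test)
                print(f"{reg} lambda={lam}: test error={err:.4f}, "
                      f"zeros in w={int(np.sum(w == 0))}")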
