Problem 3. Proximal Gradient Descent

Consider solving the following problem:

\[ \min_{w} \; f(w) = \|Xw - y\|_2^2 + \lambda \|w\|_1 \]

where X \in \mathbb{R}^{n \times d} is the feature matrix (each row is a feature vector), y \in \mathbb{R}^{n} is the label vector, \|w\|_1 = \sum_{i=1}^{d} |w_i|, and \lambda > 0 is a constant to balance loss and regularization. This is known as the Lasso regression problem, and our goal is to derive the "proximal gradient method" for solving it.

(a) (10 pt) The gradient descent algorithm cannot be directly applied since the objective function is non-differentiable. Discuss why the objective function is non-differentiable.

(b) (30 pt) In the class we showed that gradient descent is based on the idea of function approximation. To form an approximation for a non-differentiable function, we split the objective into its differentiable part and its non-differentiable part. Let g(w) = \|Xw - y\|_2^2; as discussed in the gradient descent lecture, we approximate g(w) by

\[ g(w) \approx g(w_t) + \nabla g(w_t)^\top (w - w_t) + \frac{1}{2\eta} \|w - w_t\|_2^2 . \]

In each iteration of proximal gradient descent, we obtain the next iterate w_{t+1} by minimizing the following approximation function:

\[ w_{t+1} = \arg\min_{w} \; g(w_t) + \nabla g(w_t)^\top (w - w_t) + \frac{1}{2\eta} \|w - w_t\|_2^2 + \lambda \|w\|_1 . \]

Derive the closed-form solution of w_{t+1} given w_t, \nabla g(w_t), \eta, and \lambda. What is the time complexity of one proximal gradient descent iteration?
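A minimal sketch for part (a), using only the definitions above: the squared loss g(w) = \|Xw - y\|_2^2 is smooth, so any non-differentiability must come from the \ell_1 term.

```latex
% Sketch for part (a): the absolute value is not differentiable at 0,
% because its one-sided derivatives there disagree:
\[
\lim_{h \to 0^{-}} \frac{|0 + h| - |0|}{h} = -1
\qquad \neq \qquad
\lim_{h \to 0^{+}} \frac{|0 + h| - |0|}{h} = +1 .
\]
% Hence |w_i|, and therefore f(w) = g(w) + \lambda \sum_i |w_i|, has no
% gradient at any point where some coordinate w_i = 0 (e.g. w = 0),
% so plain gradient descent cannot be applied directly.
```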

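For part (b), a sketch of the standard derivation under the approximation above; here \eta is the step size from the quadratic term, and S_{\eta\lambda} denotes the soft-thresholding operator, a name introduced here for illustration rather than taken from the original statement. Dropping terms that do not depend on w and completing the square makes the subproblem separable across coordinates.

```latex
% Complete the square in the surrogate; terms constant in w are dropped:
\[
w_{t+1}
= \arg\min_{w} \; \frac{1}{2\eta}
  \left\| w - \big( w_t - \eta \nabla g(w_t) \big) \right\|_2^2
  + \lambda \|w\|_1 .
\]
% With z = w_t - \eta \nabla g(w_t), the objective decouples per coordinate:
\[
(w_{t+1})_i
= \arg\min_{u} \; \frac{1}{2\eta} (u - z_i)^2 + \lambda |u|
= \operatorname{sign}(z_i) \, \max\big( |z_i| - \eta \lambda, \, 0 \big)
=: S_{\eta\lambda}(z_i) .
\]
% So one iteration is a gradient step on g followed by soft-thresholding.
```

On the time complexity: with g(w) = \|Xw - y\|_2^2 we have \nabla g(w) = 2 X^\top (Xw - y), so computing the gradient costs two matrix-vector products, O(nd), and the thresholding costs O(d); one proximal gradient iteration is O(nd) overall.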
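A runnable sketch of one iteration in Python, illustrating the O(nd) cost; the function name prox_grad_step, the NumPy dependency, and the random data below are assumptions for illustration, not part of the original problem:

```python
import numpy as np

def prox_grad_step(X, y, w, eta, lam):
    """One proximal gradient iteration for Lasso with g(w) = ||Xw - y||^2.

    Cost: two matrix-vector products, O(nd), plus O(d) thresholding.
    """
    grad = 2.0 * X.T @ (X @ w - y)          # gradient of the smooth part, O(nd)
    z = w - eta * grad                       # plain gradient step on g
    return np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)  # soft-threshold, O(d)

# Illustrative usage with random data (shapes only, not from the problem).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = rng.standard_normal(100)
w = np.zeros(20)
for _ in range(50):
    w = prox_grad_step(X, y, w, eta=1e-3, lam=0.1)
```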
