Question:
2. Which properties of the Lasso path generalize to other loss functions?

Recall that we showed the following optimality conditions for a Lasso solution $\hat\beta(\lambda)$:

$$\hat\beta_k(\lambda) \neq 0 \;\Longrightarrow\; X_k^T\big(Y - X\hat\beta(\lambda)\big) = \frac{\lambda}{2}\,\operatorname{sgn}\big(\hat\beta_k(\lambda)\big) \qquad (1)$$

$$\hat\beta_k(\lambda) = 0 \;\Longrightarrow\; \big|X_k^T\big(Y - X\hat\beta(\lambda)\big)\big| \le \frac{\lambda}{2} \qquad (2)$$

$$\forall k: \quad \big|X_k^T\big(Y - X\hat\beta(\lambda)\big)\big| \le \frac{\lambda}{2} \qquad (3)$$

where, as we noted in class, $-2X^T\big(Y - X\hat\beta(\lambda)\big) = \frac{\partial \mathrm{RSS}(\beta)}{\partial \beta}\Big|_{\beta = \hat\beta(\lambda)}$ is the derivative of the loss function.

We noted in class the following properties of the set of solutions $\{\hat\beta(\lambda) : 0 \le \lambda < \infty\}$:

(i) All the variables in the solution are "highly correlated" with the current residual, by (1) above, and all the variables with zero coefficients are "less correlated" with the current residual, by (2)-(3) above.

(ii) The solution path $\{\hat\beta(\lambda) : 0 \le \lambda < \infty\}$, as a function of $\lambda$, can be described by a collection of "breakpoints" $\infty > \lambda_1 > \lambda_2 > \dots > \lambda_K > 0$ such that the set $A_k$ of active variables (those with non-zero coefficients) is fixed for all solutions $\hat\beta(\lambda)$ with $\lambda_k > \lambda > \lambda_{k+1}$.

(iii) $\hat\beta(\lambda)$ is a piecewise linear function. In other words, for $\lambda$ in this range we have $\hat\beta(\lambda) = \hat\beta(\lambda_k) + v_k(\lambda - \lambda_k)$, for a vector $v_k$ we explicitly derived in class.

Assume now that we want to build a different type of model with a different convex and infinitely differentiable loss function. For example, take a logistic regression model for a binary classification task and add a Lasso penalty to it:

$$\hat\beta(\lambda) = \arg\min_\beta \sum_{i=1}^{n} \log\big\{1 + \exp\big(-y_i x_i^T\beta\big)\big\} + \lambda\|\beta\|_1,$$

where $y_i \in \{-1, +1\}$ and $x_i^T$ is the $i$-th row of $X$. We would like to investigate which of the properties above still hold for the solution of this problem.

(a) Use simple arguments about derivatives and sub-derivatives, as we did in class for the quadratic loss case, to argue that three conditions like (1)-(3) can be written for this case too, with the appropriate derivative replacing the empirical correlation. Derive these expressions explicitly for the logistic case.

(b) Explain clearly why this implies that properties (i) and (ii) still hold. For (ii), you may find the continuity of the derivative useful.

(c) Does the piecewise linearity still hold? A clear intuitive explanation is sufficient here.
Hint: Consider how we obtained the linearity in $\lambda$ for squared loss in class by decomposing the correlation vector as $X^T\big(Y - X\hat\beta(\lambda)\big) = X^TY - X^TX\hat\beta(\lambda)$.
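Conditions (1)-(3) and properties (i)-(iii) for the squared-loss case can be checked numerically. Below is a minimal pure-Python sketch. It assumes that the squared-loss Lasso minimizes $\mathrm{RSS}(\beta) + \lambda\|\beta\|_1$ with no intercept, which is consistent with the $\lambda/2$ in (1)-(3). The helper names `soft_threshold`, `lasso_cd` and `kkt_violation`, the synthetic data, and the sweep count are hypothetical choices made for illustration; this is not the procedure from class. The script does four things:

- fits the Lasso by cyclic coordinate descent;
- measures how far the fit is from satisfying (1)-(3);
- confirms that every coefficient is zero above the first breakpoint $2\max_k|X_k^TY|$;
- checks that the solution is linear in $\lambda$ over a range where the active set and signs do not change.

```python
import random


def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, Y, lam, sweeps=500):
    """Minimize RSS(beta) + lam * ||beta||_1 by cyclic coordinate descent.

    X is a list of n rows with p entries each; no intercept is fitted.
    Returns the coefficients and the residual Y - X beta.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(Y)  # Y - X beta for beta = 0
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) for k in range(p)]
    for _ in range(sweeps):
        for k in range(p):
            # Correlation of column k with the residual that leaves out beta_k.
            rho = sum(X[i][k] * (resid[i] + X[i][k] * beta[k]) for i in range(n))
            # Zeroing the subgradient of RSS + lam*|b| in b gives a
            # soft-threshold at lam/2, the same lambda/2 as in (1)-(3).
            new = soft_threshold(rho, lam / 2) / col_sq[k]
            delta = new - beta[k]
            for i in range(n):
                resid[i] -= X[i][k] * delta
            beta[k] = new
    return beta, resid


def kkt_violation(X, resid, beta, lam):
    """Largest violation of conditions (1)-(3) at a candidate solution."""
    worst = 0.0
    for k, b in enumerate(beta):
        corr = sum(row[k] * r for row, r in zip(X, resid))  # X_k^T (Y - X beta)
        if b != 0.0:
            # (1): an active variable's correlation equals (lam/2) * sgn(b).
            worst = max(worst, abs(corr - (lam / 2 if b > 0 else -lam / 2)))
        else:
            # (2)-(3): an inactive variable's |correlation| is at most lam/2.
            worst = max(worst, abs(corr) - lam / 2)
    return worst


# Hypothetical synthetic data: three informative columns, small noise.
rng = random.Random(0)
n, true_beta = 50, [3.0, -2.0, 1.5]
X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(n)]
Y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 0.5) for row in X]

# Conditions (1)-(3) hold at the computed solution (up to rounding).
beta, resid = lasso_cd(X, Y, lam=2.0)
print(kkt_violation(X, resid, beta, lam=2.0))

# At and above lambda_1 = 2 * max_k |X_k^T Y| every coefficient is zero.
lam_1 = 2 * max(abs(sum(row[k] * y for row, y in zip(X, Y))) for k in range(3))
print(lasso_cd(X, Y, lam=lam_1 + 1.0)[0])  # [0.0, 0.0, 0.0]

# Inside one segment (same active set and signs), the path is linear in lambda,
# so the solution at the midpoint lambda is the average of the endpoints.
b1, b2, b3 = (lasso_cd(X, Y, lam)[0] for lam in (1.0, 2.0, 3.0))
print(max(abs(m - (a + c) / 2) for a, m, c in zip(b1, b2, b3)))
```

Coordinate descent is used here only because each coordinate update has a closed form (a soft-threshold at $\lambda/2$); any convex solver would serve. The last check mirrors the hint. With the active set $A$ and signs $s_A$ fixed, $X_A^TY - X_A^TX_A\hat\beta_A(\lambda) = \frac{\lambda}{2}s_A$ is linear in $\lambda$, so $\hat\beta_A(\lambda)$ is as well.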