Question: Please use the hint!

Question 1 (5 marks). Prove the strongly convex case of Theorem 4.9: Suppose $F$ is $\gamma$-strongly convex and $L$-smooth with minimizer $w^*$, and we run gradient descent ($w_{k+1} = w_k - \alpha_k \nabla F(w_k)$) with $\alpha_k = 1/L$ starting from any $w_0 \in \mathbb{R}^d$. Show that $F(w_k) - F(w^*) \to 0$.

Hint: first show that $F(w^*) \ge F(w) - \frac{1}{2\gamma}\|\nabla F(w)\|_2^2$ for all $w$, by minimizing each side of (4.10) with respect to $v$.

Lemma 4.6. If $F$ is continuously differentiable, then $F$ is convex if and only if $F$ lies on or above any tangent line:
$$F(v) \ge F(u) + \nabla F(u)^\top (v - u), \quad \forall u, v \in \mathbb{R}^d. \tag{4.9}$$
Also, $F$ is $\gamma$-strongly convex if and only if
$$F(v) \ge F(u) + \nabla F(u)^\top (v - u) + \frac{\gamma}{2}\|v - u\|_2^2, \quad \forall u, v \in \mathbb{R}^d. \tag{4.10}$$
If $F$ is twice continuously differentiable, then $F$ is convex if and only if $\nabla^2 F(w)$ is positive semidefinite for every $w \in \mathbb{R}^d$. Also, $F$ is $\gamma$-strongly convex if and only if $\nabla^2 F(w) \succeq \gamma I_{d \times d}$.

Gradient Descent. The simplest optimization algorithm is called gradient descent (or steepest descent). Intuitively, we note that:
- If $\nabla F(w)^\top s > 0$, then $s$ is (locally) a direction of increasing $F$ (that is, $F(w + \alpha s) > F(w)$ for $\alpha > 0$ sufficiently small);
- If $\nabla F(w)^\top s = 0$, then $s$ is a direction of constant $F$; and
- If $\nabla F(w)^\top s < 0$, then $s$ is (locally) a direction of decreasing $F$.

Gradient descent therefore takes steps $w_{k+1} = w_k - \alpha_k \nabla F(w_k)$ with a small step size $\alpha_k > 0$ (representing the fact that $-\nabla F(w)$ is only the best direction locally, not globally). The simplest choice of step sizes is $\alpha_k = 1/L$ for all $k$ (when $F$ is $L$-smooth). In that case, we use (4.12) and (4.8) to obtain $F(w_k - \alpha_k \nabla F(w_k)) \le F(w_k) - \frac{1}{2L}\|\nabla F(w_k)\|_2^2$.
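
As a reading aid (not part of the original question), here is a minimal sketch of the hint step, using only (4.10) with $u = w$ and the descent inequality quoted above; the rearranged form and the final combination step are added remarks rather than the expert solution.

Fixing $w$ and setting $u = w$ in (4.10), the right-hand side is a quadratic in $v$ minimized at $v = w - \frac{1}{\gamma}\nabla F(w)$, while the left-hand side is minimized at $v = w^*$. Since the left-hand side dominates the right-hand side at every $v$, the same holds for their minima over $v$:
$$F(w^*) \;\ge\; \min_{v}\Big[ F(w) + \nabla F(w)^\top (v - w) + \tfrac{\gamma}{2}\|v - w\|_2^2 \Big] \;=\; F(w) - \tfrac{1}{2\gamma}\|\nabla F(w)\|_2^2,$$
equivalently $\|\nabla F(w)\|_2^2 \ge 2\gamma\,\big(F(w) - F(w^*)\big)$. Combining this at $w = w_k$ with $F(w_{k+1}) \le F(w_k) - \frac{1}{2L}\|\nabla F(w_k)\|_2^2$ gives
$$F(w_{k+1}) - F(w^*) \;\le\; \Big(1 - \tfrac{\gamma}{L}\Big)\big(F(w_k) - F(w^*)\big),$$
so the optimality gap contracts geometrically and, in particular, $F(w_k) - F(w^*) \to 0$.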
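Purely as a numerical illustration of the statement being proved (not from the source), the sketch below runs gradient descent with the fixed step alpha_k = 1/L on a small strongly convex quadratic, for which L and gamma are the extreme eigenvalues of the Hessian. The matrix A, vector b, and helper names F and grad_F are hypothetical choices made for this example.

```python
import numpy as np

# Illustrative strongly convex quadratic: F(w) = 0.5 * w^T A w - b^T w.
# Its gradient is A w - b and its Hessian is A, so the smoothness constant L and
# the strong-convexity constant gamma are the largest and smallest eigenvalues
# of A, and the minimizer w* solves A w* = b.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])

eigvals = np.linalg.eigvalsh(A)   # eigenvalues in ascending order
gamma, L = eigvals[0], eigvals[-1]

def F(w):
    return 0.5 * w @ A @ w - b @ w

def grad_F(w):
    return A @ w - b

w_star = np.linalg.solve(A, b)    # exact minimizer, for comparison only
w = np.zeros(2)                   # w_0

for k in range(50):
    w = w - (1.0 / L) * grad_F(w) # w_{k+1} = w_k - (1/L) * grad F(w_k)

print("gap F(w_k) - F(w*) after 50 steps:", F(w) - F(w_star))
# The printed gap should be near zero, consistent with the geometric contraction
# F(w_k) - F(w*) <= (1 - gamma/L)^k * (F(w_0) - F(w*)).
```

On this example the gap after 50 iterations is already tiny, matching the $(1 - \gamma/L)^k$ rate sketched above; the worst-case bound is usually pessimistic compared with the observed decrease.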
