Question: Consider the quadratic polynomial $f(x) = ax^2$, with free parameter $a > 0$. Clearly, this quadratic has a global minimum at $x = 0$.

Consider the quadratic polynomial $f(x) = ax^2$, with free parameter $a > 0$. Clearly, this quadratic has a global minimum at $x = 0$.

a. Set $a = 1$. Perform three iterations of gradient descent by hand with initial guess $x_0 = 1$ and learning rate $\gamma = 1/4$. Does gradient descent appear to be converging?

b. Set $a = 1$. Perform three iterations of gradient descent by hand with initial guess $x_0 = 1$ and learning rate $\gamma = 1$. Does gradient descent appear to be converging?

c. According to Theorem 6.4.1 in the weekly lecture notes, the learning rate $\gamma$ must be strictly less than $1/L$ for gradient descent to converge, where $L$ is the maximum of $|f''(x)|$. Use this theorem to explain why (a) converged and (b) did not.

d. According to Theorem 6.4.2 in the weekly lecture notes,

$$|f(x_k) - f(x^*)| \le \frac{(x_0 - x^*)^2}{2\gamma k},$$

where $x^*$ is the exact x-coordinate of the minimum, $x_k$ is the $k$th iterate of gradient descent, $x_0$ is the initial guess, and $\gamma$ is the learning rate. The expression on the LHS of this inequality can be thought of as the error between the exact minimum of $f$ and what gradient descent predicts is the minimum of $f$ after $k$ iterations. Given $f(x) = x^2$, $x_0 = 1$, $x^* = 0$, and $\gamma = 1/4$, how many iterations of gradient descent are necessary to guarantee that $|f(x_k) - f(x^*)| < 10^{-15}$? Hint: Find the exact $k$ such that the RHS of the inequality equals $10^{-15}$.
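The hand computations in parts (a), (b), and (d) can be checked with a short script. This is a minimal sketch, not a reference solution. The helper names `gradient_descent` and `iterations_for_bound` are hypothetical. It assumes the update rule $x_{k+1} = x_k - \gamma f'(x_k)$ with $f'(x) = 2ax$, assumes $x_0 = 1$ in both (a) and (b) (the initial guess is garbled in the question text), and takes the part (d) bound to be $(x_0 - x^*)^2 / (2\gamma k)$.

```python
from fractions import Fraction


def gradient_descent(a, x0, gamma, steps):
    """Return [x_0, ..., x_steps] for f(x) = a * x**2.

    Assumed update rule: x_{k+1} = x_k - gamma * f'(x_k), with f'(x) = 2*a*x.
    """
    xs = [x0]
    for _ in range(steps):
        grad = 2 * a * xs[-1]
        xs.append(xs[-1] - gamma * grad)
    return xs


def iterations_for_bound(x0, x_star, gamma, tol):
    """Solve (x0 - x_star)**2 / (2 * gamma * k) = tol for k (assumed bound)."""
    return (x0 - x_star) ** 2 / (2 * gamma * tol)


# Part (a): gamma = 1/4, assuming x_0 = 1.
iterates_a = gradient_descent(a=1, x0=1.0, gamma=0.25, steps=3)
print(iterates_a)  # [1.0, 0.5, 0.25, 0.125]

# Part (b): gamma = 1, assuming x_0 = 1.
iterates_b = gradient_descent(a=1, x0=1.0, gamma=1.0, steps=3)
print(iterates_b)  # [1.0, -1.0, 1.0, -1.0]

# Part (d): exact rational arithmetic avoids rounding at 10**-15.
k = iterations_for_bound(
    Fraction(1), Fraction(0), Fraction(1, 4), Fraction(1, 10**15)
)
print(k)  # 2000000000000000
```

With $a = 1$, $f''(x) = 2$, so $L = 2$ and the threshold in part (c) is $1/L = 1/2$. The step size $\gamma = 1/4$ lies below it and each iterate is halved, while $\gamma = 1$ lies above it and the iterates flip between $1$ and $-1$. For part (d), the RHS equals $10^{-15}$ at $k = 2 \times 10^{15}$, so the strict inequality is guaranteed for any larger $k$.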
