Question: [10 points] We have mainly focused on squared loss, but there are other interesting losses in machine learning. Consider the following loss function, which we denote Loss_max:

    Loss_max(x, y, w) = max(0, -y (w · x))

Let D = {(x_1, y_1), ..., (x_n, y_n)} be a training set, where each x_i ∈ R^d and each y_i ∈ {-1, +1}. Consider running stochastic gradient descent (SGD) to find a weight vector w that minimizes Loss_max. Explain the explicit relationship between this algorithm and the Perceptron algorithm. Recall that for SGD, the update rule when the example (x_i, y_i) is picked at random is

    w ← w - η ∇_w Loss_max(x_i, y_i, w)

Note: You do not need to be overly concerned about the discontinuity of the gradient at y (w · x) = 0, so you can ignore that point when calculating the gradient for this problem.
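The connection the question asks about can be sketched in code. This is a minimal illustration, assuming the loss is Loss_max(x, y, w) = max(0, -y (w · x)): its (sub)gradient is -y x when y (w · x) < 0 and 0 otherwise (ignoring the kink at 0, per the note), so an SGD step with learning rate η = 1 coincides with the classic Perceptron mistake-driven update. The function names below are illustrative, not from the original problem.

```python
import numpy as np

def sgd_max_loss_update(w, x, y, eta=1.0):
    """One SGD step on Loss_max(x, y, w) = max(0, -y * (w . x)).

    The (sub)gradient is -y * x when y * (w . x) < 0, and 0 otherwise;
    the kink at y * (w . x) = 0 is ignored, as the problem note allows.
    """
    if y * np.dot(w, x) < 0:        # current w misclassifies (x, y)
        w = w - eta * (-y * x)      # w <- w - eta * gradient = w + eta * y * x
    return w

def perceptron_update(w, x, y):
    """Classic Perceptron: on a mistake, add y * x to the weights."""
    if y * np.dot(w, x) <= 0:       # mistake (or on the boundary)
        w = w + y * x
    return w

# Demo: on a misclassified example, the two updates agree when eta = 1.
w0 = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])
y = 1.0                              # y * (w0 . x) = -1.5 < 0: a mistake
print(sgd_max_loss_update(w0.copy(), x, y))  # same result as ...
print(perceptron_update(w0.copy(), x, y))    # ... the Perceptron step
```

On a correctly classified example the gradient of Loss_max is zero, so SGD leaves w unchanged, again matching the Perceptron's behavior of updating only on mistakes.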
