Q1. [5 points] The optimization problem of training logistic regression for the binary classification problem, i.e., $Y = \{0, 1\}$, uses the cross-entropy (CE) loss, which is defined as
$$w^*_{\mathrm{CE}} = \arg\min_w \, l_{\mathrm{CE}}(w) = \arg\min_w \, -\frac{1}{n}\sum_{i=1}^{n}\left[ y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \right], \tag{1}$$
where $\hat{y} = \frac{1}{1 + \exp(-w^T x)}$.
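As a concrete reference, here is a minimal NumPy sketch of this objective (the function names `sigmoid` and `ce_loss` are illustrative, not part of the problem statement):

```python
import numpy as np

def sigmoid(z):
    # Logistic function 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(w, X, y):
    # Cross-entropy loss l_CE(w) averaged over n samples;
    # X has shape (n, d), y in {0, 1}, w has shape (d,)
    y_hat = sigmoid(X @ w)
    eps = 1e-12  # clip to avoid log(0)
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```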
One may believe that we could have used the mean squared error (MSE) as the loss function instead of the CE loss. In this case, the optimization problem for logistic regression can be written as
$$w^*_{\mathrm{MSE}} = \arg\min_w \, l_{\mathrm{MSE}}(w) = \arg\min_w \, \frac{1}{n}\sum_{i=1}^{n}\left\| y_i - \hat{y}_i \right\|_2^2. \tag{2}$$
Here, as before, $\hat{y} = \frac{1}{1 + \exp(-w^T x)}$.
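The MSE objective can be sketched the same way (again with illustrative names of my own choosing):

```python
import numpy as np

def sigmoid(z):
    # Logistic function 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(w, X, y):
    # Mean squared error l_MSE(w) between labels y in {0, 1}
    # and the sigmoid outputs y_hat
    y_hat = sigmoid(X @ w)
    return np.mean((y - y_hat) ** 2)
```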
The goal of this exercise is to compare the CE loss and the MSE loss when training a logistic regression model and to understand why we should use the CE over the MSE.
(a) [2 points] Compute the first-order and second-order derivatives of $l_{\mathrm{CE}}(w)$.
(b) [2 points] Compute the first-order and second-order derivatives of $l_{\mathrm{MSE}}(w)$.
(c) [1 point] Using the second-order derivatives computed in (a) and (b), explain why we should prefer the cross-entropy loss over the MSE loss for training a logistic regression model. (Hint: use the second derivative and check convexity.)
(d) [1 point] Given the gradients computed in (a), derive the gradient descent (GD) algorithm for the optimization problem in equation (1).
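For orientation, a hedged sketch of what the resulting GD loop looks like, assuming the standard averaged CE gradient $\frac{1}{n} X^\top (\hat{y} - y)$ that part (a) asks you to verify; the learning rate and iteration count are arbitrary placeholders, not values from the problem:

```python
import numpy as np

def sigmoid(z):
    # Logistic function 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def gd_logistic(X, y, lr=0.1, n_iters=1000):
    # Gradient descent on l_CE: repeat w <- w - lr * grad l_CE(w),
    # where grad l_CE(w) = (1/n) * X^T (y_hat - y)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w)
        grad = X.T @ (y_hat - y) / n
        w -= lr * grad
    return w
```

On a toy separable dataset this loop drives the sigmoid outputs toward the correct side of 0.5, which is a quick sanity check that the gradient has the right sign.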