Q1 (60 pts): Consider a binary classification problem where we use logistic regression to
predict the probability that a given input $x \in \mathbb{R}^n$ belongs to class 1. The model is defined
as:

$$\hat{y} = \sigma(w^T x + b)$$

where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function, $w \in \mathbb{R}^n$ is the weight vector, and $b$ is the
bias term.

The loss function used to train the model is the cross-entropy loss, defined for a single
training example $(x, y)$ as:

$$L(w, b) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$$

Given a training dataset $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, the overall loss is the average cross-entropy loss:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(w, b; x^{(i)}, y^{(i)})$$
Questions:
(a) Derive the gradients of the loss function J(w,b) with respect to the parameters w and
b. Show all steps in your derivation. [15 points]
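As a reference sketch for part (a), the standard chain-rule derivation for logistic regression with cross-entropy loss proceeds as follows (letting $z = w^T x + b$, so $\hat{y} = \sigma(z)$):

```latex
\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}},
\qquad
\frac{\partial \hat{y}}{\partial z} = \hat{y}(1-\hat{y})
\quad\Longrightarrow\quad
\frac{\partial L}{\partial z}
  = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z}
  = \hat{y} - y.

\text{Since } \frac{\partial z}{\partial w} = x \text{ and } \frac{\partial z}{\partial b} = 1,
\text{ averaging over the } m \text{ examples gives}

\nabla_w J(w,b) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x^{(i)},
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right).
```

The key simplification is that the sigmoid's derivative $\hat{y}(1-\hat{y})$ cancels the denominators in $\partial L / \partial \hat{y}$, leaving the compact residual $\hat{y} - y$.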
(b) Write the update equations for w and b using gradient descent. Assume the learning
rate is $\alpha$. [10 points]
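For part (b), the standard gradient-descent updates (substituting the gradients from part (a)) take the form:

```latex
w \leftarrow w - \alpha \, \nabla_w J(w,b)
  = w - \frac{\alpha}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x^{(i)},
\qquad
b \leftarrow b - \alpha \, \frac{\partial J}{\partial b}
  = b - \frac{\alpha}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right).
```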
(c) Suppose you have a dataset with three training examples and the current values of the
parameters are w=[0.5,-0.3] and b=0.1. The learning rate \alpha is set to 0.01. Given
the following dataset:
Calculate the gradients and update the parameters w and b after one step of gradient
descent. [25 points]
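The dataset table for part (c) did not survive extraction, so the exact numbers cannot be reproduced here. The sketch below shows the computation with a *hypothetical* three-example dataset (the `X` and `y` values are placeholders, not the original data); only `w`, `b`, and `alpha` come from the problem statement.

```python
import numpy as np

# Placeholder dataset: the original table is missing, so these three
# examples are illustrative values only.
X = np.array([[1.0, 2.0],
              [2.0, -1.0],
              [0.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])

# Parameters given in the problem statement.
w = np.array([0.5, -0.3])
b = 0.1
alpha = 0.01

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: predicted probabilities for all m examples.
y_hat = sigmoid(X @ w + b)

# Gradients of the average cross-entropy loss (derived in part (a)).
m = len(y)
error = y_hat - y                 # residual (y_hat - y) per example
grad_w = (X.T @ error) / m        # (1/m) * sum of (y_hat - y) * x
grad_b = error.mean()             # (1/m) * sum of (y_hat - y)

# One gradient-descent step (part (b) update equations).
w_new = w - alpha * grad_w
b_new = b - alpha * grad_b
print("grad_w:", grad_w, "grad_b:", grad_b)
print("w_new:", w_new, "b_new:", b_new)
```

With the real table substituted for `X` and `y`, the same code yields the numbers requested in part (c); vectorizing with `X.T @ error` avoids looping over examples one at a time.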
