Question 4: In logistic regression, we assume Y = (Y_1, ..., Y_n)^T are a collection of n binary observations. For each Y_i we observe a vector of predictors x_i = (x_{i1}, ..., x_{ip})^T. We assume

P(Y_i = 1) = p_i = exp(x_i^T β) / (1 + exp(x_i^T β)),

where β = (β_1, ..., β_p)^T is the vector of regression coefficients.
a. Formulate the overall log-likelihood ℓ(β) of the dataset.
b. Derive the first derivative ∂ℓ/∂β.
c. Derive the second derivative ∂²ℓ/∂β∂β^T.
d. Suppose β̂ = … and we have a new observation x_new = … . Predict the probability of success for this new observation.
e. Suppose … ; calculate the first derivative vector ∂ℓ/∂β.
f. Suppose … ; calculate the second-order derivative matrix ∂²ℓ/∂β∂β^T.
g. If instead of using Newton's method, we decide to use the stochastic gradient ascent algorithm, which updates the parameter in the following manner:

β̂_new = β̂_old + γ · (∂ℓ/∂β)|_{β = β̂_old},

please update the estimate for β. Use the learning rate γ = … .
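Parts (a)-(c) ask for the log-likelihood, its gradient, and its Hessian. Since the question's actual numbers are not shown here, the following is a minimal NumPy sketch of those three quantities, evaluated on hypothetical data chosen purely for illustration:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Part (a): l(beta) = sum_i [ y_i * x_i^T beta - log(1 + exp(x_i^T beta)) ]."""
    eta = X @ beta
    return float(y @ eta - np.sum(np.log1p(np.exp(eta))))

def gradient(beta, X, y):
    """Part (b): dl/dbeta = X^T (y - p), where p_i = exp(eta_i) / (1 + exp(eta_i))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)

def hessian(beta, X, y):
    """Part (c): d^2 l / (dbeta dbeta^T) = -X^T W X, with W = diag(p_i (1 - p_i))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return -(X.T * (p * (1 - p))) @ X

# Hypothetical data (n = 4 observations, p = 2 predictors), not from the question
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]])
y = np.array([1.0, 0.0, 1.0, 0.0])
beta = np.zeros(2)

print(log_likelihood(beta, X, y))  # at beta = 0 every p_i = 0.5, so l = -n*log(2)
print(gradient(beta, X, y))        # X^T (y - 0.5)
print(hessian(beta, X, y))         # -0.25 * X^T X at beta = 0
```

At β = 0 the fitted probabilities are all 1/2, which makes each quantity easy to check by hand against the derived formulas.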
You may notice that because we are maximizing the objective function (the log-likelihood), we use the stochastic gradient ascent algorithm, which is very similar to the gradient descent algorithm for minimization that we learned a few weeks ago. "Ascent" means "going up" and "descent" means "going down". The two methods differ only in the sign of the second term: for maximization we add it, and for minimization we subtract it.
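Because the question's current estimate β̂, the sampled observation, and the learning rate γ are not shown here, the update rule in part (g) can only be sketched with hypothetical values. For a single observation (x_i, y_i), the per-observation gradient is ∂ℓ_i/∂β = (y_i - p_i) x_i, so one ascent step is:

```python
import numpy as np

def sga_update(beta_hat, x_i, y_i, gamma):
    """One stochastic gradient ascent step on a single observation:
    beta_new = beta_old + gamma * (y_i - p_i) * x_i  (note the + sign for ascent)."""
    p_i = 1.0 / (1.0 + np.exp(-(x_i @ beta_hat)))
    return beta_hat + gamma * (y_i - p_i) * x_i

# Hypothetical current estimate, observation, and learning rate (not the question's)
beta_hat = np.array([0.0, 0.0])
x_i = np.array([1.0, 2.0])
y_i = 1.0
gamma = 0.1

print(sga_update(beta_hat, x_i, y_i, gamma))  # p_i = 0.5, so the step is 0.1 * 0.5 * x_i
```

Flipping the `+` to a `-` in the return line gives exactly the gradient descent update for minimization, which is the sign difference the passage above describes.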