Question 4:
In logistic regression, we assume $Y = (Y_1, \dots, Y_n)^T$ are a collection of $n$ binary observations. For each $Y_i$, we observe predictors $x_i = (x_{i1}, x_{i2}, x_{i3})^T$. We assume
$$\log \frac{p_i}{1 - p_i} = x_i^T \beta,$$
where $\beta = (\beta_1, \beta_2, \beta_3)^T$ is the vector of regression coefficients.
a) Formulate the overall log-likelihood of the dataset, $\ell(\beta; Y)$.
b) Derive the first derivative $\partial \ell(\beta; Y) / \partial \beta_2$.
c) Derive the second derivative $\partial^2 \ell(\beta; Y) / \partial \beta_2 \, \partial \beta_3$.
d) Suppose $\hat{\beta}_{\text{current}} = (0.1, 0.2, 0.3)^T$, and we have a new observation $x_{n+1} = (3, 2, 4)^T$. Predict the probability $p$ of success for this new observation.
e) Suppose $Y_{n+1} = 1$; calculate the first derivative vector $\partial \ell(\beta; Y_{n+1}) / \partial \beta$.
f) Suppose $Y_{n+1} = 1$; calculate the second-order derivative matrix $\partial^2 \ell(\beta; Y_{n+1}) / \partial \beta \, \partial \beta^T$.
g) If instead of using Newton's method, we decide to use the stochastic gradient ascent algorithm, which updates the parameter in the following manner:
$$\hat{\beta}_{\text{new}} = \hat{\beta}_{\text{current}} + \alpha \, \frac{\partial \ell(\beta; Y_{n+1})}{\partial \beta},$$
please update the estimate for $\beta$. Use the learning rate $\alpha = 0.001$.
(You may notice that because we are maximizing the objective function (the log-likelihood), we use the stochastic gradient ascent algorithm, which is very similar to the gradient descent algorithm for minimization that we learned a few weeks ago. "Ascent" means "going up" and "descent" means "going down". The two methods differ only in the sign of the second term: for maximization we add the second term, and for minimization we subtract it.)
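As a numeric check on parts (d)–(g), the sketch below plugs the given numbers into the standard Bernoulli log-likelihood $\ell(\beta) = \sum_i \left[ Y_i x_i^T \beta - \log(1 + e^{x_i^T \beta}) \right]$, whose gradient is $(Y_i - p_i)\, x_i$ and whose Hessian is $-p_i(1 - p_i)\, x_i x_i^T$. This is a sketch under those standard formulas, not the derivation the question asks you to produce:

```python
import numpy as np

beta = np.array([0.1, 0.2, 0.3])   # current estimate beta_hat
x_new = np.array([3.0, 2.0, 4.0])  # predictors for the new observation
y_new = 1.0                        # observed outcome Y_{n+1}

# (d) predicted probability of success via the inverse logit link
eta = x_new @ beta                 # linear predictor x^T beta = 1.9
p = 1.0 / (1.0 + np.exp(-eta))     # ~0.8699

# (e) gradient of the log-likelihood at the new point: (Y - p) x
grad = (y_new - p) * x_new         # ~[0.390, 0.260, 0.520]

# (f) Hessian: -p(1-p) x x^T, a 3x3 matrix
hess = -p * (1.0 - p) * np.outer(x_new, x_new)

# (g) one stochastic gradient ascent step with learning rate 0.001
beta_new = beta + 0.001 * grad     # ~[0.10039, 0.20026, 0.30052]

print(p, grad, beta_new, sep="\n")
```

Note that the Hessian is a negative multiple of the rank-one matrix $x x^T$, which is why Newton's method and gradient ascent can behave quite differently on a single observation.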