Question: Problem 2 (BONUS) (20 points). In this problem we aim at generalizing the Logistic Regression algorithm to the multi-class classification problem, the setting where the label space includes three or more classes, i.e., $\mathcal{Y} = \{1, 2, \ldots, K\}$ for $K \geq 3$. To do so, consider training data $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where the feature vectors are $x_i \in \mathbb{R}^d$ and $y_i \in \mathcal{Y}$, $i = 1, 2, \ldots, n$. Here we assume that we have $K$ different classes and each input $x_i$ is a $d$-dimensional vector. We can extend the binary prediction rule to the multiclass setting by letting the prediction function be:

$$f_{\text{multiclass}}(x) = \operatorname*{argmax}_{k \in \{1, 2, \ldots, K\}} P[y = k \mid x]$$

One way to generalize the binary model so that we can map to the label set $\mathcal{Y}$ is to assign a different set of parameters $w_k, b_k$ to each label $k \in \mathcal{Y}$. For simplicity we ignore the intercept parameters $b_k$. The posterior probability is given by:

$$P[y = k \mid x; w_1, \ldots, w_{K-1}] = \frac{\exp(w_k^\top x)}{1 + \sum_{j=1}^{K-1} \exp(w_j^\top x)} \quad \text{for } k = 1, \ldots, K-1$$

$$P[y = K \mid x; w_1, \ldots, w_{K-1}] = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(w_j^\top x)}$$

Note that we need to estimate $\{w_1, \ldots, w_{K-1}\}$, where each one is a $d$-dimensional vector. Therefore, we need to estimate $(K-1) \times d$ parameters in total. Note that we do not consider a parameter vector for class $K$, as it can be inferred from the rest (similar to binary classification, where we only had a single parameter vector).

a) Write down explicitly the log-likelihood function and simplify it as much as you can.
b) Compute the gradient of the log-likelihood with respect to each $w_k$ and simplify it.
c) Derive the stochastic gradient descent (SGD) update for multiclass logistic regression.
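To make the setup concrete, here is a minimal NumPy sketch of the model as stated above: the $(K-1) \times d$ parameterization of the posterior, the argmax prediction rule, and a single-example SGD step. The function names (`posteriors`, `predict`, `sgd_step`), the learning rate `lr`, and the toy data are illustrative assumptions, not part of the problem; the update used in `sgd_step` is the standard per-example gradient $(\mathbb{1}[y=k] - P[y=k \mid x])\,x$ that parts (a)-(c) ask you to derive, so treat it as a plausibility check rather than the solution itself.

```python
import numpy as np

def posteriors(W, x):
    """P[y=k | x] for k = 1..K under the (K-1)-vector parameterization.

    W : (K-1, d) array whose rows are w_1, ..., w_{K-1}
    x : (d,) feature vector
    Returns a length-K probability vector; the last entry is class K.
    """
    scores = W @ x                       # w_k^T x for k = 1..K-1
    scores = np.append(scores, 0.0)      # class K has an implicit score of 0
    scores -= scores.max()               # shift by the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def predict(W, x):
    """f_multiclass(x) = argmax_k P[y=k | x]; returns a label in {1, ..., K}."""
    return int(np.argmax(posteriors(W, x))) + 1

def sgd_step(W, x, y, lr=0.1):
    """One SGD step on the negative log-likelihood of a single example (x, y).

    Per-example gradient w.r.t. w_k is (P[y=k|x] - 1[y=k]) x for k = 1..K-1,
    so the descent update is w_k <- w_k + lr * (1[y=k] - p_k) x.
    """
    p = posteriors(W, x)[:-1]            # only classes 1..K-1 carry parameters
    onehot = np.zeros(W.shape[0])
    if y <= W.shape[0]:                  # class K contributes no indicator term
        onehot[y - 1] = 1.0
    return W + lr * np.outer(onehot - p, x)

# Toy usage (assumed data): K = 3 classes, d = 2 features.
rng = np.random.default_rng(0)
W = np.zeros((2, 2))                     # (K-1) x d = 2 x 2 parameters
for _ in range(200):
    x = rng.normal(size=2)
    y = 1 if x[0] > 0 else (2 if x[1] > 0 else 3)
    W = sgd_step(W, x, y)
print(predict(W, np.array([2.0, 0.0])))  # should typically predict class 1
```

Note the design choice above: class $K$ is handled by appending a fixed score of $0$, which reproduces the stated posteriors exactly (since $\exp(0) = 1$ gives the $1$ in the denominator) while letting a single softmax computation cover all $K$ classes.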
