Question: Problem 2 (BONUS) (20 points). In this problem we aim at generalizing the Logistic Regression algorithm to the multi-class classification problem, the setting where the label space includes three or more classes, i.e., $\mathcal{Y} = \{1, 2, \ldots, K\}$ for $K \geq 3$. To do so, consider training data $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where the feature vectors are $x_i \in \mathbb{R}^d$ and $y_i \in \mathcal{Y}$, $i = 1, 2, \ldots, n$. Here we assume that we have $K$ different classes and each input $x_i$ is a $d$-dimensional vector. We can extend the binary prediction rule to the multiclass setting by letting the prediction function be:

$$f_{\text{multiclass}}(x) = \operatorname*{argmax}_{k \in \{1, 2, \ldots, K\}} P[y = k \mid x]$$

One way to generalize the binary model so that we can map to the label set $\mathcal{Y}$ is to assign a different set of parameters $w_k, b_k$ to each label $k \in \mathcal{Y}$. For simplicity we ignore the intercept parameters $b_k$. The posterior probability is given by:

$$P[y = k \mid x; w_1, \ldots, w_{K-1}] = \frac{\exp(w_k^\top x)}{1 + \sum_{j=1}^{K-1} \exp(w_j^\top x)} \quad \text{for } k = 1, \ldots, K-1$$

$$P[y = K \mid x; w_1, \ldots, w_{K-1}] = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(w_j^\top x)}$$

Note that we need to estimate $\{w_1, \ldots, w_{K-1}\}$, where each one is a $d$-dimensional vector. Therefore, we need to estimate $(K-1) \times d$ parameters in total. Note that we do not consider a parameter vector for class $K$, as it can be inferred from the rest (similar to binary classification, where we only had a single parameter vector).

a) Write down explicitly the log-likelihood function and simplify it as much as you can.
b) Compute the gradient of the log-likelihood with respect to each $w_k$ and simplify it.
c) Derive the stochastic gradient descent (SGD) update for multiclass logistic regression.
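To make the setup concrete, here is a minimal NumPy sketch of the model as stated above: the $(K-1) \times d$ parameterization of the posterior, the argmax prediction rule, and a single-example SGD step. The function names (`posteriors`, `predict`, `sgd_step`), the learning rate `lr`, and the toy data are illustrative assumptions, not part of the problem; the update used in `sgd_step` is the standard per-example gradient $(\mathbb{1}[y=k] - P[y=k \mid x])\,x$ that parts (a)-(c) ask you to derive, so treat it as a plausibility check rather than the solution itself.

```python
import numpy as np

def posteriors(W, x):
    """P[y=k | x] for k = 1..K under the (K-1)-vector parameterization.

    W : (K-1, d) array whose rows are w_1, ..., w_{K-1}
    x : (d,) feature vector
    Returns a length-K probability vector; the last entry is class K.
    """
    scores = W @ x                       # w_k^T x for k = 1..K-1
    scores = np.append(scores, 0.0)      # class K has an implicit score of 0
    scores -= scores.max()               # shift by the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def predict(W, x):
    """f_multiclass(x) = argmax_k P[y=k | x]; returns a label in {1, ..., K}."""
    return int(np.argmax(posteriors(W, x))) + 1

def sgd_step(W, x, y, lr=0.1):
    """One SGD step on the negative log-likelihood of a single example (x, y).

    Per-example gradient w.r.t. w_k is (P[y=k|x] - 1[y=k]) x for k = 1..K-1,
    so the descent update is w_k <- w_k + lr * (1[y=k] - p_k) x.
    """
    p = posteriors(W, x)[:-1]            # only classes 1..K-1 carry parameters
    onehot = np.zeros(W.shape[0])
    if y <= W.shape[0]:                  # class K contributes no indicator term
        onehot[y - 1] = 1.0
    return W + lr * np.outer(onehot - p, x)

# Toy usage (assumed data): K = 3 classes, d = 2 features.
rng = np.random.default_rng(0)
W = np.zeros((2, 2))                     # (K-1) x d = 2 x 2 parameters
for _ in range(200):
    x = rng.normal(size=2)
    y = 1 if x[0] > 0 else (2 if x[1] > 0 else 3)
    W = sgd_step(W, x, y)
print(predict(W, np.array([2.0, 0.0])))  # should typically predict class 1
```

Note the design choice above: class $K$ is handled by appending a fixed score of $0$, which reproduces the stated posteriors exactly (since $\exp(0) = 1$ gives the $1$ in the denominator) while letting a single softmax computation cover all $K$ classes.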
