Question: Consider the problem of binary classification where $x = (x_1, \ldots, x_d)^T \in \mathbb{R}^d$, $y \in \{0, 1\}$ and $w = (w_1, \ldots, w_d)^T \in \mathbb{R}^d$ are the input feature vector, the outcome target and the weight vector, respectively. The binary regression model is parameterized as follows: $y \sim \mathrm{Bernoulli}(\hat{y})$ with $\hat{y} = p(y = 1 \mid x, w) = \sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = w^T x = \sum_{j=1}^{d} w_j x_j$.

(a) For a single example $(x^{(i)}, y^{(i)})$, the loss is defined as the negative log likelihood, or cross-entropy loss: $L^{(i)}_{CE}(w) = -\log p(y^{(i)} \mid x^{(i)}, w)$. Recall that $p(y^{(i)} \mid x^{(i)}, w) = \sigma(z^{(i)})^{y^{(i)}} \left(1 - \sigma(z^{(i)})\right)^{1 - y^{(i)}}$ for $y^{(i)} \in \{0, 1\}$. Show that
$$L^{(i)}_{CE}(w) = -y^{(i)} \log \sigma(z^{(i)}) - (1 - y^{(i)}) \log\left(1 - \sigma(z^{(i)})\right).$$

(b) Using the chain rule, show that
$$\frac{\partial}{\partial w_j} L^{(i)}_{CE}(w) = \left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j, \qquad j = 1, \ldots, d.$$

(c) Let $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$ be a training set of $m$ independently and identically distributed examples, and let the cross-entropy loss corresponding to $D$ be $L^{(D)}_{CE}(w) = \sum_{i=1}^{m} L^{(i)}_{CE}(w)$. Derive the expression for $\frac{\partial}{\partial w_j} L^{(D)}_{CE}(w)$, $j = 1, \ldots, d$, and hence provide both the stochastic and batch gradient descent update rules.

(d) Provide an interpretation of the results found in (c).
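As orientation before attempting the problem, here is one standard derivation sketch for parts (a)–(c). It uses only the definitions in the question, plus the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ and a learning rate $\eta$ (which the question does not name); it is a sketch of one common route, not an official graded solution.

```latex
% (a) Take the negative log of the Bernoulli likelihood:
\[
L^{(i)}_{CE}(w)
= -\log\!\left[\sigma(z^{(i)})^{y^{(i)}}\,(1-\sigma(z^{(i)}))^{1-y^{(i)}}\right]
= -y^{(i)}\log\sigma(z^{(i)}) - (1-y^{(i)})\log\!\left(1-\sigma(z^{(i)})\right).
\]
% (b) Chain rule with sigma'(z) = sigma(z)(1 - sigma(z)) and dz/dw_j = x_j:
\[
\frac{\partial L^{(i)}_{CE}}{\partial w_j}
= \left(-\frac{y^{(i)}}{\sigma(z^{(i)})} + \frac{1-y^{(i)}}{1-\sigma(z^{(i)})}\right)
  \sigma(z^{(i)})\,(1-\sigma(z^{(i)}))\,x^{(i)}_j
= \left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j .
\]
% (c) Gradients add over the i.i.d. examples, giving the update rules
%     (eta denotes the learning rate):
\[
\frac{\partial L^{(D)}_{CE}}{\partial w_j}
= \sum_{i=1}^{m}\left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j,
\quad
\text{SGD: } w_j \leftarrow w_j - \eta\left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j,
\quad
\text{batch: } w_j \leftarrow w_j - \eta \sum_{i=1}^{m}\left(\sigma(z^{(i)}) - y^{(i)}\right) x^{(i)}_j .
\]
```

For part (d), one common reading of these results: each update adjusts $w_j$ in proportion to the prediction error $\sigma(z^{(i)}) - y^{(i)}$ scaled by the feature value $x^{(i)}_j$, so well-classified examples contribute little, and SGD trades the exact batch gradient for cheap, noisy per-example steps.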
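To make the update rules concrete, below is a minimal NumPy sketch of the batch gradient and both update rules. The function names (`sigmoid`, `ce_grad`, `batch_step`, `sgd_epoch`), the learning-rate default, and the synthetic data are all illustrative assumptions, not anything specified in the question.

```python
# Minimal NumPy sketch of the gradients and update rules from (b)-(c).
# All names and the learning-rate value are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_grad(w, X, y):
    """Batch gradient of L^(D)_CE: sum_i (sigma(z^(i)) - y^(i)) x^(i)."""
    errors = sigmoid(X @ w) - y          # shape (m,): per-example residuals
    return X.T @ errors                  # shape (d,): summed over examples

def batch_step(w, X, y, lr=0.1):
    """One batch gradient-descent update using the full training set."""
    return w - lr * ce_grad(w, X, y)

def sgd_epoch(w, X, y, lr=0.1):
    """One pass of stochastic updates, one example at a time."""
    for i in np.random.permutation(len(y)):
        error = sigmoid(X[i] @ w) - y[i]   # scalar residual for example i
        w = w - lr * error * X[i]          # update along x^(i)
    return w

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (sigmoid(X @ true_w) > rng.random(100)).astype(float)
w = np.zeros(3)
for _ in range(200):
    w = batch_step(w, X, y, lr=0.05)
print("estimated w:", w)
```

Note that `batch_step` sums the per-example gradients exactly as in (c), while `sgd_epoch` applies the single-example rule in shuffled order.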
