Question: 1. (50 points) Stochastic gradient descent for MLP. Given the following training set $D = \{(x^{(i)}, y^{(i)})\}$ for a three-class classification task:

$$x^{(1)} = [-0.7411, -0.5078, -0.3206]^\top, \quad y^{(1)} = [0, 1, 0]^\top,$$
$$x^{(2)} = [0.0983, -0.0308, -0.3728]^\top, \quad y^{(2)} = [1, 0, 0]^\top,$$
$$x^{(3)} = [0.0414, 0.2323, -0.2365]^\top, \quad y^{(3)} = [0, 1, 0]^\top,$$
$$x^{(4)} = [-0.7342, 0.4264, 2.0237]^\top, \quad y^{(4)} = [0, 0, 1]^\top.$$

We are given a two-layer multi-layer perceptron with the following initial weights and biases for the first and second layers, respectively:

$$W_1 = \begin{bmatrix} 1.6035 & -1.5062 & 0.2761 \\ 1.2347 & -0.4446 & -0.2612 \\ -0.2296 & -0.1559 & 0.4434 \end{bmatrix}, \quad b_1 = [0.3919, -1.2507, -0.9480]^\top,$$

$$W_2 = \begin{bmatrix} 0.0125 & 1.2424 & 0.3503 \\ -3.0292 & -1.0667 & -0.0290 \\ -0.4570 & 0.9337 & 0.1825 \end{bmatrix}, \quad b_2 = [-1.5651, -0.0845, 1.6039]^\top.$$

The ReLU function follows the first fully-connected layer, and the softmax function follows the second fully-connected layer. The network is trained with the cross-entropy loss, and stochastic gradient descent is used to update the parameters $W_1, W_2, b_1, b_2$.

(a) (15 points) What is the loss function value at the first iteration?

(b) (20 points) If you update the parameters for one iteration with learning rate 0.01, what are the parameters after that iteration?

(c) (15 points) What is the loss function value at the second iteration?
Step-by-Step Solution
The solution has three steps, matching parts (a)-(c): run a forward pass and evaluate the loss, backpropagate and apply one SGD update, then recompute the loss with the updated parameters. Each step is sketched below.
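Step 1 (part (a)): a minimal NumPy sketch of the forward pass and loss. It assumes the standard layer form $z = Wx + b$ with $b_1, b_2$ as column vectors, that the rows printed in the problem are the rows of $W_1$ and $W_2$, and that the first SGD iteration draws the first training example $(x^{(1)}, y^{(1)})$. The helper names (`forward`, `cross_entropy`) are our own, not from the problem.

```python
import numpy as np

# Training inputs x^(i) (as rows) and one-hot targets y^(i) from the problem.
X = np.array([[-0.7411, -0.5078, -0.3206],
              [ 0.0983, -0.0308, -0.3728],
              [ 0.0414,  0.2323, -0.2365],
              [-0.7342,  0.4264,  2.0237]])
Y = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])

# Initial parameters as given in the problem statement.
W1 = np.array([[ 1.6035, -1.5062,  0.2761],
               [ 1.2347, -0.4446, -0.2612],
               [-0.2296, -0.1559,  0.4434]])
b1 = np.array([0.3919, -1.2507, -0.9480])
W2 = np.array([[ 0.0125,  1.2424,  0.3503],
               [-3.0292, -1.0667, -0.0290],
               [-0.4570,  0.9337,  0.1825]])
b2 = np.array([-1.5651, -0.0845,  1.6039])

def forward(x):
    """FC -> ReLU -> FC -> softmax; returns intermediates needed for backprop."""
    z1 = W1 @ x + b1
    h1 = np.maximum(z1, 0.0)      # ReLU
    z2 = W2 @ h1 + b2
    p = np.exp(z2 - z2.max())     # shift by max for numerical stability
    p /= p.sum()                  # softmax probabilities
    return z1, h1, p

def cross_entropy(p, y):
    """Cross-entropy loss against a one-hot target y."""
    return -np.sum(y * np.log(p))

# (a) Loss at iteration 1, computed on the first example.
x, y = X[0], Y[0]
z1, h1, p = forward(x)
print("loss, iteration 1:", cross_entropy(p, y))
```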
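Step 2 (part (b)): for softmax composed with cross-entropy, the gradient with respect to the second-layer pre-activation is $p - y$, which then backpropagates through $W_2$ and the ReLU mask. Continuing the same script (the variables `x`, `y`, `z1`, `h1`, `p` carry over from Step 1):

```python
# (b) Backprop on the same example, then one SGD step with learning rate 0.01.
lr = 0.01
dz2 = p - y                 # d(loss)/d(z2) for softmax + cross-entropy
dW2 = np.outer(dz2, h1)     # d(loss)/d(W2) = dz2 h1^T
db2 = dz2
dh1 = W2.T @ dz2
dz1 = dh1 * (z1 > 0)        # ReLU passes gradient only where z1 > 0
dW1 = np.outer(dz1, x)      # d(loss)/d(W1) = dz1 x^T
db1 = dz1

W1 -= lr * dW1; b1 -= lr * db1   # updated parameters: the answer to (b)
W2 -= lr * dW2; b2 -= lr * db2
print("W1 after update:\n", W1)
```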

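Step 3 (part (c)): assuming SGD visits the samples in order, so the second iteration draws $(x^{(2)}, y^{(2)})$, the loss at iteration 2 is just a forward pass with the updated parameters. Continuing the same script:

```python
# (c) Loss at iteration 2: forward pass on the second example using the
# parameters updated in Step 2 (sample order is an assumption here).
x2, y2 = X[1], Y[1]
_, _, p2 = forward(x2)
print("loss, iteration 2:", cross_entropy(p2, y2))
```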