Considering the following neural network with inputs (x1, x2), outputs (z1, z2, z3), and parameters...


Question:


Transcribed Image Text:

Consider the following neural network with inputs $(x_1, x_2)$, outputs $(z_1, z_2, z_3)$, and parameters $\theta = (a, b, c, d, e, f, i, j, k, l, m, n, o, p, q)$:

$$
\begin{pmatrix} g_1 \\ g_2 \end{pmatrix}
= \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+ \begin{pmatrix} e \\ f \end{pmatrix},
\qquad
\begin{pmatrix} h_1 \\ h_2 \end{pmatrix}
= \begin{pmatrix} \mathrm{ReLU}(g_1) \\ \mathrm{ReLU}(g_2) \end{pmatrix},
\qquad
\begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix}
= \begin{pmatrix} i & j \\ k & l \\ m & n \end{pmatrix}
\begin{pmatrix} h_1 \\ h_2 \end{pmatrix}
+ \begin{pmatrix} o \\ p \\ q \end{pmatrix}
$$

A. (15 points) For a minibatch containing a single training sample $(x_1, x_2, y = 2)$, apply softmax and write down the cross-entropy loss function $J(\theta)$ as a function of $(z_1, z_2, z_3)$. Compute $\frac{\partial J}{\partial z_1}, \frac{\partial J}{\partial z_2}, \frac{\partial J}{\partial z_3}$ as functions of $(z_1, z_2, z_3)$.

B. (20 points) Based on A, apply backprop to compute $\frac{\partial J}{\partial o}, \frac{\partial J}{\partial p}, \frac{\partial J}{\partial q}, \frac{\partial J}{\partial h_1}, \frac{\partial J}{\partial h_2}, \frac{\partial J}{\partial i}, \frac{\partial J}{\partial j}, \frac{\partial J}{\partial k}, \frac{\partial J}{\partial l}, \frac{\partial J}{\partial m}, \frac{\partial J}{\partial n}$.

C. (20 points) Based on B, apply backprop to compute $\frac{\partial J}{\partial a}, \frac{\partial J}{\partial b}, \frac{\partial J}{\partial c}, \frac{\partial J}{\partial d}, \frac{\partial J}{\partial e}, \frac{\partial J}{\partial f}$. Explain why you don't need to compute $\frac{\partial J}{\partial x_1}$ and $\frac{\partial J}{\partial x_2}$. (Hint: use the step function $u(x)$ as the derivative of $\mathrm{ReLU}(x)$.)

D. (15 points) For the learning rate $\epsilon$, show the equation to apply the simple SGD algorithm to update $\theta$ for this minibatch.
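One way to sanity-check hand-derived answers to parts A through D is a small numerical sketch. The one below uses only the Python standard library and rests on assumptions that the garbled transcription does not confirm: the layer layout (a 2x2 weight matrix with entries a, b, c, d and bias e, f, then ReLU, then a 3x2 weight matrix with entries i to n and bias o, p, q), and the reading of y = 2 as "the correct class is the second output". Under those assumptions the loss is $J = -z_2 + \log\sum_k e^{z_k}$, and $\partial J/\partial z_k$ is the softmax output minus the one-hot target. The function names (`forward`, `gradients`, `sgd_step`) are illustrative and not part of the question.

```python
import math

def relu(v):
    return max(0.0, v)

def step(v):
    # u(v), used as the derivative of ReLU; the value at v == 0 is a convention.
    return 1.0 if v > 0 else 0.0

def forward(theta, x1, x2):
    # Assumed layout: g = W1 x + b1, h = ReLU(g), z = W2 h + b2.
    g1 = theta["a"] * x1 + theta["b"] * x2 + theta["e"]
    g2 = theta["c"] * x1 + theta["d"] * x2 + theta["f"]
    h1, h2 = relu(g1), relu(g2)
    z = [
        theta["i"] * h1 + theta["j"] * h2 + theta["o"],
        theta["k"] * h1 + theta["l"] * h2 + theta["p"],
        theta["m"] * h1 + theta["n"] * h2 + theta["q"],
    ]
    return (g1, g2), (h1, h2), z

def softmax(z):
    zmax = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - zmax) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def loss(theta, x1, x2, y=2):
    # Cross-entropy for a single sample; y is a 1-based class index (assumption).
    _, _, z = forward(theta, x1, x2)
    return -math.log(softmax(z)[y - 1])

def gradients(theta, x1, x2, y=2):
    (g1, g2), (h1, h2), z = forward(theta, x1, x2)
    s = softmax(z)
    # Part A: dJ/dz_k = softmax(z)_k - 1[k == y]
    dz = [s[k] - (1.0 if k == y - 1 else 0.0) for k in range(3)]
    # Part B: output-layer biases, weights, and the hidden activations.
    grad = {
        "o": dz[0], "p": dz[1], "q": dz[2],
        "i": dz[0] * h1, "j": dz[0] * h2,
        "k": dz[1] * h1, "l": dz[1] * h2,
        "m": dz[2] * h1, "n": dz[2] * h2,
    }
    dh1 = theta["i"] * dz[0] + theta["k"] * dz[1] + theta["m"] * dz[2]
    dh2 = theta["j"] * dz[0] + theta["l"] * dz[1] + theta["n"] * dz[2]
    # Part C: pass through ReLU with the step function, then the first layer.
    dg1, dg2 = dh1 * step(g1), dh2 * step(g2)
    grad.update({
        "a": dg1 * x1, "b": dg1 * x2, "e": dg1,
        "c": dg2 * x1, "d": dg2 * x2, "f": dg2,
    })
    # x1 and x2 are data, not parameters, so their gradients are never used.
    return grad

def sgd_step(theta, grad, eps):
    # Part D: theta <- theta - eps * dJ/dtheta, applied to every parameter.
    return {name: theta[name] - eps * grad[name] for name in theta}

if __name__ == "__main__":
    # Arbitrary illustrative parameter values, not taken from the question.
    theta = {name: 0.1 * (idx + 1) for idx, name in enumerate("abcdefijklmnopq")}
    grad = gradients(theta, x1=1.0, x2=-0.5)
    new_theta = sgd_step(theta, grad, eps=0.01)
    print(loss(theta, 1.0, -0.5), loss(new_theta, 1.0, -0.5))
```

Comparing `gradients` against central finite differences of `loss` is a quick way to catch a sign or index slip in a hand derivation, provided the test point keeps $g_1$ and $g_2$ away from zero, where ReLU is not differentiable.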


Posted Date: Jan 17, 2024 02:45 AM
