Question: For this problem, I need help deriving the equation below with z = tanh(w^T x). I've attached class notes that derive the backpropagation algorithm with z = sigma(y); however, I need the same derivation with z = tanh.
Please do not comment or answer unless you can actually help, or unless you need a clarification from me and intend to answer after I edit the question.
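To be explicit about what I think should change (based only on the hint in the notes below, so please correct me if I'm wrong): the sigmoid derivation uses the unit derivative dz/dy = z(1 - z), while for tanh the hint gives a different derivative. Writing y = w^T x for a unit's net input:

  sigmoid: z = sigma(y)  ->  dz/dy = z * (1 - z)
  tanh:    z = tanh(y)   ->  dz/dy = 1 - tanh^2(y) = 1 - z^2

So I expect every z(1 - z) factor in the derivation to be replaced by (1 - z^2), but I would like the full derivation worked through.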

CLASS NOTES:


Using the sum of squared errors function E(w) = (1/2) * sum_d sum_k (t_kd - z_kd)^2, revise the Backpropagation algorithm for a two-class classification application so that it operates on units using the "squashing" function tanh in place of the sigmoid function. That is, assume the output of a single unit is z = tanh(w^T x). Give the weight update rule for output layer weights and hidden layer weights. Hint: tanh'(x) = 1 - tanh^2(x).

Backpropagation

Let's start by redefining the error E to sum the errors over all of the network outputs:

  E(w) = (1/2) * sum_{d in D} sum_{k in outputs} (t_kd - z_kd)^2

where outputs is the set of output units in the network, t_kd is the target value for the kth output unit and training example d, and z_kd is the actual output of the kth output unit for training example d. Similarly, we can define the error E_d for a single training example as:

  E_d(w) = (1/2) * sum_{k in outputs} (t_k - z_k)^2

Backpropagation (2)

Stochastic gradient descent involves iterating through the training examples one at a time. For each training example d, we descend the gradient of the error E_d with respect to this single example. Thus, for each training example d, every weight w_ji is updated by adding Delta w_ji to it:

  Delta w_ji = -eta * dE_d/dw_ji

We can derive an expression for dE_d/dw_ji in order to implement the rule shown above.

Derivation of the backpropagation algorithm

Note that the only effect that a given weight w_ji has on the network is through its contribution to the weighted sum y_j (the net input to unit j), where x_ji denotes the ith input to unit j. Therefore, we can use the chain rule:

  dE_d/dw_ji = (dE_d/dy_j) * (dy_j/dw_ji) = (dE_d/dy_j) * x_ji

So now we need an expression for dE_d/dy_j. This will require us to examine two separate cases: (1) where j is an output layer unit, and (2) where j is a hidden unit.

Case 1: j is an output layer unit

y_j can only affect the rest of the network via z_j, so we can use the chain rule again:

  dE_d/dy_j = (dE_d/dz_j) * (dz_j/dy_j)

The first term is:

  dE_d/dz_j = d/dz_j [ (1/2) * sum_{k in outputs} (t_k - z_k)^2 ]

This will be zero for all outputs other than the output of unit j, so:

  dE_d/dz_j = d/dz_j [ (1/2) * (t_j - z_j)^2 ] = -(t_j - z_j)

Case 1: j is an output layer unit (2)

Remember that z_j = sigma(y_j). Therefore, the second term is:

  dz_j/dy_j = sigma'(y_j) = z_j * (1 - z_j)

Replacing the first and second terms yields:

  dE_d/dy_j = -(t_j - z_j) * z_j * (1 - z_j)

From now on, we will write delta_j for -dE_d/dy_j:

  delta_j = (t_j - z_j) * z_j * (1 - z_j)

Thus, the training rule for the output units' weights becomes:

  Delta w_ji = -eta * dE_d/dw_ji = eta * (t_j - z_j) * z_j * (1 - z_j) * x_ji

Case 2: j is a hidden layer unit

For hidden units, the weights can only indirectly influence the outputs of the output layer. Let's define the set of all units whose direct inputs include the output of unit j as the immediately downstream units of j, or DS(j). Note that y_j can only influence the network outputs (and therefore E_d) through the units in DS(j); therefore:

  dE_d/dy_j = sum_{k in DS(j)} (dE_d/dy_k) * (dy_k/dy_j) = sum_{k in DS(j)} -delta_k * (dy_k/dy_j)

Case 2: j is a hidden layer unit (2)

  dE_d/dy_j = sum_{k in DS(j)} -delta_k * (dy_k/dz_j) * (dz_j/dy_j)
            = sum_{k in DS(j)} -delta_k * w_kj * (dz_j/dy_j)
            = sum_{k in DS(j)} -delta_k * w_kj * z_j * (1 - z_j)

Rearranging terms and using delta_j in place of -dE_d/dy_j gives:

  delta_j = z_j * (1 - z_j) * sum_{k in DS(j)} delta_k * w_kj

  Delta w_ji = eta * delta_j * x_ji
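To make sure I'm asking the right thing, here is a small NumPy sketch of what I believe one stochastic-gradient step would look like with tanh units. The network shape (3 inputs, 4 hidden units, 2 outputs), the +/-1 targets, the variable names, and eta = 0.1 are my own assumptions for illustration, not from the notes; the two lines marked "tanh-specific" are exactly what I want derived and checked.

import numpy as np

# One stochastic-gradient step for a 1-hidden-layer network with tanh units,
# following the structure of the class notes. Shapes, names, and toy data are
# my own assumptions.

rng = np.random.default_rng(0)
eta = 0.1                            # learning rate (assumed)

x = rng.normal(size=3)               # inputs to the hidden layer (one example)
t = np.array([1.0, -1.0])            # targets; +/-1 since tanh outputs lie in (-1, 1)

W_hidden = rng.normal(size=(4, 3))   # hidden-layer weights w_ji
W_out = rng.normal(size=(2, 4))      # output-layer weights w_kj

# forward pass: z = tanh(w^T x) at every unit
y_hidden = W_hidden @ x
z_hidden = np.tanh(y_hidden)
y_out = W_out @ z_hidden
z_out = np.tanh(y_out)

# output-layer deltas (tanh-specific): delta_k = (t_k - z_k) * (1 - z_k^2)
# (the sigmoid version in the notes has z_k * (1 - z_k) here instead)
delta_out = (t - z_out) * (1.0 - z_out ** 2)

# hidden-layer deltas (tanh-specific): delta_j = (1 - z_j^2) * sum_k delta_k * w_kj
delta_hidden = (1.0 - z_hidden ** 2) * (W_out.T @ delta_out)

# weight updates: Delta w_ji = eta * delta_j * x_ji
W_out += eta * np.outer(delta_out, z_hidden)
W_hidden += eta * np.outer(delta_hidden, x)

In other words, if I've read the hint correctly, the only difference from the sigmoid version in the notes is that the z * (1 - z) factor becomes (1 - z^2) in both delta computations, but I'd like the derivation confirming this.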
