Question: The LSTM ( without bras units and forget gate ) is delined as z ( t ) = t a n h ( U x
The LSTM without bras units and forget gate is delined as
Verbleibende Zeit ::
with input vectors hidden activation vectors memory cell state vectors gate activation vectors weight matrices Let hat denote the loss at time and let denote the total loss. We use denominatorlayout convention, ie is a column vector. The diag operator turns a vector into a diagonal matrix, ie diag and denotes Hadamard's product. Which of the following statements are true?
a If we choose then and the memory cells will always increase at every time step. This can be a problem for very long sequences but can also be helpful for certain problems.
b The gradient of the loss with respect to the hiddens is
c Because of the simple structure of the memory cell, the LSTM architecture fails to be Turing complete.
d The LSTM architecture has no exploding or vanishing gradients because is an orthogonal matrix.
e The memory cell fulfills I neglecting dependencies via the hiddens for any dotst where I is the identity matrix. This solves the vanishing gradient problem and is called constant error carousel.
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
