
## Part 3. Model Building: Backwards Pass
We are ready to complete the function that computes the backward pass of
our model!
You should start by reviewing the lecture slides on backpropagation.
One difference between the slides and our implementation here is that the
slides express the computations required to obtain the gradients of the
loss for a *single data point*.
Our implementation of backpropagation, however, is further vectorized to
compute gradients of the loss for a *batch of multiple data points*.
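To see what this vectorization looks like, suppose the $N$ examples in a batch
are stacked as rows, so that the $i$-th rows of matrices $\bar{Z}$ and $H$ hold
$\overline{{\bf z}}^{(i)}$ and ${\bf h}^{(i)}$ for example $i$ (a common
convention, though it is an assumption here; your starter code may use a
different layout). Then, for instance, the per-example outer products that make
up the weight gradient $\overline{W^{(2)}} = \overline{{\bf z}}\,{\bf h}^T$
derived below accumulate into a single matrix multiplication:
\begin{align*}
\overline{W^{(2)}} = \sum_{i=1}^{N} \overline{{\bf z}}^{(i)} \left({\bf h}^{(i)}\right)^T = \bar{Z}^T H
\end{align*}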
We begin by applying the backpropagation algorithm to the forward-pass
steps from earlier. Recall that our model's forward pass is as follows:
\begin{align*}
{\bf x_a} &=\textrm{the one-hot vector for word 1}\\
{\bf x_b} &=\textrm{the one-hot vector for word 2}\\
{\bf x_c} &=\textrm{the one-hot vector for word 3}\\
{\bf v_a} &=W^{(word)}{\bf x_a}\\
{\bf v_b} &=W^{(word)}{\bf x_b}\\
{\bf v_c} &=W^{(word)}{\bf x_c}\\
{\bf v} &=\textrm{concatenate}({\bf v_a},{\bf v_b},{\bf v_c})\\
{\bf m} &=W^{(1)}{\bf v}+{\bf b}^{(1)}\\
{\bf h} &=\textrm{ReLU}({\bf m})\\
{\bf z} &=W^{(2)}{\bf h}+{\bf b}^{(2)}\\
{\bf y} &=\textrm{softmax}({\bf z})\\
L &=\mathcal{L}_\textrm{Cross-Entropy}({\bf y},{\bf t})
\end{align*}
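To keep the shapes straight, here is a minimal NumPy sketch of this forward
pass in the batched setting. The variable names (`W_word`, `W1`, `b1`, `W2`,
`b2`), the examples-as-rows layout, and the shapes are assumptions for
illustration and may not match the starter code; in particular, the embedding
matrix is stored here as `(vocab_size, embedding_dim)` so that each one-hot
product $W^{(word)}{\bf x}$ becomes a row lookup.

```python
import numpy as np

def forward(X, W_word, W1, b1, W2, b2):
    """Batched forward pass for N contexts of 3 words each (assumed shapes).

    X      : (N, 3) integer word indices
    W_word : (vocab_size, embedding_dim) -- one row per word
    W1, b1 : (hidden_dim, 3*embedding_dim), (hidden_dim,)
    W2, b2 : (vocab_size, hidden_dim), (vocab_size,)
    """
    # Indexing rows of W_word implements the one-hot products W^(word) x.
    v_a, v_b, v_c = W_word[X[:, 0]], W_word[X[:, 1]], W_word[X[:, 2]]
    v = np.concatenate([v_a, v_b, v_c], axis=1)   # (N, 3*embedding_dim)
    m = v @ W1.T + b1                             # (N, hidden_dim)
    h = np.maximum(m, 0.0)                        # ReLU
    z = h @ W2.T + b2                             # (N, vocab_size)
    # Numerically stable softmax along the vocabulary axis.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    y = e / e.sum(axis=1, keepdims=True)
    return v, m, h, y
```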
Following the steps discussed in this week's lecture, we should get
the following backward-pass computation (verify this yourself!):
\begin{align*}
\overline{{\bf z}} &={\bf y}-{\bf t}\\
\overline{W^{(2)}} &=\overline{{\bf z}}\,{\bf h}^T \\
\overline{{\bf b}^{(2)}} &=\overline{{\bf z}}\\
\overline{{\bf h}} &={W^{(2)}}^T\overline{{\bf z}}\\
\overline{{\bf m}} &=\overline{{\bf h}}\circ \textrm{ReLU}'({\bf m})\\
\overline{W^{(1)}} &=\overline{{\bf m}}\,{\bf v}^T \\
\overline{{\bf b}^{(1)}} &=\overline{{\bf m}}\\
\overline{{\bf v}} &={W^{(1)}}^T \overline{{\bf m}}\\
\overline{{\bf v_a}} &=\dots \\
\overline{{\bf v_b}} &=\dots \\
\overline{{\bf v_c}} &=\dots \\
\overline{W^{(word)}} &=\dots
\end{align*}
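Under the same assumed shapes and names as the forward sketch above, these
per-example equations turn into the batched sketch below: weight and bias
gradients sum over the batch, while the error signals keep one row per
example. The final four quantities are intentionally left out, since
computing them is the task that follows.

```python
def backward(T, v, m, h, y, W1, W2):
    """Batched backward pass; T is the (N, vocab_size) one-hot target matrix.

    Returns gradients of the cross-entropy loss summed over the batch
    (some implementations average instead, which scales these by 1/N).
    """
    z_bar = y - T                 # (N, vocab_size)
    W2_bar = z_bar.T @ h          # sums z_bar h^T over the batch
    b2_bar = z_bar.sum(axis=0)
    h_bar = z_bar @ W2            # (N, hidden_dim)
    m_bar = h_bar * (m > 0)       # ReLU'(m) is 1 where m > 0, else 0
    W1_bar = m_bar.T @ v
    b1_bar = m_bar.sum(axis=0)
    v_bar = m_bar @ W1            # (N, 3*embedding_dim)
    # va_bar, vb_bar, vc_bar and the W^(word) gradient are the task below.
    return W1_bar, b1_bar, W2_bar, b2_bar, v_bar
```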
**Task**: What is the error signal $\overline{{\bf v_a}}$?
How does this quantity relate to $\overline{{\bf v}}$?
To answer this question, reason about the scalars that make up the elements of
$\overline{{\bf v}}$. Which of these scalars also appear in $\overline{{\bf v_a}}$?
Express your answer by computing `va_bar` (representing the quantity $\overline{{\bf v_a}}$)
given `v_bar` (representing the quantity $\overline{{\bf v}}$).
