Question: Suppose that we have a multihead transformer as shown in Figure 8.27, where $A^{(j)}, B^{(j)} \in \mathbb{R}^{l \times d}$ and $C^{(j)} \in \mathbb{R}^{o \times d}$ $(j = 1, \dots, J)$.
a. Estimate the computational complexity of the forward pass of this transformer for the input sequence $X \in \mathbb{R}^{d \times T}$.
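As a starting point for part a, the cost can be tallied one matrix product at a time. Figure 8.27 is not reproduced here, so the sketch below assumes the usual form of each head, $Z^{(j)} = (C^{(j)} X)\,\mathrm{softmax}\big((A^{(j)} X)^\top (B^{(j)} X)\big)$, and counts only the attention heads, ignoring any later layers. The function name `forward_cost` and the example sizes are hypothetical.

```python
def forward_cost(d, T, l, o, J):
    """Approximate multiply-add count for J attention heads.

    Assumes each head computes Z = (C X) softmax((A X)^T (B X)),
    which is a guess at the architecture in Figure 8.27.
    """
    proj = 2 * l * d * T + o * d * T  # A X and B X (l x d by d x T), C X (o x d by d x T)
    scores = l * T * T                # (A X)^T (B X): T x l by l x T
    softmax = T * T                   # normalising a T x T matrix, up to a constant
    mix = o * T * T                   # (C X) P: o x T by T x T
    return J * (proj + scores + softmax + mix)

# Hypothetical sizes, chosen only to show the call
print(forward_cost(d=512, T=100, l=64, o=64, J=8))
```

Under these assumptions the total is on the order of $O\big(J\,(l + o)(dT + T^2)\big)$: linear in $T$ for the projections and quadratic in $T$ for the attention matrix.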
b. Derive the error back-propagation to compute the gradients for $A^{(j)}, B^{(j)}, C^{(j)}$ when an objective function $Q(\cdot)$ is used.
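One way to check a derivation for part b is to code the candidate gradients and compare them with finite differences. The NumPy sketch below does this for a single head, again assuming the form $Z = (C X)\,\mathrm{softmax}\big((A X)^\top (B X)\big)$ with a column-wise softmax, since Figure 8.27 is not available here. It writes $U = A X$ to avoid clashing with the objective $Q(\cdot)$, and takes $E = \partial Q / \partial Z$ as given. For $J$ concatenated heads, $E$ for head $j$ would be the corresponding slice of the upstream gradient. All function names and the toy objective are hypothetical.

```python
import numpy as np

def softmax_cols(S):
    """Column-wise softmax: every column of the result sums to 1."""
    S = S - S.max(axis=0, keepdims=True)  # shift for numerical stability
    expS = np.exp(S)
    return expS / expS.sum(axis=0, keepdims=True)

def head_forward(A, B, C, X):
    """One head (assumed form): Z = (C X) softmax((A X)^T (B X))."""
    U, K, V = A @ X, B @ X, C @ X          # l x T, l x T, o x T
    P = softmax_cols(U.T @ K)              # T x T attention weights
    return V @ P, (U, K, V, P)

def head_backward(E, X, cache):
    """Given E = dQ/dZ (o x T), return dQ/dA, dQ/dB, dQ/dC."""
    U, K, V, P = cache
    G = V.T @ E                                        # dQ/dP
    D = P * (G - (P * G).sum(axis=0, keepdims=True))   # dQ/dS, through the softmax
    dA = (K @ D.T) @ X.T                               # via U = A X
    dB = (U @ D) @ X.T                                 # via K = B X
    dC = (E @ P.T) @ X.T                               # via V = C X
    return dA, dB, dC

def numeric_grad(f, M, eps=1e-6):
    """Central finite differences of the scalar f() with respect to matrix M."""
    grad = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        old = M[idx]
        M[idx] = old + eps
        up = f()
        M[idx] = old - eps
        down = f()
        M[idx] = old                                   # restore the entry
        grad[idx] = (up - down) / (2 * eps)
    return grad

# Small hypothetical sizes and random data for the check
rng = np.random.default_rng(0)
d, T, l, o = 4, 5, 3, 2
A, B = rng.normal(size=(l, d)), rng.normal(size=(l, d))
C, X = rng.normal(size=(o, d)), rng.normal(size=(d, T))
W = rng.normal(size=(o, T))   # toy objective Q = sum(W * Z), so dQ/dZ = W

Z, cache = head_forward(A, B, C, X)
dA, dB, dC = head_backward(W, X, cache)

def loss():
    return float(np.sum(W * head_forward(A, B, C, X)[0]))

print(np.allclose(dA, numeric_grad(loss, A), atol=1e-5))
```

In matrix form, the sketch uses $\partial Q/\partial C = E P^\top X^\top$, $\partial Q/\partial A = K D^\top X^\top$ and $\partial Q/\partial B = U D X^\top$, where $P$ is the attention matrix and $D$ is the gradient with respect to the pre-softmax scores.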
