Question: Compared to a transformer, the feed-forward sequential memory network (FSMN) [262] is a more efficient model to convert a context-independent sequence into a context-dependent one. An FSMN uses the tapped delay line shown in Figure 8.17 to convert a sequence $y_1, y_2, \ldots, y_T$ (with $y_i \in \mathbb{R}^n$) into $\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_T$ (with $\hat{z}_i \in \mathbb{R}^o$) through a set of bidirectional parameters $\{a_i \mid i = -L+1, \ldots, L-1, L\}$.

a. If each $a_i$ is a vector (i.e., $a_i \in \mathbb{R}^n$), estimate the computational complexity of an FSMN layer. (Note that $o = n$ in this case.)

b. If each $a_i$ is a matrix (i.e., $a_i \in \mathbb{R}^{o \times n}$), estimate the computational complexity of an FSMN layer.

c. Assume $n = 512$, $o = 64$, $T = 128$, $J = 8$, $L = 16$; compare the total number of operations in the forward pass of one layer of such a matrix-parameterized FSMN with that of one multihead transformer in the box on page 174. How about using a vector-parameterized FSMN (assume $o = 512$ in this case)?
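For parts (a) and (b), the complexity falls out of the tap-by-tap computation. Below is a minimal NumPy sketch of one bidirectional FSMN layer, assuming the tapped delay line of Figure 8.17 computes $\hat{z}_t = \sum_{i=-L+1}^{L} a_i \odot y_{t+i}$ in the vector case and $\hat{z}_t = \sum_{i=-L+1}^{L} A_i\, y_{t+i}$ in the matrix case; the function names and the zero-padding at the sequence boundaries are assumptions, not the book's exact formulation.

```python
import numpy as np

def fsmn_vector(Y, a):
    """Vector-parameterized bidirectional FSMN layer (hypothetical sketch).

    Y: (T, n) input sequence y_1..y_T; a: (2L, n) taps for i = -L+1..L.
    Assumed rule: z_t = sum_i a_i * y_{t+i}, zero-padded at the boundaries.
    """
    T, n = Y.shape
    L = a.shape[0] // 2
    Z = np.zeros((T, n))                       # o = n in the vector case
    for t in range(T):
        for k, i in enumerate(range(-L + 1, L + 1)):
            if 0 <= t + i < T:
                Z[t] += a[k] * Y[t + i]        # n multiply-adds per tap
    return Z                                   # O(T * 2L * n) overall

def fsmn_matrix(Y, A):
    """Matrix-parameterized variant: A is (2L, o, n)."""
    T, n = Y.shape
    L, o = A.shape[0] // 2, A.shape[1]
    Z = np.zeros((T, o))
    for t in range(T):
        for k, i in enumerate(range(-L + 1, L + 1)):
            if 0 <= t + i < T:
                Z[t] += A[k] @ Y[t + i]        # o*n multiply-adds per tap
    return Z                                   # O(T * 2L * o * n) overall
```

The inner loop makes the costs explicit: each time step performs $2L$ tap updates costing $n$ multiply-adds each (vector case) or one $o \times n$ matrix-vector product each (matrix case), giving roughly $O(2TLn)$ and $O(2TLon)$ operations per layer, respectively.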

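As a rough numeric check for part (c), the sketch below tallies multiply-adds under stated assumptions: the FSMN counts follow directly from the loop structure above, while the multihead figure uses a standard self-attention count (QKV projections, score and context computations, output projection), taking $J = 8$ as the number of heads; it is an estimate, not necessarily the exact expression in the box on page 174.

```python
# Rough multiply-add counts for part (c); the attention formula is a
# standard estimate, not necessarily the book's box on page 174.
n, o, T, J, L = 512, 64, 128, 8, 16
taps = 2 * L                          # i = -L+1, ..., L

fsmn_mat = T * taps * o * n           # matrix taps: ~134M multiply-adds
fsmn_vec = T * taps * n               # vector taps (o = n = 512): ~2.1M

d = n // J                            # per-head dimension (64)
qkv  = 3 * T * n * n                  # Q, K, V projections: ~101M
attn = 2 * J * T * T * d              # QK^T scores plus weighted sum of V: ~17M
out  = T * n * n                      # output projection: ~34M
mha  = qkv + attn + out               # total: ~151M

print(f"matrix FSMN {fsmn_mat/1e6:.0f}M | vector FSMN {fsmn_vec/1e6:.1f}M | "
      f"multi-head attention {mha/1e6:.0f}M multiply-adds")
```

Under these assumptions the matrix-parameterized FSMN (about 134M multiply-adds) is in the same ballpark as one multihead layer (about 151M), while the vector-parameterized FSMN (about 2.1M) is roughly 70 times cheaper.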
