Question: Compared to a transformer, the feed-forward sequential memory network (FSMN) [262] is a more efficient model to convert a context-independent sequence into a context-dependent one. An FSMN uses the tapped delay line shown in Figure 8.17 to convert a sequence $y_1, y_2, \ldots, y_T$ (with $y_i \in \mathbb{R}^n$) into $\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_T$ (with $\hat{z}_i \in \mathbb{R}^o$) through a set of bidirectional parameters $\{a_i \mid i = -L+1, \ldots, L-1, L\}$.

a. If each $a_i$ is a vector (i.e., $a_i \in \mathbb{R}^n$), estimate the computational complexity of an FSMN layer. (Note that $o = n$ in this case.)

b. If each $a_i$ is a matrix (i.e., $a_i \in \mathbb{R}^{o \times n}$), estimate the computational complexity of an FSMN layer.

c. Assume $n = 512$, $o = 64$, $T = 128$, $J = 8$, $L = 16$; compare the total number of operations in the forward pass of one layer of such a matrix-parameterized FSMN with that of one multihead transformer in the box on page 174. How about using a vector-parameterized FSMN (assume $o = 512$ in this case)?
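For parts (a) and (b), the complexity falls out of the tap-by-tap computation. Below is a minimal NumPy sketch of one bidirectional FSMN layer, assuming the tapped delay line of Figure 8.17 computes $\hat{z}_t = \sum_{i=-L+1}^{L} a_i \odot y_{t+i}$ in the vector case and $\hat{z}_t = \sum_{i=-L+1}^{L} A_i\, y_{t+i}$ in the matrix case; the function names and the zero-padding at the sequence boundaries are assumptions, not the book's exact formulation.

```python
import numpy as np

def fsmn_vector(Y, a):
    """Vector-parameterized bidirectional FSMN layer (hypothetical sketch).

    Y: (T, n) input sequence y_1..y_T; a: (2L, n) taps for i = -L+1..L.
    Assumed rule: z_t = sum_i a_i * y_{t+i}, zero-padded at the boundaries.
    """
    T, n = Y.shape
    L = a.shape[0] // 2
    Z = np.zeros((T, n))                       # o = n in the vector case
    for t in range(T):
        for k, i in enumerate(range(-L + 1, L + 1)):
            if 0 <= t + i < T:
                Z[t] += a[k] * Y[t + i]        # n multiply-adds per tap
    return Z                                   # O(T * 2L * n) overall

def fsmn_matrix(Y, A):
    """Matrix-parameterized variant: A is (2L, o, n)."""
    T, n = Y.shape
    L, o = A.shape[0] // 2, A.shape[1]
    Z = np.zeros((T, o))
    for t in range(T):
        for k, i in enumerate(range(-L + 1, L + 1)):
            if 0 <= t + i < T:
                Z[t] += A[k] @ Y[t + i]        # o*n multiply-adds per tap
    return Z                                   # O(T * 2L * o * n) overall
```

The inner loop makes the costs explicit: each time step performs $2L$ tap updates costing $n$ multiply-adds each (vector case) or one $o \times n$ matrix-vector product each (matrix case), giving roughly $O(2TLn)$ and $O(2TLon)$ operations per layer, respectively.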

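As a rough numeric check for part (c), the sketch below tallies multiply-adds under stated assumptions: the FSMN counts follow directly from the loop structure above, while the multihead figure uses a standard self-attention count (QKV projections, score and context computations, output projection), taking $J = 8$ as the number of heads; it is an estimate, not necessarily the exact expression in the box on page 174.

```python
# Rough multiply-add counts for part (c); the attention formula is a
# standard estimate, not necessarily the book's box on page 174.
n, o, T, J, L = 512, 64, 128, 8, 16
taps = 2 * L                          # i = -L+1, ..., L

fsmn_mat = T * taps * o * n           # matrix taps: ~134M multiply-adds
fsmn_vec = T * taps * n               # vector taps (o = n = 512): ~2.1M

d = n // J                            # per-head dimension (64)
qkv  = 3 * T * n * n                  # Q, K, V projections: ~101M
attn = 2 * J * T * T * d              # QK^T scores plus weighted sum of V: ~17M
out  = T * n * n                      # output projection: ~34M
mha  = qkv + attn + out               # total: ~151M

print(f"matrix FSMN {fsmn_mat/1e6:.0f}M | vector FSMN {fsmn_vec/1e6:.1f}M | "
      f"multi-head attention {mha/1e6:.0f}M multiply-adds")
```

Under these assumptions the matrix-parameterized FSMN (about 134M multiply-adds) is in the same ballpark as one multihead layer (about 151M), while the vector-parameterized FSMN (about 2.1M) is roughly 70 times cheaper.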
