Question:

a) Consider a text sequence with 3 words such that each word is represented by a 2-d vector as given in matrix x. Show step-by-step calculations for computing the scaled dot-product self-attention scores using the Wq, Wk and Wv matrices given below.
[… Marks]
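A minimal sketch of the calculation asked for in part a), in plain Python. The matrices x, Wq, Wk and Wv from the question are not reproduced in this text, so the values below (X, W_Q, W_K, W_V) are hypothetical placeholders chosen only to show the steps: Q = xWq, K = xWk, V = xWv, raw scores QK^T, scaling by 1/sqrt(d_k), a row-wise softmax, and finally the softmax-weighted sum of the value vectors. Substitute the actual matrices to obtain the required answer.

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p); matrices are lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    q = matmul(x, wq)                      # step 1: queries
    k = matmul(x, wk)                      # step 2: keys
    v = matmul(x, wv)                      # step 3: values
    d_k = len(k[0])                        # key dimension (2 here)
    raw = matmul(q, transpose(k))          # step 4: QK^T, one row per query word
    scores = [[s / math.sqrt(d_k) for s in row] for row in raw]  # step 5: scale
    weights = [softmax(row) for row in scores]                   # step 6: softmax per row
    output = matmul(weights, v)            # step 7: weighted sum of value vectors
    return q, k, v, scores, weights, output

# HYPOTHETICAL values: these are NOT the matrices from the question.
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
W_Q = [[1.0, 0.0],
       [0.0, 1.0]]
W_K = [[1.0, 1.0],
       [0.0, 1.0]]
W_V = [[0.0, 1.0],
       [1.0, 0.0]]

if __name__ == "__main__":
    q, k, v, scores, weights, out = self_attention(X, W_Q, W_K, W_V)
    for name, m in [("Q", q), ("K", k), ("V", v), ("scaled scores", scores),
                    ("softmax weights", weights), ("output", out)]:
        print(name)
        for row in m:
            print("   ", [round(val, 4) for val in row])
```

With these placeholder values, Q = [[1, 0], [0, 1], [1, 1]], K = [[1, 1], [0, 1], [1, 2]] and the raw score matrix QK^T is [[1, 0, 1], [1, 1, 2], [2, 1, 3]] before division by sqrt(2).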
b) i) Mention and justify the different types of attention mechanisms used in the Transformer architecture.
[2 Marks]
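To illustrate part b) i), the sketch below applies one scaled dot-product attention function in the three places the original Transformer uses it: encoder self-attention (queries, keys and values all come from the encoder sequence, and every position can attend to every other), masked decoder self-attention (a causal mask stops a position from attending to later positions, because those tokens are not yet available during generation), and encoder-decoder or cross-attention (queries come from the decoder, keys and values from the encoder output). As a simplifying assumption it uses identity Wq/Wk/Wv projections and a single head, whereas the real model uses learned projections and multi-head attention in all three places; the input vectors are hypothetical.

```python
import math

def attention(queries, keys, values, mask=None):
    """Scaled dot-product attention on lists of vectors (single head, no projections).

    mask[i][j] is True if query i may attend to key j; None means no masking.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            s = sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
            if mask is not None and not mask[i][j]:
                s = float("-inf")  # blocked position gets exactly zero weight after softmax
            scores.append(s)
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        all_weights.append(weights)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs, all_weights

def causal_mask(n):
    # Position i may attend only to positions 0..i (no peeking at future tokens).
    return [[j <= i for j in range(n)] for i in range(n)]

# HYPOTHETICAL inputs, standing in for embedded source and target tokens.
enc_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dec_in = [[0.5, 0.5], [1.0, -1.0]]

# 1) Encoder self-attention: Q, K, V all from the encoder sequence, no mask.
enc_out, enc_w = attention(enc_in, enc_in, enc_in)
# 2) Masked (causal) decoder self-attention.
dec_self, dec_w = attention(dec_in, dec_in, dec_in, causal_mask(len(dec_in)))
# 3) Encoder-decoder (cross) attention: queries from decoder, keys/values from encoder.
cross_out, cross_w = attention(dec_self, enc_out, enc_out)

if __name__ == "__main__":
    print("encoder self-attention weights:", enc_w)
    print("masked decoder self-attention weights:", dec_w)
    print("cross-attention weights:", cross_w)
```

Masked positions are set to -inf before the softmax, so they receive exactly zero weight; this is why the first decoder position attends only to itself.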
ii) Justify the use of Bahdanau attention and the disadvantages of this choice. Is there any alternative to this attention?
[2 Marks]
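For part b) ii), a small sketch contrasting Bahdanau (additive) attention with the Luong dot-product alternative. Bahdanau scores each encoder state h_j against the previous decoder state with a one-hidden-layer network, e_j = v_a^T tanh(W_a s + U_a h_j), which adds trainable parameters (W_a, U_a, v_a) and a tanh evaluation per encoder position; the dot-product score e_j = s^T h_j needs no extra parameters and reduces to a matrix multiplication, which is also the basis of the Transformer's scaled dot-product attention. All vectors and weights below are hypothetical, and the same decoder vector s is reused for both scores only to show them side by side (Luong et al. actually score against the current decoder state rather than the previous one).

```python
import math

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def additive_scores(s, hs, W_a, U_a, v_a):
    """Bahdanau (additive) score: e_j = v_a . tanh(W_a s + U_a h_j)."""
    ws = matvec(W_a, s)  # decoder-side term, computed once per decoder step
    scores = []
    for h in hs:
        hidden = [math.tanh(a + b) for a, b in zip(ws, matvec(U_a, h))]
        scores.append(sum(w * x for w, x in zip(v_a, hidden)))
    return scores

def dot_scores(s, hs):
    """Luong dot-product score: e_j = s . h_j (no extra parameters)."""
    return [sum(a * b for a, b in zip(s, h)) for h in hs]

def context(weights, hs):
    # Context vector: attention-weighted sum of encoder hidden states.
    return [sum(w * h[d] for w, h in zip(weights, hs)) for d in range(len(hs[0]))]

# HYPOTHETICAL encoder hidden states h_1..h_3 and decoder state s.
hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s = [0.5, -0.5]
# HYPOTHETICAL additive-attention parameters (normally trained with the RNNs).
W_a = [[0.5, 0.0], [0.0, 0.5]]
U_a = [[1.0, 0.0], [0.0, 1.0]]
v_a = [1.0, 1.0]

alpha_add = softmax(additive_scores(s, hs, W_a, U_a, v_a))
alpha_dot = softmax(dot_scores(s, hs))
c_add = context(alpha_add, hs)
c_dot = context(alpha_dot, hs)

if __name__ == "__main__":
    print("additive weights:", [round(a, 4) for a in alpha_add])
    print("dot-product weights:", [round(a, 4) for a in alpha_dot])
```

With these placeholder numbers the two scoring functions rank the encoder states differently, which is expected: additive attention learns its own compatibility function, while the plain dot product only measures how well the raw vectors align.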