Question Four (10 Marks)
In a sentiment classification task, we use the attention mechanism to compute the encoding for the input sequence "I like the pattern class". Assume the initial embeddings (transposed) are I = (0.9, 0.1, 0.1, 0.3), like = (0.4, 0.5, 0.3, 0.2), the = (0.1, 0.1, 0.1, 0.1), pattern = (0.5, 0.5, 0.6, 0.1), class = (0.4, 0.1, 0.3, 0.1).
We will use single-head attention with:
Wq: the concatenation of the embeddings of I, pattern, pattern, class
Wv: the concatenation of the embeddings of I, the, pattern, class
Wk: the identity matrix.
1. Find the embedding for the token "like" using self-attention. Show steps
2. Use hardcoded sinusoid functions to compute the positional embedding for the word "like". Then add the positional embedding to the embedding you obtained in part 1.
3. Suppose we have another token "hate" with initial embedding (0.2, 0.3, 0.5, 0.4). Given the same context as the token "like", how efficiently can you repeat part 1?
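
For concreteness, here is a minimal NumPy sketch of how parts 1 and 2 can be set up, with a note on part 3. It is an illustrative sketch under stated assumptions, not the graded solution: it assumes the listed embeddings are stacked as the rows of Wq and Wv (the question does not say rows or columns), that the attention scores are scaled by sqrt(d) as in the standard scaled dot-product formulation, and that "like" occupies position 1 under 0-based indexing.

import numpy as np

# Initial embeddings from the question (4-dimensional row vectors).
emb = {
    "I":       np.array([0.9, 0.1, 0.1, 0.3]),
    "like":    np.array([0.4, 0.5, 0.3, 0.2]),
    "the":     np.array([0.1, 0.1, 0.1, 0.1]),
    "pattern": np.array([0.5, 0.5, 0.6, 0.1]),
    "class":   np.array([0.4, 0.1, 0.3, 0.1]),
}
tokens = ["I", "like", "the", "pattern", "class"]
X = np.stack([emb[t] for t in tokens])  # (5, 4) input matrix

# Assumption: the "concatenation" stacks the listed embeddings as rows.
W_q = np.stack([emb["I"], emb["pattern"], emb["pattern"], emb["class"]])  # (4, 4)
W_v = np.stack([emb["I"], emb["the"], emb["pattern"], emb["class"]])      # (4, 4)
W_k = np.eye(4)                                                           # identity

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(X.shape[1])          # (5, 5) similarity scores
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ V                              # contextual embeddings

Z = self_attention(X, W_q, W_k, W_v)
print("Part 1 - self-attention output for 'like':", Z[1])

def sinusoidal_position(pos, d_model=4):
    """Hard-coded sinusoid positional encoding (sin on even dims, cos on odd)."""
    pe = np.zeros(d_model)
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe[i] = np.sin(angle)
        pe[i + 1] = np.cos(angle)
    return pe

# Part 2: assuming "like" is at position 1 (0-indexed), add its encoding.
pe_like = sinusoidal_position(pos=1)
print("Part 2 - with positional encoding:", Z[1] + pe_like)

# Part 3 (sketch): swapping "like" for "hate" changes only one row of X,
# so only that row of Q, K and V changes; the rows for the other four
# tokens can be reused rather than recomputed from scratch.

Whether the softmax scores are scaled by sqrt(d) and whether positions are counted from 0 or 1 depend on the convention used in class, so the printed numbers should be checked against that convention.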
