Question: 1. Explain when we can use additive attention as the scoring function. 2. Write a Python class program that implement additive attention. Demonstrate AdditiveAttention class
1. Explain when we can use additive attention as the scoring function.
2. Write a Python class program that implement additive attention. Demonstrate AdditiveAttention class with a toy example, where shapes (batch size, number of steps or sequence length in tokens, feature size) of queries, keys, and values are (2, 1, 20), (2, 10, 2), and (2, 10, 4), respectively. Show attention weights with heatmap.
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
