Question:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F
class TransformerBlock(nn.Module):
    def __init__(self, heads, d, k, m, dropout=0.):
        super(TransformerBlock, self).__init__()
        self.k = k
        self.heads = heads
        self.wq = nn.Linear(d, heads*k, bias=False)
        self.wk = nn.Linear(d, heads*k, bias=False)
        self.wv = nn.Linear(d, heads*k, bias=False)
        self.wc = nn.Linear(heads*k, d, bias=False)
        self.dropoutatt = nn.Dropout(dropout)
        self.w1 = nn.Linear(d, m)
        self.dropoutfc = nn.Dropout(dropout)
        self.w2 = nn.Linear(m, d)

        # TODO: define the dropout
        # TODO: define the layer normalization

        nn.init.normal_(self.wq.weight, 0, .02)
        nn.init.normal_(self.wk.weight, 0, .02)
        nn.init.normal_(self.wv.weight, 0, .02)
        nn.init.normal_(self.wc.weight, 0, .02)
        nn.init.normal_(self.w1.weight, 0, .02)
        nn.init.constant_(self.w1.bias, 0.0)
        nn.init.normal_(self.w2.weight, 0, .02)
        nn.init.constant_(self.w2.bias, 0.0)

    def forward(self, x, mask):
        seq_len, batch_size, embed_dim = x.shape

        # TODO: implement scaled dot-product attention
        # TODO: implement the residual connection
        # TODO: implement the dropout
        # TODO: implement the layer normalization
        # TODO: implement the position-wise feed-forward network
        # Hint: Writing efficient code is almost as important
        # as writing correct code in ML.
        # Avoid writing for-loops! Consider using the
        # batch matrix multiplication operator torch.bmm.
        raise NotImplementedError("Implement a transformer block")

        return out
Complete the tasks marked TODO above.
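
For reference, one way the TODOs could be filled in is sketched below. This is a minimal sketch, not an official solution: it assumes a post-layer-norm block (as in the original Transformer paper), a ReLU activation in the feed-forward sub-layer, and a 0/1 attention mask broadcastable to the attention score matrix. The class name TransformerBlockSketch, the helper split_heads, and the layers norm1, norm2, dropoutatt_out, and dropoutffn_out are my own additions; only wq/wk/wv/wc, w1/w2, dropoutatt, and dropoutfc come from the scaffold. The weight initialization is the same as in the scaffold and omitted for brevity.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerBlockSketch(nn.Module):
    def __init__(self, heads, d, k, m, dropout=0.):
        super().__init__()
        self.k = k
        self.heads = heads
        self.wq = nn.Linear(d, heads*k, bias=False)
        self.wk = nn.Linear(d, heads*k, bias=False)
        self.wv = nn.Linear(d, heads*k, bias=False)
        self.wc = nn.Linear(heads*k, d, bias=False)
        self.dropoutatt = nn.Dropout(dropout)
        self.w1 = nn.Linear(d, m)
        self.dropoutfc = nn.Dropout(dropout)
        self.w2 = nn.Linear(m, d)
        # Assumed additions: dropout on each sub-layer's output,
        # plus one LayerNorm per sub-layer (post-norm placement).
        self.dropoutatt_out = nn.Dropout(dropout)
        self.dropoutffn_out = nn.Dropout(dropout)
        self.norm1 = nn.LayerNorm(d)
        self.norm2 = nn.LayerNorm(d)
        # (weight initialization as in the scaffold, omitted here)

    def forward(self, x, mask):
        seq_len, batch_size, embed_dim = x.shape
        h, k = self.heads, self.k

        # Split into heads: (seq_len, batch, h*k) -> (batch*h, seq_len, k),
        # so one torch.bmm covers every head and batch element at once.
        def split_heads(t):
            t = t.view(seq_len, batch_size, h, k)
            return t.permute(1, 2, 0, 3).reshape(batch_size * h, seq_len, k)

        q = split_heads(self.wq(x))
        k_ = split_heads(self.wk(x))
        v = split_heads(self.wv(x))

        # Scaled dot-product attention, batched with torch.bmm (no for-loops).
        scores = torch.bmm(q, k_.transpose(1, 2)) / math.sqrt(k)
        # Assumes a 0/1 mask broadcastable to (batch*h, seq_len, seq_len).
        scores = scores.masked_fill(mask == 0, float('-inf'))
        att = self.dropoutatt(F.softmax(scores, dim=-1))
        ctx = torch.bmm(att, v)  # (batch*h, seq_len, k)

        # Merge heads back to (seq_len, batch, h*k), then project with wc.
        ctx = ctx.reshape(batch_size, h, seq_len, k).permute(2, 0, 1, 3)
        ctx = ctx.reshape(seq_len, batch_size, h * k)
        # Residual connection + dropout + layer norm (post-norm).
        att_out = self.norm1(x + self.dropoutatt_out(self.wc(ctx)))

        # Position-wise feed-forward network, second residual sub-layer.
        ffn = self.w2(self.dropoutfc(F.relu(self.w1(att_out))))
        out = self.norm2(att_out + self.dropoutffn_out(ffn))
        return out

A quick shape check with hypothetical sizes (seq_len=10, batch=4, d=512, 8 heads of dimension 64, FFN width 2048) and a causal mask:

block = TransformerBlockSketch(heads=8, d=512, k=64, m=2048, dropout=0.1)
x = torch.randn(10, 4, 512)                # (seq_len, batch, d)
mask = torch.tril(torch.ones(10, 10))      # lower-triangular 0/1 causal mask
print(block(x, mask).shape)                # torch.Size([10, 4, 512])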

