Question: Assume max seq length = 10 and there are multiple samples, such as [1, 1, 1, 1] of length 4.
During actual pretraining, the two samples are concatenated into one training sample, with an EOS token after each, for example: [sample 1 tokens, EOS token, sample 2 tokens, EOS token]. In this way, the later sample can actually see the tokens of the earlier sample. Is there any reason for this?
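
To make the setup concrete, here is a minimal sketch of the packing described in the question, with two attention masks to compare. It is an illustration under stated assumptions, not the method of any particular framework. The EOS id 0, the second sample [2, 2, 2, 2], and the helper names `pack`, `causal_mask`, and `document_causal_mask` are all hypothetical. With a plain causal mask the second sample can attend to the first. A per-document mask is one common way to block that.

```python
EOS = 0            # hypothetical EOS token id, chosen only for this example
MAX_SEQ_LEN = 10   # max seq length from the question


def pack(samples, eos=EOS, max_len=MAX_SEQ_LEN):
    """Concatenate samples, append EOS after each one, truncate to max_len.

    Returns the packed tokens and, for each position, the index of the
    sample that position came from.
    """
    tokens, doc_ids = [], []
    for doc_id, sample in enumerate(samples):
        for tok in list(sample) + [eos]:
            tokens.append(tok)
            doc_ids.append(doc_id)
    return tokens[:max_len], doc_ids[:max_len]


def causal_mask(n):
    """Plain causal mask: position i may attend to every position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def document_causal_mask(doc_ids):
    """Causal mask that also requires i and j to belong to the same sample."""
    n = len(doc_ids)
    return [
        [j <= i and doc_ids[i] == doc_ids[j] for j in range(n)]
        for i in range(n)
    ]


# The first sample is from the question; the second is a made-up stand-in.
samples = [[1, 1, 1, 1], [2, 2, 2, 2]]
tokens, doc_ids = pack(samples)

plain = causal_mask(len(tokens))
blocked = document_causal_mask(doc_ids)

print(tokens)         # packed sequence of length 10
print(plain[5][0])    # first token of sample 2 attending to sample 1
print(blocked[5][0])  # same pair under the per-document mask
```

Position 5 is the first token of the second sample. Under the plain causal mask it can attend to position 0, which belongs to the first sample; under the per-document mask it cannot. Whether to apply such a mask is a design choice: packing without it avoids padding and keeps the attention kernel simple, at the cost of the cross-sample visibility the question asks about.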
