Question:

Assume max seq length = 10 and there are multiple samples, such as:
[1,1,1,1] (length 4)
[2,2,2,2,2] (length 5)
During actual pretraining, the two are concatenated into a single training sample, for example:
[1,1,1,1,EOS_token,2,2,2,2,2,EOS_token]
With this layout, the tokens of sample 2 can attend to the tokens of sample 1. Is there a reason for this?
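
To make the setup concrete, here is a minimal pure-Python sketch of the packing described in the question. The EOS id of 0 and the helper names `pack`, `causal_mask`, and `document_causal_mask` are assumptions made for illustration and do not come from any particular library. The sketch compares an ordinary causal mask, under which sample 2 can see sample 1, with a boundary-aware variant that blocks attention across samples. It does not enforce the max seq length; with both EOS tokens the packed example is 11 tokens long.

```python
EOS = 0  # hypothetical EOS token id, chosen only for this illustration


def pack(samples, eos=EOS):
    """Concatenate samples into one token list, appending EOS after each.

    Also returns a parallel list giving the sample index of every position.
    """
    tokens, doc_ids = [], []
    for doc_id, sample in enumerate(samples):
        for tok in list(sample) + [eos]:
            tokens.append(tok)
            doc_ids.append(doc_id)
    return tokens, doc_ids


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def document_causal_mask(doc_ids):
    """Causal mask that also blocks attention across sample boundaries."""
    n = len(doc_ids)
    return [
        [j <= i and doc_ids[i] == doc_ids[j] for j in range(n)]
        for i in range(n)
    ]


tokens, doc_ids = pack([[1, 1, 1, 1], [2, 2, 2, 2, 2]])
plain = causal_mask(len(tokens))
blocked = document_causal_mask(doc_ids)

first_of_sample_2 = doc_ids.index(1)  # position of the first token of sample 2
print(tokens)
print(plain[first_of_sample_2][0])    # plain causal mask: sample 1 is visible
print(blocked[first_of_sample_2][0])  # boundary-aware mask: sample 1 is hidden
```

Under the plain causal mask, every token of sample 2 has the whole of sample 1 in its context, which is exactly the behavior the question asks about. The boundary-aware mask is one possible way to prevent it, at the cost of extra bookkeeping for the sample ids.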
