In this snippet of code (stanford_alpaca/train.py, lines 90 to 99 at 761dc5b), as I understand it, no padding is actually added: using "longest" mode on a single sequence is equivalent to no padding, as per this doc. Is that right? So the padding for each prompt is added by the data collator instead of here.
I wonder if it would be clearer to just write padding=False here, or to add a comment about it.
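This is easy to verify directly. A minimal sketch (the model name is just a placeholder, not what the repo uses):

```python
# Sketch: with a single sequence, padding="longest" pads to the longest sequence
# in the call, i.e. to its own length, so no pad tokens are added. Batch-level
# padding is then left to the data collator.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder causal-LM tokenizer
tokenizer.pad_token = tokenizer.eos_token

longest = tokenizer("A short prompt", padding="longest", return_tensors="pt")
no_pad = tokenizer("A short prompt", padding=False, return_tensors="pt")
assert longest.input_ids.shape == no_pad.input_ids.shape  # identical: no padding added
```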
I think so. They actually use dynamic padding via DataCollatorForSupervisedDataset. My concern is: should the padding tokens be on the left rather than the right? The other repo, https://github.com/tloen/alpaca-lora, pads on the left, which makes sense for batch training.
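For context, here is roughly what such a dynamic right-padding collator does; this is a paraphrased sketch of the idea, not the repo's code, and it assumes pad_token_id is a dedicated pad token:

```python
# Sketch: pad each field to the longest example in the batch (on the right,
# which is pad_sequence's default), and mark padded label positions with -100
# so they are ignored by the cross-entropy loss.
import torch
from torch.nn.utils.rnn import pad_sequence

IGNORE_INDEX = -100

def collate(instances, pad_token_id):
    input_ids = [torch.tensor(x["input_ids"]) for x in instances]
    labels = [torch.tensor(x["labels"]) for x in instances]
    input_ids = pad_sequence(input_ids, batch_first=True, padding_value=pad_token_id)
    labels = pad_sequence(labels, batch_first=True, padding_value=IGNORE_INDEX)
    return dict(
        input_ids=input_ids,
        labels=labels,
        # Mask out padded positions; assumes pad_token_id is not a real content token.
        attention_mask=input_ids.ne(pad_token_id),
    )
```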
Agree with @srhthu. I think left padding makes more sense, but train.py uses right padding instead. I think the code they use to train Alpaca is simply not correct for batch training. See the explanation here.
My previous understanding is that batch inference with decoder models requires left padding, but at the fine-tuning stage right-side padding is okay as long as we set the attention mask correctly and set pad token labels to -100 when calculating the loss.
Is it the case that we can simply use left padding for both training and inference in generation tasks?
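That matches my understanding too: right padding is fine for training because the attention mask plus -100 labels keep pad positions out of attention and loss, but batched generate() should use left padding so the last position of every row is a real token. A sketch of the inference side (model name and prompts are placeholders):

```python
# Sketch: switch the tokenizer to left padding before batched generation, so
# generation continues from real tokens rather than from trailing pad tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokenizer.padding_side = "left"  # left padding for batched generation
batch = tokenizer(
    ["A short prompt", "A much longer prompt than the first one"],
    padding=True,
    return_tensors="pt",
)
out = model.generate(**batch, max_new_tokens=20, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```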