Packing without cross-contamination #25452
Comments
Also in #6661
Hi @ToddMorrill For existing models, I am afraid it's unlikely we will make changes for this feature. If there are new models that support this natively in their original modeling code, that could be the case when that model is ported into `transformers`.
It’s a real bummer because it seems like an important feature to have.
Feature request
Is there something within Hugging Face that prevents later subsequences from attending to earlier subsequences when you use packing? Is there a way to implement attention masking so that tokens only attend to other tokens within the same subsequence of a packed example?
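For reference, here is a minimal sketch of the kind of block-diagonal mask I have in mind, built from per-token segment IDs. This is not part of the `transformers` API; the function name `packed_attention_mask` and the `segment_ids` tensor are illustrative assumptions.

```python
import torch

def packed_attention_mask(segment_ids: torch.Tensor) -> torch.Tensor:
    """Build a block-diagonal attention mask from per-token segment IDs.

    segment_ids: (batch, seq_len) integers marking which packed example each
    token belongs to, e.g. [0, 0, 0, 1, 1].
    Returns a boolean mask of shape (batch, seq_len, seq_len) that is True
    where attention is allowed (i.e. both tokens belong to the same segment).
    """
    return segment_ids.unsqueeze(1) == segment_ids.unsqueeze(2)

# Example: two sequences of lengths 3 and 2 packed into one row of length 5.
segment_ids = torch.tensor([[0, 0, 0, 1, 1]])
mask = packed_attention_mask(segment_ids)
# mask[0] is block-diagonal: the first three tokens attend only to each other,
# and the last two only to each other. For causal LM training this would
# additionally be AND-ed with a lower-triangular (causal) mask.
```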
As it currently stands, there appear to be a few challenges to overcome, but it nevertheless seems like an important feature to have.
Motivation
I find it unsettling that when packing, we simply let the later subsequences' tokens attend to the earlier subsequences' tokens. Packed sequences could have nothing to do with one another, and I don't want to contaminate examples. At the same time, I don't want to give up the throughput gains of packing sequences.
I suppose I could sort my dataset by length to minimize wasted computation (i.e., pack approximately equal-length examples into batches together) as a decent workaround, as sketched below. I'm not sure whether this would impact model performance, though.
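A rough sketch of that length-grouped idea (the function name and `batch_size` default are assumptions, not an existing utility):

```python
# Illustrative sketch: sort examples by token length so each batch holds
# examples of similar size, reducing wasted computation from padding.
def length_grouped_batches(examples, batch_size=8):
    ordered = sorted(examples, key=len)  # examples: list of token-id lists
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```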
This feature request has been raised several times: huggingface/trl#302
#17726
I think TensorFlow implements this, and Graphcore discusses it here and in their paper.
Your contribution
This doesn't strike me as a "first contribution," but if someone wants to coach me, I can give it a shot.