Clear divide of loss before and after commit 9bec3c98a22c91b1c28fda757db51eb780291641 #2983
Comments
It should not cause a loss regression. Could you try fine-tuning the models with different seeds?
I started another run, but it will take some time. I will update this comment once a few hundred steps have run.
Here are two more runs with a new seed:
Black = with commit
Here is the log diff if you need it:
Could you try again with 3bcd41b?
Thank you very much for the quick test. We are still investigating the cause of the loss regression and will post a bug report once we find it.
Reminder
Reproduction
Command:
Deepspeed config:
The runs with the higher loss are the ones on commits after (and including) 9bec3c98a22c91b1c28fda757db51eb780291641. The same can be seen in the train loss as well.
All the tests were performed with the same dataset, seed, and arguments.
Expected behavior
No regression in the train/eval loss.
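To make "no regression" checkable across seeds, here is a minimal sketch of how the runs could be compared. It is not part of the original report: `tail_mean` and `clearly_separated` are hypothetical helper names, and it assumes each run's losses have already been pulled from the logs into a plain list of floats, with all runs logged at the same steps.

```python
from statistics import mean


def tail_mean(losses, last_n=50):
    """Average the last `last_n` logged loss values of a single run."""
    if not losses:
        raise ValueError("a run needs at least one logged loss value")
    return mean(losses[-last_n:])


def clearly_separated(before_runs, after_runs, last_n=50):
    """Return True if every run after the commit ends with a higher
    tail-mean loss than every run before the commit.

    Each argument is a list of runs (one per seed); each run is a list
    of loss values. This is a rough heuristic, not a statistical test.
    """
    before = [tail_mean(run, last_n) for run in before_runs]
    after = [tail_mean(run, last_n) for run in after_runs]
    return min(after) > max(before)
```

If `clearly_separated` stays `True` across several seeds, the gap is unlikely to be seed noise alone. With only two or three seeds per side this is still a weak signal, so it is meant as a quick sanity check rather than proof.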
System Info
transformers version: 4.39.1
Others
Please tell me if you need more info, and I will help if possible!