-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support torch dynamo for deepspeed>=0.14.4 #3069
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Would it be possible to add a small test in tests/deepspeed/test_deepspeed.py
? Single GPU should be good enough. Thanks!
cc @SunMarc to keep an eye on since it's compile related
This is a untested test, I'll run it tomorrow and see if it's really working. |
225c2a5
to
1ce1ede
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @oraluben ! LGTM ! Just nit, you will need to update the deepspeed_launcher to account for dynamo args, just like @pacman100 suggested before #2460 (comment). This will probably be needed in order to pass the tests you added.
Also, I tried launching the script that @pacman100 shared in the previous PR with the following setup:
- inductor + TORCHDYNAMO_DEBUG_FUNCTION=forward : same speed as without dynamo
- inductor + without TORCHDYNAMO_DEBUG_FUNCTION=forward : slow at the beginning then then ~ same iteration speed as the first one.
- without dynamo
Did you do any benchmark on your side ? That would be nice to have an example which shows speed increase.
Reproduction shared here Traceback:
|
We've seen improvement in compiled llama. My guess is the demo in |
You mean the |
Sure, I'd love to share, maybe later this week or next week, I'm a little busy now 🫠 |
1ce1ede
to
b126ddc
Compare
https://gist.github.com/oraluben/9b8240c2fe482eb4382453d6c97a5f76 TLDR: a ~10% speedup in llama, but some deepspeed patch is required update: I just realized that this do not use accelerate directly, it's a transformers-based demo, is that good for you? |
That good enough @oraluben ! Thanks for the nice reproducer =) |
I would prefer to not checkin the current version of test, for it does not represent the best practice to combine accelerate with deepspeed and torch dynamo (and not even working right now). Based on my demo, do you have some idea for a better test? |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
@SunMarc if you're comfortable with this here I'm comfortable and we can go test-less for now |
Yeah for sure let's merge this as this shouldn't affect users in general. I'll test this later ! |
What does this PR do?
Fixes # (issue)
Related: microsoft/DeepSpeed#6502
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.