Fix configs to properly use pytorch-lightning==1.6 with GPU #234
Conversation
Thanks! One minor comment, but the rest looks good.
  fast_dev_run: false
- gpus: 1
+ gpus: null # Set automatically
Does this mean that PyTorch Lightning will distribute the job across all available GPUs? In that case, we might have to look at the performance.
Not sure what the best approach is here: setting it to 1, or assigning it automatically?
Any thoughts @ashwinvaidya17, @djdameln?
I would prefer using only a single GPU by default. I feel users should use distributed training only if they know what they are doing. If we distribute the training, we will have to adjust the learning rate, and experiments might not be reproducible across different numbers of GPUs.
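To make the learning-rate concern concrete: with data-parallel training the effective batch size grows with the number of GPUs, so a common heuristic is to scale the learning rate linearly with GPU count. The sketch below is illustrative only (the function name and policy are assumptions, not code from this repository or this PR):

```python
def scale_lr(base_lr: float, num_gpus: int) -> float:
    """Linear LR scaling heuristic: with data-parallel training the
    effective batch size is batch_size * num_gpus, so the learning rate
    is scaled by the same factor. Hypothetical sketch, not anomalib code.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return base_lr * num_gpus
```

Because this rescaling changes the optimization trajectory, a run on 4 GPUs is not a bit-for-bit reproduction of a single-GPU run, which is exactly the reproducibility issue raised above.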
What if no GPUs are available? Will it break or just switch to CPU training?
> What if no GPUs are available. Will it break or just switch to CPU training?
It switches to CPU.
> I would prefer using only a single gpu by default. I feel users should use distribute only if they know what they are doing. If we distribute the training then we will have to look at the learning rate and experiments might not be reproducible across different number of GPUs.
As far as I understand, this doesn't set it to distributed training because `strategy` is still `null`. It automatically uses a single GPU.
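To illustrate the resolution behaviour described in this thread, here is a minimal sketch of how a `gpus: null # Set automatically` config entry could be turned into a concrete device count at runtime. The helper name and signature are hypothetical, not taken from this PR or from anomalib:

```python
from typing import Optional

def resolve_gpus(requested: Optional[int], cuda_device_count: int) -> int:
    """Hypothetical sketch of 'set automatically': `null` (None) in the
    config means "use one GPU if any is visible, otherwise fall back to
    CPU (0 GPUs)". An explicit integer from the config is used as-is.
    """
    if requested is not None:
        return requested  # explicit value from the config wins
    return 1 if cuda_device_count > 0 else 0
```

Under this policy a CPU-only machine gets `0` (training falls back to CPU rather than crashing), and a multi-GPU machine still gets exactly one GPU unless the user explicitly asks for more, matching the single-GPU default preferred above.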
In that case, I am fine with how it is now.
Since this PR is not merged yet, I'm adding a comment here rather than creating an issue. I checked out the branch and got the following error:
The error is coming from the package list (installed using
Out of the box it is failing for me; do you see the same issue on your side?
@aj2563 This is a known issue and is addressed in a different open PR. For now you can use
Description
Fix configs to properly use `pytorch-lightning==1.6.*` and train with GPU.

Fixes: GPU available but not used #232
Changes
Checklist