
Updates for PyTorch Lightning 2.0 release #266

Merged
merged 8 commits into from
Mar 29, 2023

Conversation

ejm714
Collaborator

@ejm714 ejm714 commented Mar 27, 2023

Configure accelerator and devices for Trainer

Updates how we set the accelerator and devices based on recent PyTorch Lightning changes.

pl.Trainer now accepts accelerator ("cpu", "gpu", etc.), strategy, and devices params instead of num_gpus. This PR adds a utility to configure accelerator and devices based on the number of user-specified gpus.

Pros:

  • keeps the command line to the same limited set of options (e.g. just specifying number of gpus instead of exposing both the cpu/gpu option as well as number of devices/cores)
  • allows us to validate user-provided input up front to ensure that training won't fail after the long video loading check

Cons:

  • doesn't allow the user to specify the number of cores; if a GPU is not used, all cores will be used
  • doesn't support the new "mps" accelerator for apple silicon
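The mapping described above can be sketched as a small helper. This is a minimal illustration of the approach, not the PR's actual utility; the function name `configure_accelerator_and_devices_from_gpus` and the exact return shape are assumptions.

```python
def configure_accelerator_and_devices_from_gpus(gpus: int):
    """Map a user-specified GPU count onto PL 2.0 Trainer arguments.

    Hypothetical sketch: validates input up front so training won't
    fail after the long video loading check, and keeps the CLI to a
    single `gpus` option.
    """
    if not isinstance(gpus, int) or gpus < 0:
        raise ValueError(f"gpus must be a non-negative integer, got {gpus}")
    if gpus > 0:
        # Train on the requested number of GPUs.
        accelerator, devices = "gpu", gpus
    else:
        # No GPU requested: fall back to CPU and let Lightning use
        # all available cores (the limitation noted in the cons above).
        accelerator, devices = "cpu", "auto"
    return accelerator, devices


# Usage (assuming pytorch_lightning >= 2.0 is installed):
# accelerator, devices = configure_accelerator_and_devices_from_gpus(2)
# trainer = pl.Trainer(accelerator=accelerator, devices=devices)
```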

Relevant links:

Closes #264
Closes #265

Other changes based on PTL 2.0 release

The new PTL release makes a bunch of non-backward compatible API changes. This PR also provides fixes for those and sets 2.0 as the floor.

Bonus fixes

  • adds a note in the depth estimation docs about the limitations for multiple individuals in the frame (which came up in a user email)
  • only use DDP strategy for multi-gpu; this removes a bunch of unnecessary logging warnings about metric syncing across devices which arose from using DDP on a single GPU; we can take advantage of the new "auto" strategy in PTL 2.0
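The strategy selection in the second bullet can be sketched as follows; the helper name `configure_strategy` is an assumption, but the rule matches the description above.

```python
def configure_strategy(gpus: int) -> str:
    """Pick a PL 2.0 strategy from the user-specified GPU count.

    Hypothetical sketch: only use DDP for multi-GPU training; on a
    single GPU (or CPU), defer to Lightning's new "auto" strategy,
    which avoids the spurious metric-syncing warnings that DDP on
    one device produced.
    """
    return "ddp" if gpus > 1 else "auto"
```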

@netlify

netlify bot commented Mar 27, 2023

Deploy Preview for silly-keller-664934 ready!

Name Link
🔨 Latest commit ba747c7
🔍 Latest deploy log https://app.netlify.com/sites/silly-keller-664934/deploys/6422097983d24c00080bcf6e
😎 Deploy Preview https://deploy-preview-266--silly-keller-664934.netlify.app


@ejm714 ejm714 changed the title Set accelerator and devices instead of num_gpus Updates for PyTorch Lightning 2.0 release Mar 27, 2023
@ejm714 ejm714 requested a review from pjbull March 27, 2023 21:14
@codecov

codecov bot commented Mar 27, 2023

Codecov Report

Merging #266 (ba747c7) into master (a49b0d7) will decrease coverage by 0.1%.
The diff coverage is 87.5%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #266     +/-   ##
========================================
- Coverage    87.6%   87.6%   -0.1%     
========================================
  Files          26      26             
  Lines        2155    2178     +23     
========================================
+ Hits         1889    1908     +19     
- Misses        266     270      +4     
Impacted Files Coverage Δ
zamba/models/model_manager.py 84.1% <66.6%> (-0.3%) ⬇️
zamba/models/utils.py 96.1% <71.4%> (-3.9%) ⬇️
zamba/pytorch_lightning/utils.py 97.7% <100.0%> (+0.2%) ⬆️

... and 1 file with indirect coverage changes

@ejm714
Collaborator Author

ejm714 commented Mar 27, 2023

@pjbull tests are now passing, this is ready for your review when you have time 🙇

Member

@pjbull pjbull left a comment


Looks good to me!

@pjbull pjbull merged commit 566529a into master Mar 29, 2023
@pjbull pjbull deleted the 265-fix-tests branch March 29, 2023 00:15
Successfully merging this pull request may close these issues.

Failed build on master branch (tests #563)
Failed build on master branch (tests #562)