Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for logging with aim. #534

Merged
merged 2 commits into from
Mar 9, 2023
Merged

Support for logging with aim. #534

merged 2 commits into from
Mar 9, 2023

Conversation

tesfaldet
Copy link
Contributor

What does this PR do?

Integrating support for logging with Aim. See this example that shows how to use its lightning integration. More information on Aim's PyTorch Lightning adapter here.

Before submitting

  • Did you make sure title is self-explanatory and the description concisely explains the PR?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you test your PR locally with pytest command?
  • Did you run pre-commit hooks with pre-commit run -a command?

Did you have fun?

Yar

@codecov-commenter
Copy link

codecov-commenter commented Feb 22, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change

Comparison is base (ed7dea4) 83.71% compared to head (2420624) 83.71%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #534   +/-   ##
=======================================
  Coverage   83.71%   83.71%           
=======================================
  Files           9        9           
  Lines         350      350           
=======================================
  Hits          293      293           
  Misses         57       57           
Impacted Files Coverage Δ
src/utils/utils.py 59.80% <100.00%> (-0.78%) ⬇️
src/models/mnist_module.py 98.46% <0.00%> (+0.04%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@ashleve ashleve added the enhancement New feature or request label Feb 24, 2023
@ashleve
Copy link
Owner

ashleve commented Feb 24, 2023

@tesfaldet Thank you for the feature!

@ashleve
Copy link
Owner

ashleve commented Feb 24, 2023

I slighlty changed formatting for consistency with the template.

Also uncommented the system_tracking_interval and log_system_params since I assume they're set to defaults

configs/logger/aim.yaml Outdated Show resolved Hide resolved
@ashleve
Copy link
Owner

ashleve commented Feb 24, 2023

I noticed that when I do python src/train.py logger=aim the run freezes in terminal after finishing testing.
(the run finishes normally when not using aim)

Do you also experience this issue?

@tesfaldet
Copy link
Contributor Author

So it's not just me. I've been experiencing this as well. I'll have to think about what may be causing this issue. It might be some thread-locking-related issue due to how Lightning, Aim, and Hydra are interacting. Could be a Hydra task function thing, or a Lightning logging thing, or Aim logging... They all potentially use multiple threads IIRC

@ashleve
Copy link
Owner

ashleve commented Feb 24, 2023

Maybe we can add some aim closing call to close_loggers function (if aim exposes any closing method)

def close_loggers() -> None:
"""Makes sure all loggers closed properly (prevents logging failure during multirun)."""
log.info("Closing loggers...")
if find_spec("wandb"): # if wandb is installed
import wandb
if wandb.run:
log.info("Closing wandb!")
wandb.finish()

@tesfaldet
Copy link
Contributor Author

Oh wow I missed that utility function. That would be a perfect place to test that idea out. I believe Aim exposes such a function. Will check later today.

@tesfaldet
Copy link
Contributor Author

tesfaldet commented Feb 25, 2023

An update on this. I found what's causing the issue and it'll take some time to address, but I'm continuing to look into it. There's an infinite loop that gets caused by Aim's run status reporter, where the line log.info(f"Best ckpt path: {ckpt path}") keeps printing to console (invisibly) while the console buffer never gets cleared, which results in the buffer growing infinitely with the same line "Best ckpt path: ...". I think the reason this happens is because of the AimLogger teardown that happens after training is finished followed by the subsequent Aim run (re)creation caused by trainer.test(). I'm not 100% sure, but I do know that your suggestion in explicitly closing any loggers in close_loggers() won't fix this specific issue, unfortunately. However, I have an idea that might make use of close_loggers(), where I remove AimLogger's explicit closing of the run in its finalize() method and instead move that over to close_loggers(), so that when a new AimLogger instance is created for trainer.test() it'll just reuse the same run. That might fix things? I'm following the same logic shown here for the wandb Lightning logger. I'll keep you updated!

This will require me creating a PR for Aim's logger btw. I have a line of communication with the Aim folks so I can try to quickly ask them to verify what's going on before creating a PR for their lightning logger.

@tesfaldet
Copy link
Contributor Author

An update: my hypothesis was correct :) I found a fix and will update this PR soon.

@tesfaldet
Copy link
Contributor Author

Issue created here aimhubio/aim#2550

@tesfaldet
Copy link
Contributor Author

Hi @ashleve! So there's a temporary workaround for the above issue: aimhubio/aim#2550 (comment). However, considering that this issue is not really something that is to be solved by this PR—because it's an issue with Aim and not the PR—and that there is a temporary workaround, would it be okay to just incorporate that workaround into this PR and to continue the thread of conversation of merging this PR? The workaround has the compromise of disabling Aim's ability to capture console logs and show them in the UI (or at least I think so).

src/utils/utils.py Outdated Show resolved Hide resolved
@ashleve
Copy link
Owner

ashleve commented Mar 7, 2023

@tesfaldet from what I understand, for now we just need to add capture_terminal_logs=False to aim config? If that's the case then sure, let's got with that.

@tesfaldet
Copy link
Contributor Author

Yeah I believe so! I haven't tested it out yet so I'll try that later today and make the necessary changes if I verify it works. After that, you can try it on your end to see if it works for you. Then we should be done if it all works out :) I'll be sure to put a comment next to capture_terminal_logs=False pointing to the GH issue so that others know what the compromise is and why it was set to False.

@tesfaldet
Copy link
Contributor Author

@ashleve apologies for the hold-up! It should be ready for you to test out yourself now :)

Copy link
Owner

@ashleve ashleve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tesfaldet That was a long PR, thank you for all the effort you put!

@ashleve ashleve changed the base branch from main to dev March 9, 2023 15:59
@ashleve ashleve merged commit 5fb2d6c into ashleve:dev Mar 9, 2023
@tesfaldet
Copy link
Contributor Author

Thank you for this awesome template! I'm glad I could be of some help :)

ashleve added a commit that referenced this pull request Mar 18, 2023
* Support for logging with Aim (#534)

* Update template to Lightning 2.0 (#548)

* Update pre-commit hooks (#549)

* Refactor utils (#541)

* Add option for pytorch 2.0 model compilation (#550)

* Update `README.md` (#551)

---------

Co-authored-by: Mattie Tesfaldet <[email protected]>
Co-authored-by: Johnny <[email protected]>
@tesfaldet tesfaldet deleted the aim_logger branch May 11, 2023 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants