Send alerts to prometheus when a workflow fails to be processed. #10959

Merged 1 commit on Feb 15, 2022

khurtado
Contributor

Fixes #10215

Status

In development

Description

Send alerts to Prometheus when a workflow fails to be processed in MSTransferor.

Is it backward compatible (if not, which system does it affect)?

YES

@cmsdmwmbot

Jenkins results:

  • Python2 Unit tests: succeeded
  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python2 Pylint check: failed
    • 5 warnings and errors that must be fixed
    • 5 warnings
    • 36 comments to review
  • Python3 Pylint check: succeeded
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12729/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python2 Unit tests: succeeded
  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python2 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 5 warnings
    • 36 comments to review
  • Python3 Pylint check: succeeded
  • Pylint py3k check: failed
    • 1 warnings
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12730/artifact/artifacts/PullRequestReport.html

@khurtado khurtado requested a review from amaltaro January 31, 2022 15:04
Contributor

@amaltaro amaltaro left a comment

Kenyi, these changes are looking good. However, I think these alerts will be better organized if we actually have a wrapper function/method for each of them. We might even want to create a new module called something like MSTransferorAlerts and encapsulate all this information in there.

If you feel like refactoring the notifyLargeData method as well, that would be even better :-D

What do you think?
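
To make the suggestion above concrete, here is a minimal sketch of what a dedicated alerts module with one wrapper method per alert could look like. It is not the code merged in this pull request: the method names (`alertWorkflowFailed`, `alertLargeInputData`), the alert fields, the severity values, and the injected `sender` callable are all assumptions made for illustration, and the real `notifyLargeData` method is not shown in this thread.

```python
"""Hypothetical alerts module sketch; every name below is illustrative."""
from typing import Callable, Dict, List

# Whatever actually delivers the alert (for example an Alertmanager client).
# It is injected so this sketch does not assume any real WMCore API.
AlertSender = Callable[[Dict[str, str]], None]


class MSTransferorAlerts:
    """Keeps one wrapper method per alert so callers never build alert fields by hand."""

    def __init__(self, sender: AlertSender, service: str = "ms-transferor") -> None:
        self.sender = sender
        self.service = service

    def _send(self, alertName: str, severity: str, summary: str, description: str) -> None:
        """Shared path: assemble the alert fields and hand them to the sender."""
        self.sender({
            "alertName": alertName,
            "severity": severity,
            "service": self.service,
            "summary": summary,
            "description": description,
        })

    def alertWorkflowFailed(self, workflow: str, reason: str) -> None:
        """Alert raised when a workflow could not be processed."""
        self._send(
            alertName="{}: workflow processing failure".format(self.service),
            severity="high",
            summary="Failed to process workflow {}".format(workflow),
            description="Workflow {} could not be processed: {}".format(workflow, reason),
        )

    def alertLargeInputData(self, workflow: str, sizeTB: float) -> None:
        """Hypothetical counterpart of a large-data notification, refactored as a wrapper."""
        self._send(
            alertName="{}: large input data".format(self.service),
            severity="medium",
            summary="Large input data for workflow {}".format(workflow),
            description="Workflow {} requires {:.1f} TB of input data".format(workflow, sizeTB),
        )


if __name__ == "__main__":
    # Collect alerts in a list instead of sending them anywhere.
    sent: List[Dict[str, str]] = []
    alerts = MSTransferorAlerts(sender=sent.append)
    alerts.alertWorkflowFailed("example_workflow", "no valid destination")
    print(sent[0]["summary"])
```

Passing the sender in as a callable keeps the alert wording in one place and lets unit tests capture alerts in a plain list, as the usage at the bottom does.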

@khurtado
Contributor Author

@amaltaro : Okay, I encapsulated the alerts in different methods. Could you please check the second commit?

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12773/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 4 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12774/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

These changes are looking much better Kenyi! I left a few comments along the code for your consideration though (I think I made single comments instead of a review, sorry!).

@khurtado
Contributor Author

> These changes are looking much better Kenyi! I left a few comments along the code for your consideration though (I think I made single comments instead of a review, sorry!).

Alright! Just made the changes.

@amaltaro
Contributor

@khurtado this is looking good now. Can you please squash those 3 commits into a single one? Thanks

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 3 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12787/artifact/artifacts/PullRequestReport.html

@khurtado
Contributor Author

@amaltaro Done!

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12788/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor

Thanks Kenyi!

Development

Successfully merging this pull request may close these issues.

Create monitoring or alerts for workflows failing in MSTransferor
3 participants