Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix T0Request update function to avoid unneeded documents update #11353

Merged
merged 2 commits into from
Oct 31, 2022

Conversation

amaltaro
Copy link
Contributor

@amaltaro amaltaro commented Oct 28, 2022

Fixes #11304

Status

ready

Description

This PR fixes the T0Request couchapp update function, ensuring that documents transitioning from/to the same request status are not updated. In other words, if a workflow needs to be updated from status AAA to AAA, the update function will simply skip the document update (hence, the document will keep the same revision number as it had before the update call).

In addition to that, improve the TaskArchiver log record such that we can see what was the outcome of the update call.

Documentation: https://docs.couchdb.org/en/3.2.2-docs/ddocs/ddocs.html#update-functions

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

External dependencies / deployment changes

In order to apply this patch to the T0 agent, the following set of commands is required:

  1. Patch the couchapps area:
curl https://patch-diff.githubusercontent.com/raw/dmwm/WMCore/pull/11353.patch | patch -d apps/t0/data/couchapps/ -p 3
  1. Patch the python code area
curl https://patch-diff.githubusercontent.com/raw/dmwm/WMCore/pull/11353.patch | patch -d apps/t0/lib/python3.8/site-packages/ -p 3
  1. Push the up-to-date CouchApps (before starting all the components, but with CouchDB running):
$manage execute-agent wmagent-couchapp-init

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 20 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13689/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 6 warnings and errors that must be fixed
    • 3 warnings
    • 36 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13690/artifact/artifacts/PullRequestReport.html

@amaltaro amaltaro changed the title Debugging abusive writes to t0_request db Fix T0Request update function to avoid unneeded documents update Oct 28, 2022
@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 2 warnings
    • 16 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13691/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor Author

test this please

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 2 warnings
    • 16 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 1 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13692/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 1 tests no longer failing
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 7 warnings and errors that must be fixed
    • 2 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13694/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor Author

test this please

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 7 warnings and errors that must be fixed
    • 2 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13695/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor Author

Thanks to @germanfgv for running a replay with these changes in, we can see that documents are no longer continuously and unneded updated. Checking one workflow in local couchdb:

cmst0@vocms0500:/data/tier0/srv/wmagent/current $ curl localhost:5984/t0_request_local/Repack_Run360886_StreamALCALowPtJet_Tier0_REPLAY_2022_ID221031125321_v31125321
{"_id":"Repack_Run360886_StreamALCALowPtJet_Tier0_REPLAY_2022_ID221031125321_v31125321","_rev":"5-3b0fe29894435c6d45ead9a88baa1b6f","RequestName":"Repack_Run360886_StreamALCALowPtJet_Tier0_REPLAY_2022_ID221031125321_v31125321","Run":360886,"RequestStatus":"completed","RequestTransition":[{"Status":"new","UpdateTime":1667217240},{"Status":"Closed","UpdateTime":1667217287},{"Status":"Merge","UpdateTime":1667217800},{"Status":"completed","UpdateTime":1667218159}]}

the revision number is sound (5 updates in total), while with the buggy code it would be beyond 15 or so by now.

@amaltaro amaltaro requested review from germanfgv and vkuznet October 31, 2022 13:20
}
if (doc.RequestStatus && !isAllowedTransiton(doc.RequestStatus, newStatus)) {
// don't update the status just ignore
return "not allowed transition " + doc.RequestStatus + " to " + newStatus;
return "Not allowed transition, from " + doc.RequestStatus + " to " + newStatus;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always suggest to put quotes around status'es that it will increase readability especially if status is not integer data-type. For instance, if status is some sentences, e.g. 'status is ok' then reading full string will become more clear that status is shown in quotes.

Fix T0 updaterequest update function to avoid same status transition

apply Valentins suggestions
another update to unit tests
@amaltaro
Copy link
Contributor Author

@vkuznet Valentin, the javascript source code looks uglier, but it's easier on the client perspective. I updated the error message as you suggested. Please have another look.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 7 warnings and errors that must be fixed
    • 2 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13697/artifact/artifacts/PullRequestReport.html

@amaltaro amaltaro merged commit c97395a into dmwm:master Oct 31, 2022
@amaltaro
Copy link
Contributor Author

@germanfgv as discussed during the WMCore meeting today. I would recommend to apply this patch to the T0 agents, it does require a stop of the components, stop of couch, apply the couchapps patch, start of couch, push up-to-date couchapps and start all the components. Please let us know via Slack if you need help with any of this.

germanfgv added a commit to dmwm/T0 that referenced this pull request Nov 2, 2022
germanfgv added a commit to germanfgv/T0 that referenced this pull request Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate why t0_request CouchDB database grows at a rate of 2GB/h
3 participants