[CouchDB] New CouchDB 3.x APIs for tracking replications; fix replication check in AgentStatusPoller #11039
Instead of blocking this development while we wait for the final integration of CouchDB 3.x, I decided to make it compatible with both CouchDB versions, so that we can merge it sooner rather than later.
Thanks @amaltaro
It looks good to me.
… in AgentStatusPoller
Make Couch/replication check compatible between different versions of CouchDB
fix getActiveTasks call
fix isReplicationOK method; evaluate stale replication in the last hour
change replication checks once again
fix unit tests
update unit tests
Despite having tested it a few times, I deployed the latest changes today (with a fresh setup) and noticed that elements were not getting replicated from the central workqueue to the agent workqueue_inbox. I might be wrong, but I think there might be a race condition between deleting the "old" replication documents and creating the new ones, around this code:
I'm going to provide at least better logging for this in the open PR #11001.
@amaltaro, I had a look at https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/AgentStatusWatcher/AgentStatusPoller.py#L100-L106 and indeed you have a race condition there, since your request goes to the queue and you never check whether it was processed. What you should do is the following:
Thank you for looking into this, Valentin.
which causes the replication task to be eventually removed from the active tasks (and retried). I do think the deletion/creation replication task code can be made more robust, though. Anyhow, I need to debug which configuration parameters I changed over the last days/weeks, because this is definitely something that I tested and that was working(!)
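One way the deletion/creation sequence could be made more robust is to confirm that the old replication document is really gone before creating the new one. The sketch below only illustrates that idea and is not the WMCore code: wait_until, recreate_replication and the store object (with delete, exists and create methods) are hypothetical names introduced for this example.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate became true in time, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def recreate_replication(store, doc_id, new_doc, timeout=30.0, interval=0.5,
                         sleep=time.sleep):
    """Delete doc_id, wait until the deletion is visible, then create new_doc.

    `store` is a hypothetical object exposing delete(doc_id), exists(doc_id)
    and create(doc_id, doc); it stands in for whatever client talks to the
    replicator database.
    """
    store.delete(doc_id)
    gone = wait_until(lambda: not store.exists(doc_id), timeout, interval, sleep)
    if not gone:
        raise TimeoutError("replication document %s still present" % doc_id)
    store.create(doc_id, new_doc)
```

The point of the design is that the create call is never issued while the delete is still pending, and that a stuck deletion surfaces as an explicit error instead of a silently missing replication.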
Fixes #11037
Status
ready
Tested with WMAgents running with both CouchDB versions.
Description
This PR provides the following fixes and/or new features:
Fix replication check in AgentStatusPoller (source_seq is no longer an integer in CouchDB 3.x)
… (/), including its version
In the CouchMonitor class, the following has been implemented:
getActiveTasks client API to retrieve the active tasks from the _active_tasks CouchDB endpoint
getSchedulerJobs client API to retrieve jobs from the _scheduler/jobs CouchDB endpoint
getSchedulerDocs client API to retrieve documents from the _scheduler/docs CouchDB endpoint
Is it backward compatible (if not, which system it affects?)
YES
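To illustrate what compatibility with both versions can involve: CouchDB 1.x reports source_seq as an integer, while 2.x and 3.x report an opaque string that typically starts with a numeric prefix (for example "42-g1AAAA"). A hedged sketch of helpers that tolerate both forms, and that treat a replication task as stale when its updated_on timestamp is more than an hour old, could look as follows. The names seq_to_int and is_task_fresh are hypothetical, and this is not necessarily how isReplicationOK is implemented.

```python
import time


def seq_to_int(seq):
    """Return the numeric part of a CouchDB sequence value.

    Accepts an integer (CouchDB 1.x) or a string such as "42-g1AAAA"
    (CouchDB 2.x/3.x). Returns None when no numeric prefix is found.
    """
    if isinstance(seq, int):
        return seq
    prefix = str(seq).split("-", 1)[0]
    return int(prefix) if prefix.isdigit() else None


def is_task_fresh(task, now=None, max_age=3600):
    """True if `task` is a replication task updated within `max_age` seconds.

    `task` is one entry of the list returned by the _active_tasks endpoint;
    `updated_on` is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    if task.get("type") != "replication":
        return False
    return now - task.get("updated_on", 0) <= max_age
```

Relying on the timestamp rather than on comparing sequence values avoids depending on the exact format of source_seq, which differs between versions.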
Related PRs
Related to #11001, but no longer requiring it.
External dependencies / deployment changes
Changes required for CouchDB 3.x, but it can be merged before that version gets integrated in our stack.
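Because the _scheduler/jobs and _scheduler/docs endpoints belong to the scheduling replicator (which, as far as the CouchDB release notes indicate, arrived in CouchDB 2.1), client code that has to run against both old and new servers could choose its endpoints from the version string reported by the server root (/). A minimal sketch under that assumption, with hypothetical helper names parse_version and replication_endpoints:

```python
def parse_version(version):
    """Turn a version string such as "3.2.2" into a tuple of ints (3, 2, 2).

    Only the leading digits of each dot-separated piece are used, so a
    piece like "0rc1" counts as 0.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def replication_endpoints(server_info):
    """Pick the monitoring endpoints to query for a given server.

    `server_info` is the JSON document returned by the CouchDB root (/),
    e.g. {"couchdb": "Welcome", "version": "3.2.2"}.
    """
    endpoints = ["_active_tasks"]
    # Assumption: scheduler endpoints exist from CouchDB 2.1 onwards.
    if parse_version(server_info["version"]) >= (2, 1):
        endpoints += ["_scheduler/jobs", "_scheduler/docs"]
    return endpoints
```

With this kind of gate, the same client can be deployed ahead of the CouchDB upgrade and start using the scheduler endpoints once the newer server is in place.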