Fix JobAccountant missing parent dbsbuffer file #9997
The current patch resolves this issue at the JobAccountant level (extra protection for the parent identification) and in WMBSHelper (scanning all the ...
0a7ef05 to e08aa6f
update WMBSHelper logic; add check for store/unmerged in addition to merged; remove debugging statements
e08aa6f to 87fe078
The first commit has been applied to submit7, which is now running without any issues.
It still needs thorough testing, but perhaps you can already spot any mistakes here, Todor.
Looks good to me
This is likely a good candidate to go into the upcoming WMAgent release, so that we can avoid such nasty bugs in the future. I will spend some more time validating it today/tomorrow.
I have run many tests with this patch and everything seems to be working fine, including the parentage data uploaded to the ACDC server and what gets inserted into DBSBuffer when acquiring ACDC elements. There is one possible problem, though, with TaskChains with >= 3 consecutive KeepOutput=False tasks. The problem is exposed with or without this patch, so I am moving on with this one. I'm still discussing what exactly the expected behaviour is for data parentage in such cases and will open a new GH issue if needed.
Sigh... should have squashed those commits!
Fixes #9456
Status
In development
Description
From the GH issue, there are cases where an ACDC collection has incorrect information regarding the parent files.
This PR provides two levels of protection (the root cause is still missing though, likely in the ErrorHandler component):
- ... parents parameter, and only add those that will actually be referenced as a parent
- ... merged flag, also check the lfn name

UPDATE:
The second commit will ensure that we only provide parent files that have been merged (thus, only parent files that are needed in the execution of the ACDC workflow, to track the output parentage).
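
For illustration only, the parent filtering described above could be sketched roughly as follows. This is not the code from this PR: the function name `select_merged_parents`, the dictionary layout (`lfn` and `merged` keys) and the `/store/unmerged/` prefix constant are all assumptions made for the example, loosely based on the commit message ("add check for store/unmerged in addition to merged").

```python
# Hypothetical sketch, not the actual WMBSHelper/JobAccountant code.
# Assumes each parent is a dict with an "lfn" string and a "merged" flag.
UNMERGED_PREFIX = "/store/unmerged/"  # assumed prefix for unmerged LFNs


def select_merged_parents(parents):
    """Keep only parents flagged as merged whose LFN is not in the unmerged area."""
    selected = []
    for parent in parents:
        lfn = parent.get("lfn", "")
        # Two checks: the merged flag, and the LFN name itself
        if parent.get("merged") and not lfn.startswith(UNMERGED_PREFIX):
            selected.append(parent)
    return selected


candidates = [
    {"lfn": "/store/data/run1/file_a.root", "merged": True},
    {"lfn": "/store/unmerged/data/run1/file_b.root", "merged": True},
    {"lfn": "/store/data/run1/file_c.root", "merged": False},
]
kept = select_merged_parents(candidates)
```

Checking the LFN in addition to the flag means that a file wrongly flagged as merged but still living under the unmerged area would not be reported as a parent.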
Is it backward compatible (if not, which system it affects?)
yes
Related PRs
none
External dependencies / deployment changes
none