Adopt MSPileup data into PileupFetcher #12197
Jenkins results:
|
test this please
I am running the following testbed tests (in vocms0193), and hopefully the failing unit tests have now been sorted out as well. I fear that, to test this properly, we will need an agent connected to production Rucio and MSPileup. @hassan11196 I might have to ping you tomorrow to inject the workflow that failed as a backfill (in a backfill agent).
The unit test failing:
seems to be unrelated (maybe CVMFS is no longer mounted on these Jenkins nodes?):
@d-ylee are you aware of any changes that might have impacted this? If I am not mistaken, at some point we asked Shahzad to mount CVMFS on all Jenkins nodes, no? Even though I have yet to check my tests in the agent, I think these changes are ready for review.
Just pushed a 5th commit, which is supposed to resolve failures like:
which happen because testbed/preprod MSPileup talks to Integration Rucio, meaning that MSPileup's custom containers don't exist in Production Rucio. Nonetheless, it also fails with the Integration database, maybe because the integration with Rucio Int in WM is very incomplete and fragile... We need to revisit this ASAP.
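The mismatch described above is the reason the fetcher has to talk to the Rucio instance that matches the DBS instance in use. A minimal sketch of that selection, not the code in this PR: the function name `pickRucioEndpoints`, the `example.org` endpoints, and the set of hosts treated as production are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical endpoints, used only for illustration.
RUCIO_PROD = {"rucioUrl": "http://rucio-prod.example.org",
              "rucioAuthUrl": "https://rucio-auth-prod.example.org"}
RUCIO_INT = {"rucioUrl": "http://rucio-int.example.org",
             "rucioAuthUrl": "https://rucio-auth-int.example.org"}

# Assumption: only these hosts serve the production DBS instance.
PROD_DBS_HOSTS = {"cmsweb.cern.ch", "cmsweb-prod.cern.ch"}


def pickRucioEndpoints(dbsUrl):
    """Return the Rucio endpoints that pair with the given DBS URL."""
    host = urlparse(dbsUrl).hostname or ""
    # Anything that is not a known production host falls back to Integration
    return RUCIO_PROD if host in PROD_DBS_HOSTS else RUCIO_INT
```

With this shape, a testbed DBS URL resolves to the Integration endpoints, so custom containers created by a testbed MSPileup would be looked up in the same Rucio instance that holds them.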
Together with Ahmed, we have been validating these changes in a backfill agent and the workflow injected is:
I still see a high failure rate, but that is because this workflow was injected with secondary AAA enabled, so it will run everywhere in the site list, and not only at FNAL and CERN.
I have copied parts of this workflow above, from the WorkQueueManager cache area, under the following directory:
and indented the following 2 files (located at
My observations are:
Final summary: despite problems beyond the "scope" of this ticket, I think the changes provided in this PR bring us to the expected and healthy behavior.
TODO: create a new ticket for observation 3) above.
My previous commits only touched the mock data and the emulator; none of them changed the actual logic.
A snippet of the WorkQueueManager logs in the backfill agent (vocms0254) is:
so it is all looking good to me.
'filters': ['pileupName', 'customName', 'containerFraction', 'currentRSEs']}
doc = getPileupDocs(msPileupUrl, queryDict, method='POST')[0]
msg = f'Pileup dataset {doc["pileupName"]} with:\n\tcustom name: {doc["customName"]},'
msg += f'\n\tcurrent RSEs: {doc["currentRSEs"]}\n\tand container fraction: {doc["containerFraction"]}'
That is a relevant question!
We do not need to perform any mapping here, because MSPileup (and Rucio) already return a PhEDEx Node Name (PNN, aka RSE).
And what is expected during runtime is the PNN, as it is loaded from the Site Local Config:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMRuntime/Scripts/SetupCMSSWPset.py#L378
So we are good.
@amaltaro thank you for working on this. This looks great to me; I have a minor comment for your consideration. After that, I am fine with approving the PR.
Thank you, Andrea. I will improve the docstring for
fix import
Fix PileupFetcher logging and kwargs
Support different Rucio urls in PileupFetcher - according to the DBS instance
Custom pileup requires a custom scope as well
another fix for custom pileup name
Remove blocks available in DBS but not in Rucio
improve docstring
fix scope in MakeRucioMockFile.py
update Rucio mocked data
fix signature of getBlocksInContainer in Rucio data
more unit test fixes
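One of the commits above removes blocks that are available in DBS but not in Rucio. The idea can be sketched as a simple filter; this is an illustration under assumptions, not the code committed here. The function name `dropBlocksMissingInRucio` and the shape of its inputs (a dict keyed by block name from DBS, and an iterable of block names from Rucio) are hypothetical.

```python
def dropBlocksMissingInRucio(dbsBlocks, rucioBlocks):
    """
    Keep only the DBS blocks that Rucio also knows about.

    :param dbsBlocks: dict of block name -> block info, as listed by DBS
    :param rucioBlocks: iterable of block names found in the Rucio container
    :return: tuple (kept blocks dict, sorted list of dropped block names)
    """
    known = set(rucioBlocks)
    kept = {name: info for name, info in dbsBlocks.items() if name in known}
    dropped = sorted(set(dbsBlocks) - known)
    return kept, dropped


# Usage with made-up block names
dbsBlocks = {"/A/B/C#1": {"NumberOfEvents": 10},
             "/A/B/C#2": {"NumberOfEvents": 20}}
kept, dropped = dropBlocksMissingInRucio(dbsBlocks, ["/A/B/C#1"])
```

Returning the dropped names alongside the kept blocks makes it easy to log which blocks were skipped, which helps when debugging a pileup configuration that ends up smaller than expected.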
Fixes #12195
Status
ready
Description
Summary of changes is:
getDomainName to parse the cmsweb endpoint url
PileupFetcher with MockMSPileup
pileupconf.json file
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
None
External dependencies / deployment changes
None
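The summary of changes mentions `getDomainName`, used to parse the cmsweb endpoint url. Its implementation is not shown in this conversation, so the following is only a guess at the kind of parsing involved: the name `sketchDomainName` is hypothetical, and returning the first label of the hostname is an assumption about what "domain name" means here.

```python
from urllib.parse import urlparse


def sketchDomainName(url):
    """
    Return the first label of the URL hostname, or an empty string
    when the URL has no hostname (assumed behavior, for illustration).
    """
    hostname = urlparse(url).hostname or ""
    return hostname.split(".")[0]


# Usage: derive the cmsweb flavor from a service URL
flavor = sketchDomainName("https://cmsweb-testbed.cern.ch/ms-pileup/data/v1/ms-pileup")
```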