Describe the bug
Hasan reported a workflow that was bypassed straight to staging in MSTransferor because its pileup dataset already had an input data rule in status REPLICATING. In such cases, we simply consider the data to be on its way and advance the workflow to the next status (staging), without keeping a record of the rule_ids that are still unsatisfied. As a result, the same workflow(s) are bypassed by MSMonitor as well.
If the workflow is marked as TrustPUSitelists=true, then it would be bypassed by global workqueue as well, which could result in many jobs failing to read secondary data if not all files are available at a given Disk endpoint.
How to reproduce it
Assign a workflow with pileup - with secondary AAA enabled - where the pileup only has rules in INJECT or REPLICATING status.
Expected behavior
I think the best way to address it would be:
If an input rule is in status INJECT or REPLICATING, we need to keep those rule_ids and persist them in the transfer document.
In terms of MSTransferor, we do not require any new input rules for such a dataset and allow the workflow to move to staging, provided all of the other requirements are met.
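As a rough illustration of the two points above, here is a minimal sketch in Python. It assumes each rule is a dict with `id` and `state` keys and that the transfer document is a plain dict; the `pending_rules` key and both function names are hypothetical and not taken from the MSTransferor code.

```python
# States named in this issue as "data still on its way".
PENDING_STATES = {"INJECT", "REPLICATING"}


def pending_rule_ids(rules):
    """Return the ids of rules that are not yet satisfied, in input order."""
    return [rule["id"] for rule in rules if rule["state"] in PENDING_STATES]


def record_pending_rules(transfer_doc, dataset, rules):
    """Persist unsatisfied rule ids for `dataset` in the transfer document.

    `transfer_doc` and its "pending_rules" key are hypothetical; the real
    document layout may differ. Returns True when at least one rule already
    exists for the dataset, i.e. no new input rule would be required.
    """
    ids = pending_rule_ids(rules)
    if ids:
        per_dataset = transfer_doc.setdefault("pending_rules", {})
        known = per_dataset.setdefault(dataset, [])
        # Avoid recording the same rule id twice across polling cycles.
        known.extend(rid for rid in ids if rid not in known)
    return bool(rules)
```

With the ids stored this way, a monitoring component could poll them like any rule it had created itself, rather than treating the dataset as already in place.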
Additional context and error message
Some context can be found at: #10041
I think MSMonitor should monitor the staging as if it had created the rule(s) itself and apply the same procedure. I'm not sure about the technical difficulty of this, though, as the rule was created by some other party.
@haozturk I forgot to mention something, which might make this decision and development much easier.
If we have decided that we should have a pileup sample at every single location defined in the campaign configuration - which I think we did a few weeks ago - then we could easily and automatically retrieve the rule_id by trying to create a new rule for one of the locations defined in the campaign. If it's a new data placement, it will get a new rule_id; otherwise it will find the duplicate rule_id and pass it along to MSMonitor.
Can you please remind me what we decided for pileup samples regarding input data placement (both classic and premix)? Thanks
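The "create the rule, or pick up the duplicate" idea from this comment could look roughly like the sketch below. `FakeRuleClient`, `DuplicateRuleError`, `create_rule`, and `find_rule` are hypothetical in-memory stand-ins written only for this illustration; they are not the API of any real data management client.

```python
class DuplicateRuleError(Exception):
    """Hypothetical error raised when an identical rule already exists."""


class FakeRuleClient:
    """In-memory stand-in for a data management client (illustration only)."""

    def __init__(self):
        self._rules = {}  # (dataset, location) -> rule_id

    def create_rule(self, dataset, location):
        key = (dataset, location)
        if key in self._rules:
            raise DuplicateRuleError(key)
        rule_id = "rule-%d" % (len(self._rules) + 1)
        self._rules[key] = rule_id
        return rule_id

    def find_rule(self, dataset, location):
        return self._rules[(dataset, location)]


def rule_id_for(client, dataset, location):
    """Return (rule_id, created) for a dataset at a campaign location.

    `created` is False when the rule already existed and its id was
    looked up instead of being newly created.
    """
    try:
        return client.create_rule(dataset, location), True
    except DuplicateRuleError:
        return client.find_rule(dataset, location), False
```

Either way the caller ends up with a rule_id that can be handed to the monitoring step, regardless of who created the rule.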
It sounds good to me. The policy is the following: The pileup should have one full copy at each location defined in the campaign.
Btw, I realized that the staging of this pileup started with this workflow. So, MSTransferor created the rule itself and is waiting for staging to complete for this workflow, as expected. However, it just let the other workflows proceed.
Impact of the bug
MSTransferor
Here is an example: https://cmsweb.cern.ch/ms-transferor/data/info?request=cmsunified_task_TSG-Run3Summer21wmLHEGS-00035__v1_T_220309_215550_5271