MSTransferor: rule ids for INJECT/REPLICATING rules must be passed to MSMonitor #11035

Closed
amaltaro opened this issue Mar 10, 2022 · 3 comments · Fixed by #11061

Comments

@amaltaro
Contributor

amaltaro commented Mar 10, 2022

Impact of the bug
MSTransferor

Describe the bug
Hasan reported a workflow that MSTransferor moved straight to staging because its pileup dataset already had an input data rule in status REPLICATING. In such cases, we simply consider the data to be on its way and advance the workflow to the next status (staging), without keeping a record of the rule_ids that are still unsatisfied. As a result, the same workflow(s) are bypassed by MSMonitor as well.

If the workflow is marked as TrustPUSitelists=true, then it would be bypassed by global workqueue as well, which could result in many jobs failing to read secondary data if not all files are available at any given Disk endpoint.

How to reproduce it
Assign a workflow with pileup - with secondary AAA enabled - where the pileup only has rules in INJECT or REPLICATING status.

Expected behavior
I think the best way to address it would be:

  • if an input rule is in status INJECT or REPLICATING, we need to keep its rule_id and persist it in the transfer document
  • in terms of MSTransferor, we do not require any new input rules for such a dataset and allow it to move to staging, provided all of the other requirements are met.
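
A minimal sketch of the two points above, assuming input rules arrive as dictionaries with `id` and `state` keys and that the transfer document is a plain dictionary. The names `collect_pending_rule_ids`, `record_pending_rules`, and the `transfers`/`ruleIds` fields are hypothetical, chosen for illustration only; this is not the actual MSTransferor code.

```python
# States in which a rule exists but its data is not yet fully in place.
PENDING_STATES = {"INJECT", "REPLICATING"}


def collect_pending_rule_ids(existing_rules):
    """Return the ids of existing rules that are still unsatisfied."""
    return [rule["id"] for rule in existing_rules
            if rule.get("state") in PENDING_STATES]


def record_pending_rules(transfer_doc, dataset, existing_rules):
    """
    Persist unsatisfied rule ids in the (hypothetical) transfer document
    so that a later monitoring step can keep tracking them.

    Returns True when pending rules were found, meaning no new input
    rule needs to be created for this dataset.
    """
    pending = collect_pending_rule_ids(existing_rules)
    if pending:
        transfer_doc["transfers"].append(
            {"dataset": dataset, "ruleIds": pending})
    return bool(pending)
```

With this shape, a workflow whose pileup only has INJECT or REPLICATING rules can still advance to staging, while the recorded rule ids remain available for MSMonitor to check.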

Additional context and error message
Some context can be found at: #10041

Here is an example: https://cmsweb.cern.ch/ms-transferor/data/info?request=cmsunified_task_TSG-Run3Summer21wmLHEGS-00035__v1_T_220309_215550_5271

@haozturk

I think MSMonitor should monitor the staging as if it had created the rule(s) itself and apply the same procedure. I'm not sure about the technical difficulty of this, though, as the rule was created by some other party.

@amaltaro
Contributor Author

@haozturk I forgot to mention something, which might make this decision and development much easier.

If we have decided that we should have a pileup sample at every single location that is defined within the campaign configuration - which I think we did a few weeks ago - then we could easily/automatically retrieve the rule_id by trying to create a new rule for one of the locations defined in the campaign. If it's a new data placement, it will get a new rule_id; otherwise, it will find the duplicate rule_id and pass it along to MSMonitor.
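
The retrieval described above could look roughly like the sketch below. It assumes a rule client whose create call raises a duplicate-rule error when an identical rule already exists, and that the existing rule can then be looked up by dataset and location. `FakeRuleClient`, `DuplicateRule`, `create_rule`, `find_rule` and `get_or_create_rule_id` are hypothetical stand-ins for illustration, not the Rucio or WMCore API.

```python
import uuid


class DuplicateRule(Exception):
    """Hypothetical error raised when an identical rule already exists."""


class FakeRuleClient:
    """In-memory stand-in for a data-management client (illustration only)."""

    def __init__(self):
        self._rules = {}  # (dataset, location) -> rule id

    def create_rule(self, dataset, location):
        key = (dataset, location)
        if key in self._rules:
            raise DuplicateRule(f"rule already exists for {key}")
        self._rules[key] = uuid.uuid4().hex
        return self._rules[key]

    def find_rule(self, dataset, location):
        return self._rules[(dataset, location)]


def get_or_create_rule_id(client, dataset, location):
    """Return (rule_id, created) for the dataset at the given location."""
    try:
        # New data placement: a fresh rule id is returned.
        return client.create_rule(dataset, location), True
    except DuplicateRule:
        # A rule already exists: reuse its id instead of failing.
        return client.find_rule(dataset, location), False
```

Either way, the caller ends up with a rule id that can be stored in the transfer document and handed to MSMonitor.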

Can you please remind me again what we have decided for pileup samples regarding input data placement (both classic and premix)? Thanks

@haozturk

It sounds good to me. The policy is the following: The pileup should have one full copy at each location defined in the campaign.

By the way, I realized that the staging of this pileup started with this workflow. So MSTransferor created the rule itself and is waiting for staging to complete for this workflow, as expected. However, it just let other workflows proceed.
