DBS/Rucio data injection synchronization with workflow completion #8148

Open
vlimant opened this issue Sep 7, 2017 · 12 comments

Comments

@vlimant
Contributor

vlimant commented Sep 7, 2017

Impact of the bug
WMAgent

Describe the bug
Depending on how loaded an agent is, it can take up to a couple of days to inject data into DBS and/or Rucio. This is especially confusing for workflows that have recently moved to completed status: their data may be fully injected within a few minutes of the transition, or injection may take hours or even days.

How to reproduce it
Not clear what exactly triggers it.

Expected behavior
A reasonable data injection commitment would be dedicated handling of completed workflows (or workflows where all the agent subscriptions have been marked as completed), such that DBS3Uploader and RucioInjector expedite their data injection ahead of anything else already in the database.

Once those data have been properly injected, the components can proceed with their normal operations.

The components would still run asynchronously, but provided that everything is stable and functional, it should take less than 2 hours for all the data to become available in Rucio and DBS.
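As a rough illustration of the prioritization described above, here is a minimal sketch. It assumes a hypothetical `PendingBlock` record and an `order_for_injection` helper; neither is an existing WMCore class or function, and the real DBS3Uploader and RucioInjector read their work from the agent's relational database rather than from in-memory objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingBlock:
    """Hypothetical record for a block awaiting injection (not a WMCore class)."""
    name: str
    workflow: str
    workflow_completed: bool  # assumed flag: all agent subscriptions are completed
    created_at: float         # epoch seconds when the block was recorded locally


def order_for_injection(blocks: List[PendingBlock]) -> List[PendingBlock]:
    """Put blocks of completed workflows first, oldest first within each group."""
    # False sorts before True, so negating the flag moves completed workflows ahead.
    return sorted(blocks, key=lambda b: (not b.workflow_completed, b.created_at))


if __name__ == "__main__":
    pending = [
        PendingBlock("a", "wf1", False, 10.0),
        PendingBlock("b", "wf2", True, 30.0),
        PendingBlock("c", "wf2", True, 20.0),
    ]
    for block in order_for_injection(pending):
        print(block.name, block.workflow)
```

Sorting on a two-part key keeps the normal oldest-first behaviour for everything else, so once the completed workflows are drained the components fall back to their usual order.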

Additional context and error message
An alternative would be not to mark workflow subscriptions as done unless the relevant output data has been successfully injected into DBS and Rucio. However, this would make it challenging to identify which workflows need expedited data injection, given that there is no communication between the components other than through object state stored in the local relational database.
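The gating check in this alternative could look roughly like the sketch below. The `OutputFile` record, its `in_dbs` / `in_rucio` flags, and `can_mark_subscription_done` are all hypothetical names used for illustration; in the agent this state would come from the local database, not from Python objects.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class OutputFile:
    """Hypothetical view of one output file's injection state (not a WMCore class)."""
    lfn: str
    in_dbs: bool    # assumed flag: file has been uploaded to DBS
    in_rucio: bool  # assumed flag: file has been registered in Rucio


def can_mark_subscription_done(jobs_finished: bool,
                               outputs: Iterable[OutputFile]) -> bool:
    """Allow 'done' only when jobs are finished and every output is in DBS and Rucio."""
    # all() over an empty iterable is True, so a subscription that produced
    # no output files is gated on jobs_finished alone.
    return bool(jobs_finished and all(f.in_dbs and f.in_rucio for f in outputs))


if __name__ == "__main__":
    outputs = [
        OutputFile("/store/example/a.root", True, True),
        OutputFile("/store/example/b.root", True, False),
    ]
    print(can_mark_subscription_done(True, outputs))
```

A check like this only delays the status change; it does not by itself tell the injection components which data to expedite, which is the difficulty noted above.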

@ticoann ticoann self-assigned this Sep 8, 2017
@ticoann ticoann added this to the WMAgent1709 milestone Sep 8, 2017
@ticoann
Contributor

ticoann commented Sep 8, 2017

I thought Unified checks this. Anyway, we will try to add this feature by the next release.

@vlimant
Contributor Author

vlimant commented Sep 8, 2017

Yes, Unified checks for PhEDEx/DBS inconsistencies, and sometimes there is something that needs to be taken care of with the transfer team. The point is that if such inconsistencies can occur systematically, one has to wait n (= how much?) hours after a request reaches "completed" status before checking and acting on the inconsistency. In short, there is no way to know whether it's just a delay or files actually missing/invalidated in the wild.

@vlimant
Contributor Author

vlimant commented Oct 17, 2017

https://its.cern.ch/jira/browse/CMSCOMPPR-1361 for a use-case where having the synchronisation is mandatory to make sense of the "completed" status

@bbockelm
Contributor

@vlimant - is this still high-priority? It has lingered for an awfully long time.

@amaltaro
Contributor

I'll dump whatever I have to do in October and get this one fixed. Or I close it and we say we can't fix it and we need to live with this forever.

@amaltaro amaltaro removed this from the WMAgent1905 milestone Sep 27, 2021
@klannon klannon added Stakeholders Technical Debt and removed Stakeholders Technical Debt labels Sep 29, 2021
@amaltaro amaltaro removed their assignment Sep 22, 2023
@klannon

klannon commented Sep 22, 2023

@amaltaro I'm impressed that this issue from 2017 (!) appears in our workplan. Given that this refers to PhEDEx and might include references to other outdated concepts, perhaps you could spend 60 seconds writing a brief updated description at the top (with a new title too, perhaps?) so that a modern audience appreciates the intentions here?

@amaltaro
Contributor

@klannon you are right, apologies for not getting to this earlier. I have just refactored the original issue description.

I wanted to note, though, that the P&R team does not consider this issue important for Q4; they are actually interested in #11729 (and of course, in no longer having file mismatches in WM, which is a very generic problem). That said, I am removing it from the 2023/Q4 board.

@amaltaro amaltaro changed the title from "asynchronisation "completed" and PhedexInjector" to "DBS/Rucio synchronization with workflow completion" Sep 22, 2023
@amaltaro amaltaro changed the title from "DBS/Rucio synchronization with workflow completion" to "DBS/Rucio data injection synchronization with workflow completion" Sep 22, 2023
@klannon

klannon commented Sep 22, 2023

Fair enough. Thanks!
