
Update PileupFetcher to use Rucio to build the list of replicas and block location #9894

Closed
amaltaro opened this issue Sep 7, 2020 · 6 comments · Fixed by #9930

Comments

@amaltaro
Contributor

amaltaro commented Sep 7, 2020

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
PileupFetcher uses PhEDEx to build a dictionary of:

  • blocks and their current location
  • blocks and a list of all their files/replicas

As we decommission PhEDEx, the same functionality needs to be ported to Rucio.

Describe the solution you'd like
Support Rucio within PileupFetcher, such that the pileup dataset location lookup can be performed through Rucio.
This means we need to implement:

  • given a container name, return all its blocks and their current location (where data available and locked by WMCore)
  • given a container name, return all its blocks and their files/replicas

The piece of code building the list of files needs to be reviewed, though. We fetch the list from DBS, which then gets overwritten by the list of files retrieved from PhEDEx (likely to take care of invalid/no-longer-existent files).
It's also not clear to me why we need to know the number of events in each block.
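The two lookups above can be sketched against the Rucio client. This is a minimal illustration, not the WMCore implementation: `list_content()` and `list_files()` are real `rucio.client.Client` methods, but the wrapper function, the `"cms"` scope default, and the fake client are assumptions for the example.

```python
# Hedged sketch: given a container name, return each block (a Rucio
# "dataset" in CMS terms) and the LFNs of its files. Only the Rucio client
# method names are real; everything else is illustrative.

def blocks_and_files(client, container, scope="cms"):
    """Map each block in `container` to the list of its file LFNs."""
    blocks = {}
    for did in client.list_content(scope, container):
        # a CMS block maps to a Rucio dataset; skip files/sub-containers
        if did.get("type", "").upper() != "DATASET":
            continue
        blocks[did["name"]] = [f["name"]
                               for f in client.list_files(scope, did["name"])]
    return blocks

# Minimal stand-in for a Rucio client, for illustration only
class FakeRucio:
    def list_content(self, scope, name):
        return [{"type": "DATASET", "name": "/PU/GEN-SIM#block1"}]

    def list_files(self, scope, name):
        return [{"name": "/store/mc/f1.root"}, {"name": "/store/mc/f2.root"}]

result = blocks_and_files(FakeRucio(), "/PU/GEN-SIM")
```

With a real `rucio.client.Client()` in place of the fake, the same loop would resolve an actual container; `list_files()` replies also carry an `events` field per file, which may help with the per-block event count question above.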

Describe alternatives you've considered
Not resolving the list of files in Rucio, relying instead on that information from DBS (and maybe filtering by validOnly).
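This alternative can be sketched as a single DBS call. `DbsApi.listFileArray(block_name=..., validFileOnly=1)` is the DBS3 client call that drops invalid files server-side; the helper function and the fake client are illustrative assumptions.

```python
# Hedged sketch of the alternative: skip Rucio for file resolution and ask
# DBS for valid files only. Only listFileArray(..., validFileOnly=1) is the
# real DBS3 client API; the rest is illustration.

def valid_files_from_dbs(dbs, block):
    """Return the LFNs of valid files in `block`, straight from DBS."""
    return [f["logical_file_name"]
            for f in dbs.listFileArray(block_name=block, validFileOnly=1)]

# Stand-in for dbs.apis.dbsClient.DbsApi, for illustration only
class FakeDbs:
    def listFileArray(self, block_name, validFileOnly):
        # a real DbsApi would filter invalid files when validFileOnly=1
        return [{"logical_file_name": "/store/mc/good.root"}]

valid = valid_files_from_dbs(FakeDbs(), "/PU/GEN-SIM#block1")
```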

Additional context
none

@amaltaro
Contributor Author

I'm starting to think that it would be easier (and compatible with our wishes for the future) to resolve the pileup location differently than how we resolve all the other data used in central production.

So, instead of

  • listing all the replication rules for the container and all its blocks (list_replication_rules)
  • resolving the rse expressions
  • resolving the multi RSE rules with single copy
  • and finally listing the final dataset locks (and comparing them against the rules), via get_dataset_locks

I think we could simply resolve the current block location (no matter who is responsible for locking the data) with a list_dataset_replicas_bulk call, as done in:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/Steps/Fetchers/PileupFetcher.py#L96
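The simpler option boils down to one bulk call. In this sketch, `list_dataset_replicas_bulk()` is the real Rucio client method; the reply keys used here (`name`, `rse`, `available_length`, `length`), the completeness check, and the fake client are assumptions for illustration.

```python
# Hedged sketch: resolve current block locations in one bulk call,
# regardless of which account (if any) locked the data.

def current_block_locations(client, dids):
    """Map block name -> set of RSEs holding a complete replica."""
    locations = {}
    for rep in client.list_dataset_replicas_bulk(dids):
        # assumption: a replica is complete when all files are available
        if rep.get("available_length") == rep.get("length"):
            locations.setdefault(rep["name"], set()).add(rep["rse"])
    return locations

# Stand-in for a Rucio client, for illustration only
class FakeRucio:
    def list_dataset_replicas_bulk(self, dids):
        return [
            {"name": "/PU#b1", "rse": "T1_US_FNAL_Disk",
             "available_length": 10, "length": 10},
            {"name": "/PU#b1", "rse": "T2_CH_CERN",
             "available_length": 7, "length": 10},
        ]

locs = current_block_locations(FakeRucio(),
                               [{"scope": "cms", "name": "/PU#b1"}])
```

Incomplete replicas (here the partially transferred copy at T2_CH_CERN) are dropped, so only RSEs with the full block survive.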

I guess pileup won't be placed by anyone other than the DM team (together with CompOps) or MSTransferor (well, WMAgent as well, via the block-level locks created while data is being produced).

@nsmith- would you have any preference here? Or perhaps you can see a reason not to do something that I fail to?

@amaltaro
Contributor Author

Actually, I implemented the first option (checking locks and availability).
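The lock-based option can be sketched as follows: keep only RSEs where the block transfer has completed (lock state "OK") and the lock belongs to an account we trust to manage pileup. `get_dataset_locks()` is the real Rucio client call; the account name, the lock dictionary keys, and the fake client are illustrative assumptions.

```python
# Hedged sketch of the lock-based approach (checking locks and
# availability). Only get_dataset_locks() is the real Rucio client API;
# the account list and reply keys are assumptions.

def locked_block_locations(client, scope, block,
                           accounts=("wmcore_transferor",)):
    """RSEs where `block` is fully available and locked by `accounts`."""
    return sorted({lock["rse"]
                   for lock in client.get_dataset_locks(scope, block)
                   if lock["state"] == "OK"
                   and lock.get("account") in accounts})

# Stand-in for a Rucio client, for illustration only
class FakeRucio:
    def get_dataset_locks(self, scope, name):
        return [
            {"rse": "T1_US_FNAL_Disk", "state": "OK",
             "account": "wmcore_transferor"},
            {"rse": "T2_DE_DESY", "state": "REPLICATING",
             "account": "wmcore_transferor"},
            {"rse": "T2_CH_CERN", "state": "OK",
             "account": "someone_else"},
        ]

rses = locked_block_locations(FakeRucio(), "cms", "/PU#b1")
```

Both the still-replicating copy and the copy locked by an unrelated account are filtered out, which is the difference from the plain replica lookup.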

@nsmith-

nsmith- commented Sep 22, 2020

I think at some point in the future we would like to not have any manual placement of pileup required. In that sense, probably the original option is better long-term.

@amaltaro
Contributor Author

@nsmith- sorry Nick, are you referring to a simple block location lookup as the original option? Or to the one which considers locks made by the transferor account?

@amaltaro
Contributor Author

If we do not want any manual placement, I understand that MSTransferor would be in charge of the pileup data placement as well. Given that the PR has been tested, I'm proceeding with the lock-based approach. I'm happy to change it in the near future, though, if we find it wasn't the best choice. Thanks

@nsmith-

nsmith- commented Sep 23, 2020

Yeah, I mean lock-based; sorry it was unclear from the comment. I think you already have what is best in the code, then.
