Easily access datasets on Rucio data lake #156
Hello @matbun, after speaking with a few people at CERN, it seems there are two "main" ways to interact with Rucio data.
Option 1 takes much more time than option 2. Furthermore, you would need to keep an internet connection open during the whole download. Therefore, we should go with option 2. I can already create a small bash script for VEGA that symlinks all the dataset files into a txt file, which we would then need to adapt for each of the data centers. Step by step ;-). Let me know where I can add this script.
I have created a new tutorial folder on a new branch: https://github.com/interTwin-eu/itwinai/tree/156-easily-access-datasets-on-rucio-data-lake/tutorials/data-lake/pull-dataset
@garciagenrique could you please add an example of "option 2" with some documentation? The goal is to give such an example to the interTwin use cases, so that they can reproduce it for their datasets. Perhaps a couple of links to the Rucio docs would help as well. Thanks!
Hey @garciagenrique, I have updated the issue description with what we discussed yesterday and with some suggestions on where to create the Python module and tests.
Add a Python function capable of translating a namespaced Rucio dataset/file to the absolute path on the local filesystem of the datacenter (e.g., HPC) on which the code is currently running.
Something like
namespace_to_path('jdoe:physics_dataset')
returning:
'/dacache/slling.si/.../physics_dataset' when on HPC1
'/other/path/.../physics_dataset' when on HPC2
The dataset may or may not be on the HPC:
How to proceed:
- rucio.py module under src/itwinai/ to store the Python function meant to convert a Rucio dataset to the absolute path on the local RSE
- test_rucio.py file under tests/
Once this is done, we will integrate it with other itwinai modules (e.g., config parser and CLI)
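To make the request above more concrete, here is a minimal sketch of what such a function could look like. It is only an illustration under several assumptions that are not confirmed anywhere in this issue: the site is selected through a hypothetical ITWINAI_SITE environment variable, the RSE_PREFIXES table and its paths are placeholders, and the on-disk layout is assumed to be prefix/scope/name. The real layout depends on how each RSE is configured, and the sketch does not call the Rucio client at all.

```python
import os
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical mapping from a site label to the local mount point of its RSE.
# The real prefixes depend on each data center and are not given in this issue.
RSE_PREFIXES = {
    "HPC1": Path("/hypothetical/hpc1/rucio"),
    "HPC2": Path("/hypothetical/hpc2/rucio"),
}


def split_did(did: str) -> Tuple[str, str]:
    """Split a namespaced Rucio identifier 'scope:name' into (scope, name)."""
    scope, sep, name = did.partition(":")
    if not sep or not scope or not name:
        raise ValueError(f"Expected '<scope>:<name>', got {did!r}")
    return scope, name


def namespace_to_path(
    did: str, site: Optional[str] = None, must_exist: bool = False
) -> Path:
    """Translate 'scope:name' into an absolute path on the local RSE.

    Assumes a '<prefix>/<scope>/<name>' layout, which is an illustrative
    choice and not necessarily how a given RSE stores its files.
    """
    scope, name = split_did(did)
    # ITWINAI_SITE is a hypothetical variable used only for this sketch.
    site = site or os.environ.get("ITWINAI_SITE")
    if site not in RSE_PREFIXES:
        raise KeyError(f"Unknown or unset site: {site!r}")
    path = RSE_PREFIXES[site] / scope / name
    # The dataset may or may not be present on this HPC system.
    if must_exist and not path.exists():
        raise FileNotFoundError(f"{did} is not available locally at {path}")
    return path
```

A test in tests/test_rucio.py could then call namespace_to_path('jdoe:physics_dataset', site='HPC1') and compare the result with the expected path for that site. Passing must_exist=True covers the case where the dataset is not on the current system, in which case a caller could fall back to downloading it.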