bulk requests stuck on STARTED state #7744
Hi Elena. Make sure you set:
(a max retries of 3 just stands for "a small number"; you do not want this number to be large)
As for the current request, I suggest cancelling it via the bulk admin API.
Hi Dmitry,
This is how it works:
All staging requests in dCache are handled in
Running dCache version 9.2.25 with Enstore.
When a recall fails because of a problem reading the tape (rather than a missing file, a checksum error, etc.), the bulk request remains STARTED, the file stays in RUNNING state, and there is no "rh" request for it on any pool.
Example:
[dccore12] (local) admin > \s bulk request ls 2eda9e78-fbaa-4f5a-b285-787dc7f29bec
ID | ARRIVED | MODIFIED | OWNER | STATUS | UID
3182252 | 2025/02/04-10:40:14 | 2025/02/04-10:40:14 | 31101:1399 | STARTED | 2eda9e78-fbaa-4f5a-b285-787dc7f29bec
[dccore12] (local) admin > \s bulk request info 2eda9e78-fbaa-4f5a-b285-787dc7f29bec
2eda9e78-fbaa-4f5a-b285-787dc7f29bec:
status: STARTED
arrived at: 2025-02-04 10:40:14.781
started at: 2025-02-04 10:40:14.793
last modified at: 2025-02-04 10:40:14.793
target prefix: /
targets:
CREATED | STARTED | COMPLETED | STATE | TARGET
2025-02-04 10:40:14.782 | 2025-02-04 10:40:14.782 | ? | RUNNING | /pnfs/pic.es/data/cms/store/data/Run2024G/ZeroBias/AOD/PromptReco-v1/000/384/202/00000/f58968b7-5890-4970-abcc-b5ace5d645e5.root
2025-02-04 10:40:14.782 | 2025-02-04 10:40:14.782 | 2025-02-04 10:40:14.808 | FAILED | /pnfs/pic.es/data/cms/store/test/loadtest/source/T1_ES_PIC_Tape/urandom.270MB.file0000 -- (ERROR: diskCacheV111.util.CacheException : File not on tape.)
[dccore12] (local) admin > \sn pnfsidof /pnfs/pic.es/data/cms/store/data/Run2024G/ZeroBias/AOD/PromptReco-v1/000/384/202/00000/f58968b7-5890-4970-abcc-b5ace5d645e5.root
0000AFE37C9682A641AE99358012553B0CE8
$ echo "\s dc* rh ls"| ssh -p 22224 dccore.pic.es|grep 0000AFE37C9682A641AE99358012553B0CE8
In this example, after running \bulk request reset, the rh process does not appear; however, the first time we faced the problem, a new rh for the stuck file was launched.
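The manual check above (listing the RUNNING targets with bulk request info, resolving each path with pnfsidof, then grepping rh ls for the PNFS ID) can be repeated for any stuck request with a small script. This is a minimal, hypothetical sketch, assuming the admin door dccore.pic.es:22224 used above and the pipe-separated table layout of bulk request info shown in the example; the function names admin, running_targets and check_request and the ADMIN_HOST/ADMIN_PORT variables are illustrative and not part of dCache.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for one bulk request, report each RUNNING target and
# whether any pool currently lists a restore ("rh") entry for its PNFS ID.
# Assumes the admin output formats shown in this issue; not part of dCache.
set -euo pipefail

ADMIN_HOST="${ADMIN_HOST:-dccore.pic.es}"  # admin door used in the report
ADMIN_PORT="${ADMIN_PORT:-22224}"

# Send a single command to the dCache admin shell over SSH.
admin() {
  printf '%s\n' "$1" | ssh -p "$ADMIN_PORT" "$ADMIN_HOST"
}

# Read "bulk request info" output on stdin and print the TARGET column
# of every row whose STATE column is RUNNING.
running_targets() {
  awk -F'|' '{
    for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    if (NF >= 5 && $4 == "RUNNING") print $5
  }'
}

check_request() {
  local request_id="$1" target pnfsid rh_list
  # Fetch the restore queues of all dc* pools once, as in the manual check.
  rh_list=$(admin '\s dc* rh ls')
  admin "\\s bulk request info $request_id" | running_targets |
    while IFS= read -r target; do
      # PNFS IDs are hex strings; strip CRs and any surrounding prompt text.
      pnfsid=$(admin "\\sn pnfsidof $target" | tr -d '\r' |
        grep -Eo '[0-9A-F]{24,}' | head -n 1 || true)
      if [[ -n "$pnfsid" ]] && grep -qF "$pnfsid" <<<"$rh_list"; then
        echo "rh present: $target ($pnfsid)"
      else
        echo "NO rh:      $target (${pnfsid:-pnfsid not found})"
      fi
    done
}

# Only contact dCache when a request ID is given on the command line.
if [[ $# -ge 1 ]]; then
  check_request "$1"
fi
```

Saved under any name and run with the request UID (for example 2eda9e78-fbaa-4f5a-b285-787dc7f29bec), it would print one line per RUNNING target; a "NO rh" line corresponds to the stuck situation described above. The rh ls output is fetched once up front so that every target is compared against the same snapshot of the pool queues.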
Don't hesitate to request any information you need.
Cheers,
Elena