Describe the bug
The T1_UK_RAL_Disk RSE has been enabled in the MSUnmerged service configuration, and this exposed a new issue [1] with the `root_failed` attribute, which we currently use to check whether the scanner succeeded in scanning the directories. Since there is no scanner for RAL, only a text file (the site dump) that needs to be downloaded, the `root_failed` key does not exist in the WM/stats API output for this RSE, and looking it up raises a `KeyError`.
This issue was discussed with Igor M. in the #cms-consistency channel, and he suggests using only the `status` attribute.
How to reproduce it
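Check the T1_UK_RAL_Disk output from the WM/stats API.
Expected behavior
Remove the check on `root_failed` (https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSUnmerged/MSUnmerged.py#L406) and keep checking the RSE dump status based only on the `status` attribute, as is already done two lines above.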
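A status-only check could look roughly like the sketch below. It is not the actual WMCore code: the helper name `isConsDumpUsable` is hypothetical, and the per-RSE record is assumed to be a plain dict holding the `status` (and, for scanned sites, `root_failed`) attributes named in this issue. An RSE missing from the stats is treated as not usable, which is also an assumption.

```python
# Hypothetical sketch of the proposed fix (not the actual WMCore code):
# decide whether an RSE's consistency record is usable from its "status"
# attribute alone. The record layout is assumed from this issue.


def isConsDumpUsable(rseName, rseConsStats):
    """Return True only if the scan (or, for RAL, the dump download) is done."""
    record = rseConsStats.get(rseName, {})  # assumption: unknown RSE -> not usable
    # "started" = still running, "failed" = scan/download failed this run.
    # "root_failed" is never read, so records without it (T1_UK_RAL_Disk)
    # no longer raise KeyError.
    return record.get("status") == "done"


rseConsStats = {
    "T1_UK_RAL_Disk": {"status": "done"},  # downloaded dump, no root_failed key
    "T2_XX_Example": {"status": "failed", "root_failed": True},  # hypothetical RSE
}
for rse in rseConsStats:
    print(rse, isConsDumpUsable(rse, rseConsStats))
# T1_UK_RAL_Disk True
# T2_XX_Example False
```

Note that "done" is still accepted for unmerged scans even though, per the summary below, it does not guarantee that every subdirectory was scanned; the sketch keeps that behavior rather than adding a new check.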
Additional context and error message
Quoting a short summary from the information provided by Igor M.
"""
The `status` attribute says whether the scanner failed or succeeded. The attribute value can be either “started” or “done” or “failed”.
- “failed”: for regular sites, status="failed" is equivalent to root_failed=true. For RAL, it means that the site dump download has failed for this run.
- “started” means the scanner has not finished yet. For RAL, it means that the site dump is still being downloaded.
- “done” means it scanned the whole tree, but in case of unmerged scanning, it does not necessarily mean that all subdirectories were scanned successfully. For RAL, it means that the site dump was downloaded successfully.
"""
[1]
2021-11-08 01:54:59,148:ERROR:MSUnmerged: plineUnmerged: General error from pipeline. RSE: T1_UK_RAL_Disk. Error: 'root_failed' Will retry again in the next cycle.
Traceback (most recent call last):
File "/data/srv/HG2111a/sw/slc7_amd64_gcc630/cms/reqmgr2ms/0.5.5.pre3/lib/python3.8/site-packages/WMCore/MicroService/MSUnmerged/MSUnmerged.py", line 235, in _execute
pline.run(MSUnmergedRSE(rseName))
File "/data/srv/HG2111a/sw/slc7_amd64_gcc630/cms/reqmgr2ms/0.5.5.pre3/lib/python3.8/site-packages/Utils/Pipeline.py", line 140, in run
return reduce(lambda obj, functor: functor(obj), self.funcLine, obj)
File "/data/srv/HG2111a/sw/slc7_amd64_gcc630/cms/reqmgr2ms/0.5.5.pre3/lib/python3.8/site-packages/Utils/Pipeline.py", line 140, in <lambda>
return reduce(lambda obj, functor: functor(obj), self.funcLine, obj)
File "/data/srv/HG2111a/sw/slc7_amd64_gcc630/cms/reqmgr2ms/0.5.5.pre3/lib/python3.8/site-packages/Utils/Pipeline.py", line 72, in __call__
return self.run(obj)
File "/data/srv/HG2111a/sw/slc7_amd64_gcc630/cms/reqmgr2ms/0.5.5.pre3/lib/python3.8/site-packages/Utils/Pipeline.py", line 75, in run
return self.func(obj, *self.args, **self.kwargs)
File "/data/srv/HG2111a/sw/slc7_amd64_gcc630/cms/reqmgr2ms/0.5.5.pre3/lib/python3.8/site-packages/WMCore/MicroService/MSUnmerged/MSUnmerged.py", line 406, in consRecordAge
isRootFailed = self.rseConsStats[rseName]['root_failed']
KeyError: 'root_failed'
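The failure can be reproduced in isolation. The snippet below assumes, as the subscript in the traceback suggests, that each per-RSE record is a plain dict; the record contents are illustrative and not copied from the WM/stats API.

```python
# Minimal reproduction: a RAL-style record has "status" but no "root_failed".
rseConsStats = {"T1_UK_RAL_Disk": {"status": "done"}}

try:
    # Same lookup pattern as MSUnmerged.py line 406 in the traceback.
    isRootFailed = rseConsStats["T1_UK_RAL_Disk"]["root_failed"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints KeyError: 'root_failed'

# Reading only "status" works for every RSE, scanned or downloaded.
print(rseConsStats["T1_UK_RAL_Disk"]["status"])  # prints done
```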
Impact of the bug
MSUnmerged