Agents going into drain because unwanted disk partitions are above threshold #12139
Comments
It seems to be that the ...

I looked at the container mounts and noticed that inside the container ... For example, on vocms0283 the output of ...
why do we have a problem

A couple of additional pointers (maybe these are known to everybody, but future dario will thank me):

- data in wmstats is displayed at the following line only if ...
  WMCore/src/couchapps/WMStats/_attachments/js/Views/HTMLList/WMStats.AgentDetailList.js, Line 24 in f3c25d8
- that information comes from AgentStatusPoller
- that uses
  WMCore/src/python/Utils/Utilities.py, Line 96 in f3c25d8
As Hassan described, since we bind mount from the VM host into the docker container multiple directories that on the host are inside ...

what can we do to solve it

So, this is an unexpected regression of deploying WMCore with docker containers. The cleanest solution that comes to my mind is to always identify a partition by the "filesystem" column and not by the "mounted on" column, changing in Utilities.py:

```diff
- diskPercent.append({'mounted': split[5], 'percent': split[4]})
+ diskPercent.append({'mounted': split[0], 'percent': split[4]})
```

so that we consistently report the "/data" directory as "/dev/vdb". Otherwise, the internet suggests to bind mount the root directory of the host read-only [3], but I really dislike this approach. In any case, I would like to hear a quick opinion from @todor-ivanov, who I trust with all these docker shenanigans :)

anatomy of the df output

[0] (on the host)

[1] (inside the container, vocms0281)

[2] (inside the container, vocms0256)
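Coming back to the one-line change proposed above, here is a minimal runnable sketch of its effect, assuming the helper in Utils/Utilities.py parses POSIX `df -P`-style output into a list of dicts; the function name and surrounding code are assumptions for illustration, not the actual WMCore implementation:

```python
import subprocess

def disk_usage_by_device():
    """Sketch: report each df entry keyed on the filesystem/device column."""
    disk_percent = []
    # POSIX columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
    output = subprocess.check_output(["df", "-klP"]).decode("utf-8")
    for line in output.splitlines()[1:]:  # skip the header line
        split = line.split()
        if split:
            # proposed change: use split[0] (filesystem) instead of split[5]
            # (mount point), so every bind mount of /dev/vdb shows up as /dev/vdb
            disk_percent.append({'mounted': split[0], 'percent': split[4]})
    return disk_percent
```

With this keying, the many bind mounts of the same device inside the container collapse into a single entry instead of being reported as separate partitions.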
hi @mapellidario your explanation is complete and good.

About:
I definitely would not want to enlarge, with yet another item, my list of things to regret before I meet the Reaper.

About this though:
Your suggestion is one way to go, but in this case people will always see only the volume as reaching the threshold, which is indeed true and fair. Another way to go would be simply to visualize all mount points, including the duplicate mounts: in WMCore/src/python/Utils/Utilities.py, Line 96 in f3c25d8, the following change should do the job:

The choice should be up to whoever takes the issue to work on. Both will do.
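The concrete diff Todor refers to is not shown above. Purely as an illustration of the "show everything, duplicates included" option, one way would be to report the device alongside each mount point; this is a hypothetical sketch, not necessarily the change he had in mind:

```python
import subprocess

def disk_usage_all_mounts():
    """Sketch: keep every mount point, duplicates included, plus its device."""
    disk_percent = []
    output = subprocess.check_output(["df", "-klP"]).decode("utf-8")
    for line in output.splitlines()[1:]:  # skip the header line
        split = line.split()
        if split:
            # report mount point and filesystem together, so duplicate bind
            # mounts of the same device remain visible and attributable
            disk_percent.append({'mounted': split[5],
                                 'filesystem': split[0],
                                 'percent': split[4]})
    return disk_percent
```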
@mapellidario can you please take this issue on? I intend no pressure, but if we can have this development by the end of this week, we can already consider it for the upcoming WMAgent release.
Impact of the bug
WMAgent
Describe the bug
We have a couple of agents that were automatically put into drain mode because one of their disk partitions is above the threshold configured in the agent (currently 85% utilization).
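For reference, a hedged sketch of the kind of threshold check described above; the 85% value comes from the issue text, while the function and the entry format are assumptions based on the diff discussed in the comments:

```python
DISK_USE_THRESHOLD = 85  # percent, the agent setting mentioned above

def partitions_over_threshold(disk_percent, threshold=DISK_USE_THRESHOLD):
    """Flag every reported partition whose utilization reaches the threshold."""
    over = []
    for entry in disk_percent:
        percent = int(entry['percent'].rstrip('%'))  # e.g. "86%" -> 86
        if percent >= threshold:
            over.append(entry)
    return over
```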
How to reproduce it
Just fill up any partition in the node above 85%.
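As an illustration only (the /data path and file name below are assumptions; run this on a disposable test partition and delete the file afterwards), one way to push a partition above the threshold:

```python
import os
import shutil

def fill_partition(path="/data", target_percent=86, chunk_mb=100):
    """Write junk data under `path` until the partition exceeds target_percent."""
    filler = os.path.join(path, "filler.bin")
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    with open(filler, "wb") as fh:
        while True:
            usage = shutil.disk_usage(path)
            if 100.0 * usage.used / usage.total >= target_percent:
                break
            fh.write(chunk)
            fh.flush()
    return filler  # remember to remove this file once done
```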
Expected behavior
We need to either disable those specific partitions in the configuration, such that the agent stops monitoring and acting on them,
or actually create an allowed list of partitions to be monitored (this would be a larger change, though); see the sketch below.
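A minimal sketch of what such an allowed list could look like; the attribute name and filter helper are hypothetical, not an existing WMAgent configuration option:

```python
# hypothetical allow-list; not an existing WMAgent configuration attribute
MONITORED_PARTITIONS = ["/data", "/dev/vdb"]

def filter_monitored(disk_percent, allowed=MONITORED_PARTITIONS):
    """Keep only the entries the agent is configured to monitor and act on."""
    return [entry for entry in disk_percent if entry['mounted'] in allowed]
```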
Additional context and error message
Examples: