wmstats - consistent reporting of wmagent disk use #12140

Merged
merged 2 commits into from
Oct 17, 2024

Conversation

mapellidario
Member

Fixes #12139

Status

Tested. I ran this code inside a WMAgent Docker container and got the expected output [1].

Description

The output of df is:

(WMAgent-2.3.5) [cmst1@vocms0262:~]$ df -klP
Filesystem     1024-blocks    Used Available Capacity Mounted on
/dev/vdb         257837508 3991660 240722264       2% /etc/group

At the moment, WMStats shows the value of the last column, "Mounted on", which is neither meaningful nor consistent across agents. The first column, "Filesystem", contains the name of the partition that we care about, and it is consistent both between the Docker container and the host and across different hosts.

This PR adds parsing of the first column and shows that string in WMStats.
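
As a rough illustration of the idea (not the actual WMCore code), the sketch below parses df -klP style text and keeps the first column next to the mount point and the usage percentage. The function name parse_df_output is hypothetical, and the sketch assumes filesystem names contain no whitespace.

```python
def parse_df_output(text):
    """Parse `df -klP` style output into a list of dicts.

    Hypothetical helper for illustration; not the WMCore implementation.
    """
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore empty or malformed lines
        entries.append({
            "filesystem": fields[0],          # first column: partition name
            "mounted": " ".join(fields[5:]),  # last column: mount point
            "percent": fields[4],             # "Capacity" column, e.g. "2%"
        })
    return entries


sample = """\
Filesystem     1024-blocks    Used Available Capacity Mounted on
/dev/vdb         257837508 3991660 240722264       2% /etc/group
"""

print(parse_df_output(sample))
# [{'filesystem': '/dev/vdb', 'mounted': '/etc/group', 'percent': '2%'}]
```

The dictionary keys mirror the ones visible in the test output [1] below; everything else is an assumption made for the example.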

Is it backward compatible (if not, which system it affects?)

YES: I did not remove the "mounted on" column, so existing code will continue to work as is.

Related PRs

none

External dependencies / deployment changes

none


[1]

(WMAgent-2.3.5) [cmst1@vocms0262:~]$ vim Utilities.py
(WMAgent-2.3.5) [cmst1@vocms0262:~]$ python
Python 3.8.16 (default, May 23 2023, 14:26:40)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import Utilities
>>> Utilities.diskUse()
[{'filesystem': 'overlay', 'mounted': '/', 'percent': '22%'}, {'filesystem': 'tmpfs', 'mounted': '/dev', 'percent': '0%'}, {'filesystem': 'shm', 'mounted': '/dev/shm', 'percent': '0%'}, {'filesystem': '/dev/vda1', 'mounted': '/tmp', 'percent': '22%'}, {'filesystem': '/dev/vdb', 'mounted': '/etc/group', 'percent': '2%'}, {'filesystem': 'tmpfs', 'mounted': '/proc/acpi', 'percent': '0%'}, {'filesystem': 'tmpfs', 'mounted': '/proc/scsi', 'percent': '0%'}, {'filesystem': 'tmpfs', 'mounted': '/sys/firmware', 'percent': '0%'}]

@cmsdmwmbot

Can one of the admins verify this patch?

Contributor

@todor-ivanov todor-ivanov left a comment

Looking good.
@mapellidario just a note, for your consideration, not to request a change. This:

we show the value of the latest column "mounted on", which is not meaningful nor consistent across agents

That is because df does not display duplicate mounts by default; it shows them only when given the -a option. Docker bind mounts that originate from the same volume (in our case /dev/vdb) are exactly such duplicates. The inconsistency between agents arises because df shows only the first of these mounts, and which one comes first is effectively random at startup, so one never knows which of the bind mounts will be displayed.

@mapellidario
Member Author

I do not want to use the -a option; otherwise we would report that N disks are getting full when in fact only one is:

(WMAgent-2.3.5) [cmst1@vocms0262:~]$ df -klPa | grep vdb
/dev/vdb         257837508 3999808 240714116       2% /etc/passwd
/dev/vdb         257837508 3999808 240714116       2% /etc/group
/dev/vdb         257837508 3999808 240714116       2% /data/certs
/dev/vdb         257837508 3999808 240714116       2% /data/admin/wmagent
/dev/vdb         257837508 3999808 240714116       2% /data/srv/wmagent/2.3.5/logs
/dev/vdb         257837508 3999808 240714116       2% /data/srv/wmagent/2.3.5/config
/dev/vdb         257837508 3999808 240714116       2% /data/srv/wmagent/2.3.5/install
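
To make the concern concrete, here is a small sketch that counts how many rows of the df -klPa listing above belong to each filesystem. The tuples are copied by hand from that listing (filesystem, capacity, mount point); the variable names are made up for the example.

```python
from collections import Counter

# Rows taken from the `df -klPa | grep vdb` output above:
# (filesystem, capacity, mount point)
df_a_rows = [
    ("/dev/vdb", "2%", "/etc/passwd"),
    ("/dev/vdb", "2%", "/etc/group"),
    ("/dev/vdb", "2%", "/data/certs"),
    ("/dev/vdb", "2%", "/data/admin/wmagent"),
    ("/dev/vdb", "2%", "/data/srv/wmagent/2.3.5/logs"),
    ("/dev/vdb", "2%", "/data/srv/wmagent/2.3.5/config"),
    ("/dev/vdb", "2%", "/data/srv/wmagent/2.3.5/install"),
]

# Count how many rows each filesystem name contributes.
rows_per_filesystem = Counter(fs for fs, _, _ in df_a_rows)
print(rows_per_filesystem)  # Counter({'/dev/vdb': 7})
```

Seven rows describe one and the same disk, so any per-row report would count /dev/vdb seven times.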

Contributor

@amaltaro amaltaro left a comment

Dario, these changes look good to me.
However, could you please enhance the relevant unit test to give us better coverage (a more meaningful test)?
https://github.com/dmwm/WMCore/blob/master/test/python/Utils_t/Utilities_t.py#L167

@mapellidario
Member Author

Changing the unit test would mean parsing output that is specific to the platform running it (Linux host, VM, container, OS X, ...). It is not easy to come up with a common pattern that is both meaningful and does not break on some machines, which I think was the reason behind that very generic unit test. I will give it a try, but I do not promise anything.

@mapellidario
Member Author

So, Utils.Utilities.diskUse() on a Jenkins node that runs our tests returns the following output [1]:


[1]

[
  {
    "filesystem": "devtmpfs",
    "mounted": "/dev",
    "percent": "0%"
  },
  {
    "filesystem": "tmpfs",
    "mounted": "/dev/shm",
    "percent": "0%"
  },
  {
    "filesystem": "tmpfs",
    "mounted": "/run",
    "percent": "11%"
  },
  {
    "filesystem": "tmpfs",
    "mounted": "/sys/fs/cgroup",
    "percent": "0%"
  },
  {
    "filesystem": "/dev/vda1",
    "mounted": "/",
    "percent": "76%"
  },
  {
    "filesystem": "/dev/vda15",
    "mounted": "/boot/efi",
    "percent": "3%"
  },
  {
    "filesystem": "cvmfs2",
    "mounted": "/cvmfs/cvmfs-config.cern.ch",
    "percent": "82%"
  },
  {
    "filesystem": "cvmfs2",
    "mounted": "/cvmfs/cms.cern.ch",
    "percent": "82%"
  },
  {
    "filesystem": "cvmfs2",
    "mounted": "/cvmfs/cms-ib.cern.ch",
    "percent": "82%"
  },
  {
    "filesystem": "cvmfs2",
    "mounted": "/cvmfs/grid.cern.ch",
    "percent": "82%"
  },
  {
    "filesystem": "cvmfs2",
    "mounted": "/cvmfs/projects.cern.ch",
    "percent": "82%"
  },
  {
    "filesystem": "cvmfs2",
    "mounted": "/cvmfs/unpacked.cern.ch",
    "percent": "82%"
  },
  {
    "filesystem": "tmpfs",
    "mounted": "/run/user/501",
    "percent": "0%"
  }
]

@mapellidario
Member Author

I added a new set of checks to that unit test. I am satisfied :)
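
The checks that were actually added live in the PR diff and are not reproduced in this conversation. Purely as an illustration of what platform-independent checks could look like, the sketch below validates only the structure of each entry, never specific device names or mount points. The helper check_disk_use is hypothetical, and the sample entries are copied from the Jenkins output above.

```python
def check_disk_use(entries):
    """Structural checks that should hold on any platform (illustrative)."""
    assert isinstance(entries, list)
    for entry in entries:
        # Every entry carries exactly the three expected keys.
        assert set(entry) == {"filesystem", "mounted", "percent"}
        # Filesystem name must be a non-empty string.
        assert isinstance(entry["filesystem"], str) and entry["filesystem"]
        # Mount points are absolute paths.
        assert entry["mounted"].startswith("/")
        # Percentage looks like "NN%" with a value between 0 and 100.
        assert entry["percent"].endswith("%")
        assert 0 <= int(entry["percent"][:-1]) <= 100
    return True


jenkins_sample = [
    {"filesystem": "devtmpfs", "mounted": "/dev", "percent": "0%"},
    {"filesystem": "/dev/vda1", "mounted": "/", "percent": "76%"},
    {"filesystem": "cvmfs2", "mounted": "/cvmfs/cms.cern.ch", "percent": "82%"},
]

print(check_disk_use(jenkins_sample))  # True
```

Checks of this kind pass for the container output and the Jenkins output alike, since neither depends on which partitions a given machine happens to have.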

@mapellidario mapellidario requested a review from amaltaro October 11, 2024 12:25
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 28 comments to review
  • Pylint py3k check: failed
    • 4 errors and warnings that should be fixed
  • Pycodestyle check: succeeded
    • 21 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15294/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

It looks good to me, thanks Dario.

NOTE: with this change, we will have to push new couchapps to the central CouchDB instance(s) when upgrade time comes.

@amaltaro amaltaro merged commit 393357b into dmwm:master Oct 17, 2024
1 of 4 checks passed
@mapellidario mapellidario deleted the 20241010_fix_12139 branch October 28, 2024 13:22

Successfully merging this pull request may close these issues.

Agents going into drain because of unwanted disk partitions are above threshold
4 participants