Docker build cache on some machines is getting large (quickly) and exhausting space #3007
I do not think this is a new problem per se, as there is a history of issues where we have hit this, some of which is summarized in #2510. In terms of new dev level tests, none are currently being run regularly. dev.openjdk includes openjdk container tests, though those should not be dispatched to static docker hosts (they require the sw.tool.docker label, which should not be on any of the static docker hosts). In any case, we disabled those in January, presumably temporarily for the release, since the release and weekly test lists are currently entwined (they use the same config file, which needs to be changed before we can run dev level testing on weekends without worrying about it getting run during releases). dev.system tests include jcstress tests but are also not currently running regularly, though I would like to enable them for weekend runs.
Most of the previous issues came down to the size of certain containers, so this warrants a new issue compared to the old one. Based on your comment, is it the external tests (which build new docker containers from dockerfiles) that would currently be adding to the cache from a test perspective? We also have https://ci.adoptium.net/job/openjdk_build_docker_multiarch/ running on a regular basis to keep the "old" images secure. It may be that re-enabling that (run 870, towards the end of January) has contributed to the increase in cache space.
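If the buildx plugin is available on the affected hosts, the individual build cache records can be listed to see which builds are filling it; a sketch (assuming buildx is installed, which may not be true on all of the static hosts):

```bash
# Summary of build cache usage
docker buildx du
# Per-record detail (description, last-used time, size) to see which
# build steps the cache entries came from
docker buildx du --verbose
```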
I've run a check of the build cache size across the machines:
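As a sketch of one way to gather those figures, assuming `docker system df` is the source of the numbers (the hostnames below are placeholders, not the actual machine names):

```bash
#!/bin/sh
# Sketch only: report the Docker build cache usage on each static docker
# host. Hostnames are placeholders.
for host in docker-host-1 docker-host-2 docker-host-3; do
    printf '%s: ' "$host"
    ssh "$host" "docker system df | grep 'Build Cache'"
done
```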
Output from today. A number of the machines have had a significant increase in the build cache size:
I'm running a check at 5-minute intervals on one of the machines that's currently running the docker_build_multiarch job (ampere2 from the above list).
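A minimal version of that kind of check, assuming the aim is just to log the Build Cache figure from `docker system df` every five minutes, could look like this:

```bash
# Sketch of a 5-minute interval check (not necessarily the exact one used):
# append a timestamped Build Cache line to a log for later comparison.
while true; do
    echo "$(date -u +%FT%TZ) $(docker system df | grep 'Build Cache')" \
        >> /tmp/docker-cache-size.log
    sleep 300
done
```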
So it's chewing up close to 20GB extra on each run of that job on that machine. I've also enabled ampere1 for these jobs and build jobs, and taken ampere2 offline for now, in order to evaluate whether it exhibits the same behaviour (it's Ubuntu 22.04 instead of 20.04, so will have a different docker version). As a point of note, when running a new build job (which pulls down our docker build image) no additional Build Cache space is used.
Today's output (compare with five days ago):
TL;DR: up to 200GB extra has been consumed on some of the machines.
It's now behaving, other than on arm32 where, for some odd reason, if the number is larger than something just over 2.1GB (presumably the 32-bit signed integer limit) you get an error. Fixed with an armv7l-specific check in the code (I could have just skipped it, since the arm32 machines aren't having a problem with this), but the output on the other machines looks good:
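The actual armv7l check isn't reproduced here; a hedged sketch of the general idea - doing the large-number comparison in awk (floating point) rather than in 32-bit integer arithmetic, which overflows just above 2.1GB (2^31 bytes):

```bash
#!/bin/sh
# Illustrative only - not the actual fix. CACHE_BYTES is assumed to be the
# byte count the monitoring script already collects; the 20GB threshold is
# just an example.
CACHE_BYTES="$1"
THRESHOLD_BYTES="21474836480"   # 20GB, kept as a string to avoid shell math

if [ "$(uname -m)" = "armv7l" ]; then
    # awk uses floating point, so values above 2^31 do not overflow
    OVER=$(awk -v b="$CACHE_BYTES" -v t="$THRESHOLD_BYTES" \
               'BEGIN { if (b + 0 > t + 0) print 1; else print 0 }')
else
    if [ "$CACHE_BYTES" -gt "$THRESHOLD_BYTES" ]; then OVER=1; else OVER=0; fi
fi

[ "$OVER" -eq 1 ] && echo "Build cache above threshold"
```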
Recent issues:
Between the last two issues there was about two weeks where the cache on the machine got up to 86GB after being cleared out. We should look at understanding what is making it increase so much, whether that is expected or a problem, and the most appropriate way to mitigate it. Caches are generally helpful, but would we benefit from a regular `docker builder prune` with a size limit on it, e.g. `docker builder prune -f --keep-storage 24000000000` to cap the cache at around 24GB? @smlambert your input would be appreciated as this is on test systems - has something happened recently that has dramatically increased the amount of cache space that docker would be using (perhaps the dev suites)? We've so far only run into the issue on x64 and s390x, but those may well just be the first ones we've hit.
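If a regular prune turns out to be the right mitigation, a sketch of how it might be scheduled (the time, the 24GB cap, and the docker path are illustrative and may differ per host):

```bash
# Example root crontab entry: prune the build cache daily at 03:00,
# keeping at most ~24GB of cache storage.
0 3 * * * /usr/bin/docker builder prune -f --keep-storage 24000000000 >> /var/log/docker-builder-prune.log 2>&1
```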