file handle exhaustion when fetching from remote cache #13435
Comments
Can you try whether it is reproducible with 4.1.0rc4, where we made some improvements to the channel pool?
This happens with 4.1.0rc4 as well. I should add that @jablin and I use remote caching and a local --disk_cache at the same time.
I confirm the observations of @obruns. What's more: in order to reproduce the problem, you have to start with an empty --disk_cache.
I can't reproduce the bug. My setup writes 50000 files to the cache. Can you please share a more concrete repro?
I haven't succeeded in reproducing the problem in a "synthetic" project yet.
I don't think so. Why do you think that 1024 file handles are objectively insufficient? I think they are not: on many distributions, a resource limit of 1024 file handles is the default. I still think that the error you encountered is one more (strong) hint that the disk cache code hogs hundreds of file handles.
We can't assume the error from my setup is caused by file handle leaks. The setup was intended to open as many of these 50000 files as possible at once.
Since the setup is not the shape of builds in general, I wouldn't worry too much about file handle exhaustion in this case. However, I agree with you that if the error becomes more frequent among builds, we could probably use a write queue (sketched below). On the other hand, a file handle leak is a serious bug and should be fixed.
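For illustration, here is a hedged sketch of the "write queue" idea mentioned above; this is not code from Bazel, and the names BoundedCacheWriter, MAX_OPEN_WRITES, and writeOneFileToDiskCache are invented for this example:

```java
import java.util.concurrent.Semaphore;

// Sketch: a semaphore caps how many disk-cache files are open for
// writing at any one time, so concurrent cache writes can no longer
// exhaust the process's file descriptor limit.
final class BoundedCacheWriter {
  // Assumption: keep this comfortably below `ulimit -n` (often 1024).
  private static final int MAX_OPEN_WRITES = 256;
  private final Semaphore permits = new Semaphore(MAX_OPEN_WRITES);

  void write(Runnable writeOneFileToDiskCache) throws InterruptedException {
    permits.acquire(); // blocks while MAX_OPEN_WRITES writes are in flight
    try {
      writeOneFileToDiskCache.run();
    } finally {
      permits.release();
    }
  }
}
```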
I've monitored the file handle usage of the bazel process during my build:
It's easy to see when the "out of file handles" error occurs.
I've sampled in more detail:
In one of the samples, out of 4095 open files, 3922 were being written to the local disk cache.
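For anyone who wants to reproduce this kind of measurement, here is a small, hypothetical Java helper (not part of Bazel) that counts a process's open file descriptors on Linux by listing /proc/self/fd; running the equivalent externally against the bazel server process (listing /proc/<pid>/fd) yields samples like the one above:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Counts the open file descriptors of the current JVM process by
// listing the /proc/self/fd directory (Linux only).
final class FdCounter {
  static long openFdCount() throws IOException {
    long count = 0;
    try (DirectoryStream<Path> fds =
        Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
      for (Path ignored : fds) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("open file descriptors: " + openFdCount());
  }
}
```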
I've added logging around the opening and closing of files in the disk cache code.
I still don't know how to reproduce the problem without my original project, sorry. However, here's the crucial part of a (working!) fix: instead of trying to fix (b) by tweaking the execution, I've found a simple solution to fix (a). A sketch of the idea follows.
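The class named in the fix below, LazyFileOutputStream, already exists in Bazel's remote cache code; the following is only a minimal sketch of the underlying idea, assuming the simplest possible shape, not the actual implementation:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the "lazy open" idea: the underlying FileOutputStream is
// created only on the first actual write, so a cache download that is
// enqueued but not yet streaming does not consume a file handle.
class LazyFileOutputStream extends OutputStream {
  private final String path;
  private OutputStream out; // opened on demand

  LazyFileOutputStream(String path) {
    this.path = path;
  }

  private OutputStream ensureOpen() throws IOException {
    if (out == null) {
      out = new FileOutputStream(path);
    }
    return out;
  }

  @Override
  public void write(int b) throws IOException {
    ensureOpen().write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    ensureOpen().write(b, off, len);
  }

  @Override
  public void flush() throws IOException {
    if (out != null) {
      out.flush();
    }
  }

  @Override
  public void close() throws IOException {
    if (out != null) {
      out.close();
    }
  }
}
```

Bazel's actual class may differ in details; the point is that one stream object per pending download can be created up front without holding a file descriptor until bytes actually arrive.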
Re-use the existing "LazyFileOutputStream" in DiskAndRemoteCacheClient.java in order to avoid "Too many open files", probably triggered by build steps with > 1k output files.
Resolves bazelbuild#13435
Closes bazelbuild#13574.
PiperOrigin-RevId: 379892227
…"too many open files" Re-use the existing "LazyFileOutputStream" in DiskAndRemoteCacheClient.java in order to avoid "Too many open files". Resolves bazelbuild#13435 Closes bazelbuild#13574. PiperOrigin-RevId: 379892227
…"too many open files" Re-use the existing "LazyFileOutputStream" in DiskAndRemoteCacheClient.java in order to avoid "Too many open files". Resolves bazelbuild#13435 Closes bazelbuild#13574. PiperOrigin-RevId: 379892227
…"too many open files" Re-use the existing "LazyFileOutputStream" in DiskAndRemoteCacheClient.java in order to avoid "Too many open files". Resolves bazelbuild#13435 Closes bazelbuild#13574. PiperOrigin-RevId: 379892227
…"too many open files" Re-use the existing "LazyFileOutputStream" in DiskAndRemoteCacheClient.java in order to avoid "Too many open files". Resolves bazelbuild#13435 Closes bazelbuild#13574. PiperOrigin-RevId: 379892227
… "too many open files" Re-use the existing "LazyFileOutputStream" in DiskAndRemoteCacheClient.java in order to avoid "Too many open files". Resolves bazelbuild/bazel#13435 Closes #13574. PiperOrigin-RevId: 379892227
Description of the problem
When extensively using a remote cache (i.e. for 100% of the build results of a large project), bazel build (4.0.0) hits "too many file handles" if you use an empty --disk_cache at the same time. This is reproducible: on an 8-core CPU (+ hyperthreading) it usually happens after ~4k targets.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
This eventually leads to:
What operating system are you running Bazel on?
RHEL 7.4 (kernel 3.10.0)
What's the output of bazel info release?
release 4.0.0
Have you found anything relevant by searching the web?
I have been told to modify /etc/systemd/system.conf: set DefaultLimitNOFILE=524288, run systemctl daemon-reload, and reboot.
Any other information, logs, or outputs that you want to share?
Raising ulimit -Sn from the default 1024 to 4095 does not help at all.
worker_connections 512;