Filebeat not closing "old" file handles on Windows #922
Comments
You understand ignore_older correctly: files older than that should no longer have an open file handle. Can you share your config? Also, if possible, it would be nice to have some debug output (enabled with the -d flag).
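For reference, a sketch of such an invocation (the config path is hypothetical; `-d "*"` enables all debug selectors):

```sh
# Run filebeat in the foreground with all debug selectors enabled.
filebeat -e -c /etc/filebeat/filebeat.yml -d "*"
```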
Thanks ruflin. Config and debug log attached. Notice that request-2016-02-01.log is correctly ignored, but request-2016-02-02.log is loaded and kept open even though it is too old. This happens because it is still referenced in the registry (I had manually removed the other one's entry).
Hi, I have a similar (the same?) problem on Linux. We ship a huge amount of logs, so they are rotated frequently. For a while all seems well, but at some point Filebeat stops closing old files, causing the handles to stick around and thus preventing the disk space from being freed. I tried using force_close_files. I will try to extract some meaningful debug output, but it's not easy given the amount of logs we ship (and thus the amount of debug output produced). We just read a single file and output to Logstash. Could this possibly be related to either Filebeat or the receiving Logstash not being able to keep up with the amount of data to process?
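For context, a minimal sketch of that option in the 1.x prospector syntax (the log path is hypothetical):

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /var/log/myservice/current   # hypothetical path to the live log
      force_close_files: true          # close handles when a file is renamed or removed
```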
@sebbarg Thanks a lot for these config files. I tried to create a summary here:
I'm somewhat surprised by the last two lines, as it seems the file is not being harvested on the second-to-last line but is again a few seconds later. It gives the impression that two harvesters are running on the same file. As the log file is quite big, I could also have missed something. This could potentially also explain what @bitfehler describes, namely that it happens under high load. Be aware, this is only a theory so far.

@bitfehler It would be great if you could provide some log files here. In case one of you has the opportunity to also test the nightly, that would be great. Quite a bit of refactoring happened in master to prevent potential race conditions. Nightlies can be found here: https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

I will investigate further in the code to see if I can find the potential issue.
@ruflin Running the latest nightly now. I'll update when I know more.
@ruflin I will try out the nightly later today. In the meantime, I managed to reproduce the issue with a reasonable amount of logs.

Some more context: we are using Filebeat 1.1.0 to monitor a single file that is written and rotated by svlogd. Maybe this is related to their specific implementation of rotating files, but I am not aware of any super-special things they would be doing. Here is our config file and the log. Also, for completeness, this is the output of lsof. Let me know if there is anything else I can supply; I will get back after testing the nightly build. Thanks a lot!
@ruflin Sorry to report that the issue still exists for me with the latest 1.2.0 nightly build from ~2 hours ago. Let me know if there is any further data I can supply to help track this down...
One more thing: to save you reading it all, this is the relevant part from the svlogd documentation:

During peak time, current is rotated every 1-2 minutes. We keep 10 files, so the rotated files are deleted by svlogd. Note that this also happens during off-peak, even when the file is rotated only every 3-5 minutes, for example. I will try to make a minimal reproduction case for this with small files, but I'm not sure if that will work...
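A sketch of one way to spot the leaked handles on Linux (assuming a single filebeat process):

```sh
# Deleted-but-still-open files show up with a "(deleted)" suffix.
ls -l "/proc/$(pidof filebeat)/fd" | grep deleted
```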
@bitfehler Thanks for digging deeper here. When you tried the nightly, did you use force_close_files?

One thing that caught my attention reading the above is the changing of the permissions. This should not change the file identifiers, but I have to test it on my side. Another related issue could be https://github.com/elastic/filebeat/issues/271

I'm somewhat surprised by the error log lines for force_close_file, as they show nil but should only be printed if the error is not nil. I have to check this in more detail.

For my understanding: are no files closed at all, or are just some kept open?
@ruflin I did indeed use force_close_files.
Hi,
Unfortunately, same result...
@bitfehler What is your OS?
I have two different setups where this is happening:
@bitfehler Do you see something like the following in your logs when you have close_older enabled? Did you remove the ignore_older config part?
I will try to do some more tests on Debian.
Two more things:
I will try changing the pattern and then share the logs. One thing I am not sure about: in the nightly build, should I still be using force_close_files?
If you use the pattern *, it shouldn't be needed. In both cases it should work without it, but it would be interesting to see if it solves the issue.
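A sketch of what the glob-based prospector could look like (the directory is hypothetical):

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /var/log/ampelmann-logs/*   # hypothetical directory; matches current plus rotated files
```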
The problem still occurs with the following config:
Here is the log file. At the time the log ends there are seven open handles to deleted files. There are no lines like the one you mentioned above in #922 (comment), but if I understand correctly that is because of the debug selectors I ran with.
Ran again with additional debug selectors enabled.
Forgot to add: this time, by the time the log ends, there were two open handles.
I followed this file:
As you have set close_older to 1 minute and more than 5 minutes passed above, it should already have been closed. Your note that two handles are open somehow suggests that two harvesters were started for the file (which I can't find in the log files). Are your log files on any shared or special drive?
@bitfehler Sorry to ask, but could you provide me with one more log file, with the "harvester, prospector" debug selectors enabled, force_close_files: true, close_older set to 15 seconds, and harvesting only the "current" file (no glob)? And can you clean/remove your .registrar file before starting again?
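Put together, a sketch of that requested configuration (assuming 1.x syntax; the path is hypothetical):

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /var/log/ampelmann-logs/current   # hypothetical path; only the live file, no glob
      close_older: 15s
      force_close_files: true
```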
@ruflin The logs are on a plain disk. But I did indeed not clear the registry file, forgot about that. I guess that could have caused some weirdness? Anyways, I now ran the following config:
This time, I stopped the service, deleted the registry file, and restarted it. The issue still occurred. This is the resulting log. For reference, the stale file handles were still showing up by the time the log ends.
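One way such handles can be enumerated (a sketch; lsof as the tool used is an assumption):

```sh
# lsof flags open handles to already-deleted files with "(deleted)".
lsof -p "$(pidof filebeat)" | grep -F '(deleted)'
```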
Hope that helps...
Summary of the findings:
So it really looks like close_older does nothing. That probably means that lastTimeRead has a wrong date inside: https://github.com/elastic/beats/blob/master/filebeat/harvester/reader.go#L114 An alternative could be that, for whatever reason...

Is the above your complete config file, or did you leave parts out (except for the hosts list)?

I'm currently refactoring parts of the prospector and harvester, trying to introduce some more log messages to see what exactly is going on: #956
The config I pasted is all there was. I tried to build filebeat myself, so I could just add some log lines, but unfortunately it didn't build out of the box:
Will happily run new builds or different configs. In the meantime, I have an idea for a potential reproduction case that I will try to implement...
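One shape such a reproduction could take, mimicking the svlogd scheme described above (paths, rates, and file naming are made up):

```sh
#!/bin/sh
# Crude stand-in for svlogd: append lines to ./log/current and rotate
# periodically, keeping only the 10 newest rotated files.
mkdir -p log
i=0
while true; do
    echo "line $i $(date)" >> log/current
    i=$((i + 1))
    if [ $((i % 1000)) -eq 0 ]; then
        mv log/current "log/@$(date +%s).$i.s"                   # rotate, svlogd-style
        : > log/current                                          # start a fresh current
        ls -t log/@*.s 2>/dev/null | tail -n +11 | xargs -r rm   # keep 10 rotated files
    fi
done
```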
In the meantime I did some system tests trying to reproduce it, so far without success: #958

Can you tell me more about your use case? Can you try to clone the repo itself instead of using go get? Or, if you use go get, try the full beats repo (not only filebeat).
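A sketch of the clone-based build (assuming a standard Go workspace and that the 1.x tree has a per-beat Makefile):

```sh
mkdir -p "$GOPATH/src/github.com/elastic"
cd "$GOPATH/src/github.com/elastic"
git clone https://github.com/elastic/beats.git
cd beats/filebeat
make   # builds the filebeat binary against the pinned dependencies
```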
Hi, sorry for not responding for a while, I was pretty busy yesterday. I don't have any updates on the issue at hand so far, but I did finally manage to compile Filebeat myself. The problem was this commit: golang/text@c27e06c. It will break your builds once you upgrade your golang.org/x/text to the latest version, so be warned 😄

As for the use case, it's rather simple. The ampelmann-logs service is a simple runit service. It receives access logs from an HTTP proxy and writes them to stdout, where they are picked up by svlogd, a common scenario in the runit world. Svlogd takes care of writing the logs to disk and rotating files. I have tried to reproduce this with a very simple runit service that just spits out a log line every couple of seconds, but the issue didn't occur. Thus I am wondering if this is related to the amount of logs being processed? Maybe Filebeat still needs to do some work when the file is closed, and that triggers some kind of race condition?
Unfortunately it is not compatible with the most recent version of golang/text; that is why we pinned a specific commit: https://github.com/elastic/beats/blob/master/glide.yaml#L31

I'm currently working on improving the tracking of open harvesters and also the logging around it. This will make it possible to track much better what is happening: #964

Two additional ideas:
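One of those ideas, per the reply below, was trying the file output instead of Logstash to rule out output back-pressure; a minimal sketch of that output section (paths are hypothetical):

```yaml
output:
  file:
    path: /tmp/filebeat      # hypothetical spool directory
    filename: filebeat.out   # events land here instead of going to Logstash
```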
Cool, I will see if I can get your branch to run and give that a go, and will also try the file output!
Be aware that this is not stable yet, still WIP. What's interesting is the httpprof output with the harvester counter: if you start filebeat with the httpprof option, you can see in your browser how many harvesters are running.
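A sketch of that workflow (the port is arbitrary; the harvester counter itself lives in the WIP branch, so its exact name is an assumption):

```sh
# Expose the debug endpoints, then inspect the exported counters.
filebeat -e -c filebeat.yml -httpprof 127.0.0.1:6060 &
curl http://127.0.0.1:6060/debug/vars
```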
@bitfehler @sebbarg Quite some time has passed since our last interaction, and a few new filebeat versions with improvements have come out (current is 1.2.2). Is this still an ongoing issue? I actually hope that with all the recent improvements we have fixed it, so I will close this issue. Please reopen if it still persists.
This issue still persists. I had a log roll over on 25 Aug 2016, and when I started Filebeat on 27 Aug 2016 it started reading the older file as well.
@mrunalgosar This issue is about file handles not being closed; yours seems to be that ignore_older does not work as expected. Can you please open a topic on Discuss so we can investigate it? https://discuss.elastic.co/c/beats/filebeat Please post your config and, if possible, some log files there.
Hi,
I'm on Windows 2012R2 / NTFS / filebeat 1.1.0.
From the log:
2016-02-04T09:12:25Z INFO Set ignore_older duration to 24h0m0s
Once a file has been recorded in the "registry" the above setting has /no/ effect for files with modified timestamps older than 24hrs.
If I manually delete the file's entry in the registry and restart filebeat, too-old files are correctly ignored (i.e. no handles to them exist).
You can verify this by running Sysinternals' handle.exe (https://technet.microsoft.com/en-us/sysinternals/handle) in both scenarios.
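For instance (a sketch; handle.exe accepts either a process name or a filename fragment):

```powershell
# List every handle the filebeat process currently holds (run elevated).
.\handle.exe -p filebeat

# Or search all processes for handles to a specific log file.
.\handle.exe request-2016-02-02.log
```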
Have I misunderstood the "ignore_older" setting?
Best,
Sebastian