-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Registry info cleanup #1022
Comments
That is currently the expected behaviour as it is hard to define if a file actually was removed or just disappeared for some time (renamed, volume not available, ...). We discussed a solution for this some time ago here: https://github.com/elastic/filebeat/issues/269 Is the issue you have that the registrar file is growing over time? |
In my case log file name policy is based on date:
So day by day registry file grow and I need some tecnique to purge old info in a safe way |
We should definitively provide a method to cleanup the registrar file in the long run. As you have only one entry per day you shouldn't run into issues too soon. |
Good |
@smaniero I suggest to keep this issue open and closing https://github.com/elastic/filebeat/issues/269 so we keep track of it in this repo. |
Ok |
I would suggest the following solution: As in the next release, close_older is introduced and ignore_older is set to infinity by default, I suggest to use ignore_older for the cleanup of the registrar file. That means every time a file hits ignore_older it will also be removed from the registrar file. By default that never happens as ignore_older is infinity. Is ignore_older set to 12h, every file older then 12hours will be removed from the registrar file. Should it be updated afterwards again, the reading would start at the end of the file as no state is persisted. Would that work for you @vinod8427 @smaniero |
This is a feature I am anxiously awaiting. We have several high volume log sources that feature hourly log slices which are created with a time-stamp suffix. We have no control over the file names, but we need to ship them regardless. The registry quickly grows into the MBs, and we need to knock it back. This involves shutting down the filebeat daemon, hand modifying the file, and starting it back up. Not the highlight of my week. Yes, we'll end up automating this, but still... Since nobody replied to your last comment, I will. It seems like a reasonable approach to me. One point that could use some clarification... You mention that the file will be purged from the registry if is older than ignore_older. The filebeat registry (at least for 1.1.2) does not appear to track any timestamps related to the file inode, implying that you would need access to the file to determine its age. If the file has been removed (inode gone... not simply moved to a new filename), would you also consider this as eligible for removal? Has the format already been changed in 1.2.x to support this? The discussion you linked to above (https://github.com/elastic/filebeat/issues/269) seems to indicate that a timestamp would be added for this purpose, but wanted to confirm. It is entirely likely that we will have files that are purged from the host before the configured ignore_older window, and we wouldn't want those to continue to linger around. |
This goes way back to forwarder days. The purpose of registry is to maintain the current file system snapshot (inside, name, etc) and record progress (offset.) Currently, it is not doing the former and only receives updates that imply an addition. I've not fully tested things but based on the fixes I created for Log Courier a long time ago I'd say some of the following still remains and hopefully it'll offer some guidance.
Generally I would suggest the following to help nip this one in the bud. All are in the registrar/prospector code for Log Courier and work really well from what I hear.
I suppose overall the idea is that registrar should reflect the prospector view state, plus harvester offsets. At the moment it only truly reflects harvester states - yet it is used during startup to seed the prospector state. So you can imagine after restart the prospector state is very often completely different to what it was before restart. |
@driskell That makes a lot of sense to me. |
Linking this issue here as I think if we start to refactor the registrar file and potentially have to break BC, we should also take something like this into account: jordansissel/ruby-filewatch#79 (comment) Not necessarly directly implement it in filebeat but having the structure of the registry file ready to support more then just inode and device. |
@driskell Thanks a lot for the detailed analysis. There is not much to add and it is definitively the direction we should follow. The only point I'm not sure about is your last point about removing a file from registrar if it is not found after a scan. Couldn't this have the affect that assuming a volume disappears for a second and reappears again, that it will reread all files? I was thinking using ignore_older for the cleanup based on a timestamp in the registry file. Like this it could be handled directly by the "registry writer" and harvester / prospector would not have to deal with it. |
@ruflin - as part of my fingerprint update in filewatch I took the opportunity to add an expires timestamp into the sincedb record, when an expired entry is read it is not put into the in-memory db and therefore is not written out when the db is persisted to the sincedb file. |
@ruflin Disappearing volumes could cause issue I guess - I had not considered that. If file system disappears then it likely all closed files. Depending on close timing settings if files vanished and were seen again it would see them as too old to open (ignore older) though of course if you close earlier it might do that. Guess it is a tuning ppl may need to do as longer the record kept the more likely inode conflict on sets of fast rotating files. Both edge cases really :) |
Before there were 2 different channels to report state so also prospector and registrar could report the state. State is now always reported by events from the harvesters. This means that events can also be empty to only update state. This means if 3 events were processed, this does not necessarily mean that 3 events were sent, as some of the events can also be state information. * Introduction of channel to prospector for events. This allows for actions on the event on a prospector level * Cleanup variable naming Part of elastic#1022
* Simplify state handling in Filebeat Before there were 2 different channels to report state so also prospector and registrar could report the state. State is now always reported by events from the harvesters. This means that events can also be empty to only update state. This means if 3 events were processed, this does not necessarily mean that 3 events were sent, as some of the events can also be state information. * Introduction of channel to prospector for events. This allows for actions on the event on a prospector level * Cleanup variable naming Part of #1022 * Prevent race in state
The Registrar should only persist the state but should not have to fetch information from the file itself before writing the state. First this is not the responsibility of the registrar to have knowledge about the different harvester implementation, second the information of a file could already have changed when the registrar writes. With this PR the file state is decoupled from the Registrar. Currently there is duplicated information in FileEvent. This will be cleaned up in a second step to keep the changes as minimal as possible per PR. Related to elastic#1022
The Registrar should only persist the state but should not have to fetch information from the file itself before writing the state. First this is not the responsibility of the registrar to have knowledge about the different harvester implementation, second the information of a file could already have changed when the registrar writes. With this PR the file state is decoupled from the Registrar. Currently there is duplicated information in FileEvent. This will be cleaned up in a second step to keep the changes as minimal as possible per PR. Related to #1022
Previously the file state was loaded from disk every time a new file was found. The state from the registry file is now only loaded once during startup and from the one the prospector internal in memory state is used. This is more efficient and prevents race conditions. This can have one side affect. In case a shared file system is not available during a restart and becomes available later, all files are read from the start again as the state for these files was not loaded. Only the state for files which are found during startup (rotated or not) are loaded. This is related to elastic#1022 Changes: * Registry only loaded once during startup * File Stats are only read once and reuse during one scan * No state for empty files is written * Additional tests to confirm changes were added
Previously the file state was loaded from disk every time a new file was found. The state from the registry file is now only loaded once during startup and from the one the prospector internal in memory state is used. This is more efficient and prevents race conditions. This can have one side affect. In case a shared file system is not available during a restart and becomes available later, all files are read from the start again as the state for these files was not loaded. Only the state for files which are found during startup (rotated or not) are loaded. This is related to #1022 Changes: * Registry only loaded once during startup * File Stats are only read once and reuse during one scan * No state for empty files is written * Additional tests to confirm changes were added
Brief update here: Quite a few changes were implemented in Filebeat recently to tackle the problems above (see above linked commits / PR's). Registrar is not decoupled from Prospector, Registrar stores it state as an array without the path as index and registrar is only used for the state on startup. There are still a few open issues as #1600 but things are moving forward. |
@ruflin will this feature be possible support with filebeat-1.2(cleanup registry states of deleted files )? |
@zxy12 Probably this feature will only be available in 5.x. The reason is that to make this feature available the inner workings of prospector and registrar had to be heavily changed. |
@ruflin Well, I will upgrade it. |
I'm going to close this issue as all the necessary changes to separate prospector and registrar state handling are now in master. In addition new clean_* variables were introduce to clean up the registry file either on removal or after a certain time. All the discussions and related PR's can be found in #1600. Some small changes related to naming and docs are still tracked in #2012. All these changes should be release with beta1 soonish. Thanks everyone for contributing in this issue. Special thanks goes to @driskell as most of the work done is very similar on what he initially suggested. Please test the new features and let us know if you find some issues. |
I am currently experiencing this inode reuse issue in a production system using filebeat v1.2.3. Can someone advise me if the only current solution is to upgrade to v5.0.0 alpha5? Will the clean_* variables be backported to a release version as this is a pretty critical issue? |
@doingitbigdata Backporting the clean_* options to the 1.x release is not possible because of the inner workings of filebeat and the registry file structure in 1.x. Your only way is to upgrade to the 5.0.0 releases. Beta1 is coming soon if that helps. |
Why when log file is deleted registry file entries isn't updated? (remove keys of deleted files)
The text was updated successfully, but these errors were encountered: