The plugin is not retrying on specific errors and dropping the data on error 400 #134
This is why Fluentd provides a secondary mechanism to prevent data loss. These backup chunks can be restored with fluent-logger-ruby.
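A minimal sketch of the secondary backup setup being described, using the `secondary_file` output that ships with Fluentd (the tag pattern, host, and paths are illustrative, not from this report):

```
<match app.**>
  @type opensearch
  host opensearch.example.com
  port 9200
  <buffer>
    @type file
    path /var/log/fluentd/buffer/opensearch
  </buffer>
  # Chunks that cannot be sent (e.g. an unrecoverable 400) are handed to
  # the secondary output instead of being discarded.
  <secondary>
    @type secondary_file
    directory /var/log/fluentd/backup
    basename opensearch.${chunk_id}
  </secondary>
</match>
```

The files written under `directory` are the backup chunks that can later be re-posted to Fluentd, for example with fluent-logger-ruby as mentioned above.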
Thanks for this, I can check it and maybe implement it as a workaround for now. But shouldn't the plugin keep retrying? This is why I set a 60 GB buffer: so that when there are pushing issues, the data accumulates in the buffer until OpenSearch is fixed.
No, it shouldn't, because there is no recovery mechanism to handle the error. A 400 error is often very hard to resolve by resending. Perhaps specifying retry_tag might fit your case. Fluentd's retrying mechanism is tightly coupled to the associated conditions; this is why we chose to give up resending when a 400 status occurs.
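A rough sketch of the `retry_tag` suggestion (the tags and file path are made up for illustration): the plugin re-emits failed records under the given tag, so a separate `<match>` can handle them in their own pipeline instead of blocking the main buffer.

```
<match app.**>
  @type opensearch
  host opensearch.example.com
  port 9200
  # Re-emit records from failed requests under a new tag
  # instead of retrying them inside this output's buffer.
  retry_tag retry.app
</match>

# Route the retried records somewhere safe, e.g. to local files,
# so they can be inspected or replayed later.
<match retry.app>
  @type file
  path /var/log/fluentd/retry
</match>
```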
Are you suggesting a workflow that starts with an Input, then moves through a Filter, into the OpenSearch Output Plugin, and uses the Secondary File Output Plugin (leveraging the retry tag for matching), followed by a manual script execution as outlined here (https://groups.google.com/g/fluentd/c/6Pn4XDOPxoU/m/CiYFkJXXfAEJ?pli=1)? How do we ensure the file doesn't grow excessively large without implementing some form of rotation? I appreciate this as a temporary solution, thank you. It would be ideal to have a more comprehensive, automated solution supported by Fluentd and its plugins, eliminating the need for manual intervention across 100 AKS Clusters and avoiding the need for additional developer resources. I understand this is a complex issue. An optimal solution would allow configuration through flags such as enable_retry_on_400 with customizable retry durations, for example a maximum of 10 days or even unlimited.
There is no automated solution for this case. There are many different cases to consider when deciding how to handle the retrying mechanism and re-emit into another data pipeline.
Steps to replicate
We had a case where the maximum number of open shards had been reached in OpenSearch, so Fluentd was getting an error.
Error:
The error is OK and expected, but we did not expect to lose the data.
Once we increased the maximum number of open shards in OpenSearch, the old logs were never pushed.
It looks like Fluentd is not retrying on error 400 and is dropping the data.
We do not want to lose data due to a temporary misconfiguration in OpenSearch or because some limit was reached.
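For reference, the shard limit that triggered the 400 in this report is an OpenSearch cluster setting; it can be raised with a cluster settings request like the following (the value 2000 is only an example, not a recommendation):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}
```

Note that raising the limit only stops new 400 errors; chunks the plugin has already dropped are not resent.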
Configuration
Expected Behavior or What you need to ask
We expected the data to be stored in the buffer and retried until successful, without losing it. How can we achieve that when getting similar errors?
Using Fluentd and OpenSearch plugin versions
Ubuntu
Kubernetes
Fluentd
fluentd 1.16.2
OpenSearch plugin version
fluent-plugin-opensearch (1.1.4)
opensearch-ruby (3.0.1)
OpenSearch version
v 2.10.0