Is your feature request related to a problem? Please describe.
The changes in #12910 allow the filelog receiver to continuously consume files it has matched on during a single polling cycle in batches.
It was discovered that, in an environment with a high number of static log files and `start_at` set to `beginning`, the filelog receiver would consume them all as fast as it could in a single polling cycle. While all of these files do need to be processed, the rate at which they are processed can outpace the rate at which the rest of the pipeline can process and export them. This can lead to memory ballooning in the filelog receiver, as entire files sit in memory waiting to move down the pipeline. We've also observed it causing API rate violations on destination platforms, since exporters send a high number of requests when all logs are consumed at once. Even with varying batch sizes, batches from the batch processor were immediately filled and sent to the exporter due to the large number of log records already waiting in the pipeline.
Describe the solution you'd like
I would like to propose a configuration value that lets a user specify the maximum number of files that can be processed in a single polling cycle. It should also be possible to disable the limit in cases where the current behavior of maximum consumption is desired.
Describe alternatives you've considered
No response
Additional context
No response
I created a draft PR above with a proposed solution and a new parameter, `poll_file_limit`.
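For illustration, the parameter might look like this in the collector configuration. This is only a sketch: the file paths and the values of `max_concurrent_files` and `poll_file_limit` shown here are examples, and the exact placement and defaults are up to the draft PR.

```yaml
receivers:
  filelog:
    include: [ /var/log/app/*.log ]
    start_at: beginning
    max_concurrent_files: 100   # existing batching parameter
    poll_file_limit: 250        # proposed: cap on files consumed per polling cycle (0 = unlimited)
```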
Here's the general idea of how I envisioned the new parameter working:
- `poll_file_limit` can be `0` to indicate unlimited.
- `poll_file_limit` must be greater than or equal to `max_concurrent_files`, so that at least one full batch is always allowed.
- `poll_file_limit` is enforced once at least that many files have been consumed. This keeps configuration simple for the user, who doesn't have to worry about correlating the value with a multiple of the batch size; at most one full batch past the limit will be consumed.
Component(s)
receiver/filelog