[receiver/filelog] Allow configuration of ingestion rate #18476

cpheps · 2023-02-09T01:32:04Z

Component(s)

receiver/filelog

Is your feature request related to a problem? Please describe.

The changes in #12910 allow the filelog receiver to continuously consume, in batches, all of the files it matched during a single polling cycle.

We discovered that in an environment with a large number of static log files and start_at set to beginning, the filelog receiver consumes all of them as fast as it can in a single polling cycle. All of these files do need to be processed, but the rate at which they are read can outpace the rate at which the rest of the pipeline can process and export them. This can cause memory to balloon in the filelog receiver, because entire files sit in memory waiting to move down the pipeline. We have also seen it cause API rate limit violations on destination platforms, because exporters send a high number of requests when all logs are consumed at once. Even with varying batch sizes, the batch processor's batches filled immediately and were sent to the exporter, because so many log records were waiting in the pipeline.
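To make the problem concrete, here is a minimal Go sketch of a single polling cycle under the behavior described above. All matched files are split into batches and every batch is read before the cycle ends. The function name pollAllBatches is hypothetical and is not the receiver's code. Following the proposal later in this thread, the model uses max_concurrent_files as the batch size. The real fileconsumer may size its batches differently.

```go
package main

import "fmt"

// pollAllBatches is a hypothetical, simplified model of one polling cycle:
// every matched file is consumed in batches of at most maxConcurrentFiles,
// and the cycle does not end until all batches have been read.
// It returns the batches in the order they would be consumed.
func pollAllBatches(matched []string, maxConcurrentFiles int) [][]string {
	// Assumption: batch size must be at least 1; clamp to avoid an endless loop.
	if maxConcurrentFiles < 1 {
		maxConcurrentFiles = 1
	}
	var batches [][]string
	for start := 0; start < len(matched); start += maxConcurrentFiles {
		end := start + maxConcurrentFiles
		if end > len(matched) {
			end = len(matched)
		}
		batches = append(batches, matched[start:end])
	}
	return batches
}

func main() {
	// 10 static files read from the beginning, with max_concurrent_files = 4.
	matched := make([]string, 10)
	for i := range matched {
		matched[i] = fmt.Sprintf("app-%02d.log", i)
	}
	batches := pollAllBatches(matched, 4)
	// All 10 files are read in one cycle as 3 batches (4 + 4 + 2).
	fmt.Println(len(batches))
}
```

With thousands of static files, the same loop reads every file in one cycle and nothing slows intake down. That is why downstream components are flooded.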

Describe the solution you'd like

I would like to propose a configuration value that lets a user specify the maximum number of files that can be processed in a single polling cycle. It should be possible to disable this value in cases where the current behavior of maximum consumption is desired.

Describe alternatives you've considered

No response

Additional context

No response


github-actions · 2023-02-09T01:32:23Z

Pinging code owners:

receiver/filelog: @djaglowski

See Adding Labels via Comments if you do not have permissions to add labels yourself.

cpheps · 2023-02-09T02:54:22Z

I created a draft PR (linked above) with a proposed solution that adds a new parameter, poll_file_limit.

Here's the general idea of how I envisioned the new parameter working:

poll_file_limit can be set to 0 to indicate no limit.
poll_file_limit must be greater than or equal to max_concurrent_files, so that at least one full batch can be consumed.
poll_file_limit is enforced once at least that many files have been consumed. This keeps it easy to configure, because the user does not have to choose a value that is a multiple of the batch size. It never allows more than one full batch past the limit to be consumed.
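The rules listed above can be sketched in Go as follows. The sketch covers validation (0 means unlimited, otherwise the limit must be at least max_concurrent_files) and enforcement. Under enforcement, batches are read whole and the limit is checked after each one, so the cycle stops once at least poll_file_limit files are done and overshoots by less than one batch. The type fileLimitConfig and its methods are hypothetical illustrations, not the receiver's actual Config or fileconsumer code. The sketch also does not model what happens to the files left over for later cycles.

```go
package main

import (
	"errors"
	"fmt"
)

// fileLimitConfig is a hypothetical stand-in for the relevant settings.
type fileLimitConfig struct {
	MaxConcurrentFiles int // batch size in this sketch
	PollFileLimit      int // 0 means no limit
}

// validate applies the proposed constraints on poll_file_limit.
func (c fileLimitConfig) validate() error {
	if c.MaxConcurrentFiles < 1 {
		return errors.New("max_concurrent_files must be at least 1")
	}
	if c.PollFileLimit < 0 {
		return errors.New("poll_file_limit must not be negative")
	}
	if c.PollFileLimit != 0 && c.PollFileLimit < c.MaxConcurrentFiles {
		return errors.New("poll_file_limit must be 0 or >= max_concurrent_files")
	}
	return nil
}

// filesToConsume returns how many of the matched files one polling cycle
// reads. The limit is checked before starting each batch, so the total is
// at least the limit (if enough files exist) but less than limit + batch size.
func (c fileLimitConfig) filesToConsume(matched int) int {
	consumed := 0
	for consumed < matched {
		if c.PollFileLimit != 0 && consumed >= c.PollFileLimit {
			break // limit reached: stop this cycle
		}
		batch := c.MaxConcurrentFiles
		if remaining := matched - consumed; batch > remaining {
			batch = remaining // last, partial batch
		}
		consumed += batch
	}
	return consumed
}

func main() {
	cfg := fileLimitConfig{MaxConcurrentFiles: 4, PollFileLimit: 10}
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	// Batches of 4 reach 4, 8, then 12; 12 >= 10, so the cycle stops at 12.
	fmt.Println(cfg.filesToConsume(25))
}
```

Checking the limit only between batches is the design choice described in the comment above. The user can pick any value at or above max_concurrent_files, and the overshoot stays bounded by a single batch.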

djaglowski · 2023-02-09T22:30:45Z

Closed by #18477

cpheps added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Feb 9, 2023

github-actions bot added the receiver/filelog label Feb 9, 2023

cpheps mentioned this issue Feb 9, 2023

[receiver/filelog] Limit files consumed per polling cycle #18477

Merged

djaglowski closed this as completed Feb 9, 2023
