[receiver/filelog] Allow configuration of ingestion rate #18476

Closed
cpheps opened this issue Feb 9, 2023 · 3 comments
Labels
enhancement (New feature or request), needs triage (New item requiring triage), receiver/filelog

Comments

@cpheps
Contributor

cpheps commented Feb 9, 2023

Component(s)

receiver/filelog

Is your feature request related to a problem? Please describe.

The changes in #12910 allow the filelog receiver to continuously consume, in batches, the files it has matched during a single polling cycle.

In an environment with a large number of static log files and start_at set to beginning, we found that the filelog receiver consumes all of them as fast as it can in a single polling cycle. All of these files do need to be processed, but the rate at which they are read can outpace the rate at which the rest of the pipeline can process and export them. This can lead to memory ballooning in the filelog receiver, because entire files sit in memory waiting to move down the pipeline. We've also observed it causing API rate violations on destination platforms, since exporters send a high number of requests when all logs are consumed at once. Even with varying batch sizes, the batches in the batch processor were filled immediately and sent to the exporter because of the large number of log records waiting in the pipeline.

Describe the solution you'd like

I would like to propose a configuration value that lets a user specify the maximum number of files that can be processed in a single polling cycle. It should be possible to turn this limit off in cases where the current behavior of maximum consumption is desired.

Describe alternatives you've considered

No response

Additional context

No response

cpheps added the enhancement and needs triage labels Feb 9, 2023
@github-actions
Contributor

github-actions bot commented Feb 9, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@cpheps
Contributor Author

cpheps commented Feb 9, 2023

I created a draft PR above with a proposed solution and new parameter poll_file_limit.

Here's the general idea of how I envisioned the new parameter working:

  • poll_file_limit can be 0 to indicate unlimited
  • poll_file_limit must be greater than or equal to max_concurrent_files to allow at least one full batch
  • poll_file_limit is enforced once at least that many files have been consumed. This makes it easy to configure, because the user does not have to work out a value that is a multiple of the batch size. No more than one full batch past the limit will be consumed.
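
To make the three rules above concrete, here is a minimal sketch in Go of how they could fit together. It is not the code from the draft PR: the function names validatePollFileLimit and filesConsumedInPoll are hypothetical, the batch size is passed in as a plain parameter rather than derived from max_concurrent_files, and rejecting negative values is an extra assumption that the description above does not mention.

```go
package main

import (
	"errors"
	"fmt"
)

// validatePollFileLimit is a hypothetical check for the first two rules:
// 0 means unlimited, otherwise the limit must be at least max_concurrent_files.
// Rejecting negative values is an assumption, not something stated in the issue.
func validatePollFileLimit(pollFileLimit, maxConcurrentFiles int) error {
	if pollFileLimit < 0 {
		return errors.New("poll_file_limit must not be negative")
	}
	if pollFileLimit != 0 && pollFileLimit < maxConcurrentFiles {
		return fmt.Errorf("poll_file_limit (%d) must be 0 or >= max_concurrent_files (%d)",
			pollFileLimit, maxConcurrentFiles)
	}
	return nil
}

// filesConsumedInPoll simulates one polling cycle and returns how many of the
// matched files are read. Files are taken in batches of batchSize. The limit is
// only checked between batches, so a cycle can overshoot it, but never by a
// full batch or more.
func filesConsumedInPoll(matched, batchSize, pollFileLimit int) int {
	if batchSize <= 0 {
		return 0 // guard against a non-terminating loop in this sketch
	}
	consumed := 0
	for consumed < matched {
		// Stop before starting another batch once the limit has been reached.
		if pollFileLimit > 0 && consumed >= pollFileLimit {
			break
		}
		n := batchSize
		if remaining := matched - consumed; remaining < n {
			n = remaining
		}
		consumed += n
	}
	return consumed
}

func main() {
	fmt.Println(validatePollFileLimit(4, 8) != nil) // limit below max_concurrent_files is rejected
	fmt.Println(filesConsumedInPoll(100, 8, 20))    // 24: batches of 8, stops after passing 20
	fmt.Println(filesConsumedInPoll(100, 8, 0))     // 100: 0 means unlimited
}
```

With 100 matched files, a batch size of 8 and a limit of 20, the cycle reads three batches (24 files) and leaves the rest for later polling cycles, which is the "at least that many files" behavior described in the last bullet.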

@djaglowski
Member

Closed by #18477
