Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(producer): add retry buffer tuning option to prevent OOM #3026

Merged
merged 1 commit into from
Dec 19, 2024

Conversation

wanwenli
Copy link
Contributor

@wanwenli wanwenli commented Dec 5, 2024

This commit adds an optional configuration to Sarama's retry mechanism to limit the size of the retry buffer. The change addresses issues #1358 and #1372 by preventing unbounded memory growth when retries are backlogged or brokers are unresponsive.

Key updates:

  • Added Producer.Retry.MaxBufferLength configuration to control the maximum number of messages stored in the retry buffer.
  • Implemented logic to handle overflow scenarios, ensuring non-flagged messages are either retried or sent to the errors channel, while flagged messages are re-queued.

This enhancement provides a safeguard against OOM errors in high-throughput or unstable environments while maintaining backward compatibility (unlimited buffer by default).

buf length 10k

Additionally, as demonstrated in the graph above, the retry buffer’s length is effectively constrained, confirming that the new feature functions as intended.

@wanwenli
Copy link
Contributor Author

wanwenli commented Dec 5, 2024

@dnwe @puellanivis @rtreffer @klauspost I’ve submitted a PR addressing the retry buffer OOM issue (#1358, #1372) by introducing an optional configuration to limit the retry buffer’s size. The changes aim to prevent unbounded memory growth while maintaining backward compatibility (unlimited buffer by default). Your review and feedback would be greatly appreciated!

async_producer.go Outdated Show resolved Hide resolved
async_producer.go Show resolved Hide resolved
async_producer.go Show resolved Hide resolved
@wanwenli wanwenli force-pushed the prevent-producer-retry-buffer-oom branch 2 times, most recently from a0f5c5b to fc4fc68 Compare December 6, 2024 04:17
async_producer.go Outdated Show resolved Hide resolved
async_producer.go Outdated Show resolved Hide resolved
@wanwenli wanwenli force-pushed the prevent-producer-retry-buffer-oom branch from fc4fc68 to cdba19d Compare December 6, 2024 08:26
This commit adds an optional configuration to Sarama's retry mechanism to limit the size of the retry buffer.
The change addresses issues IBM#1358 and IBM#1372 by preventing unbounded memory growth when retries are backlogged or brokers are unresponsive.

Key updates:
- Added `Producer.Retry.MaxBufferLength` configuration to control the maximum number of messages stored in the retry buffer.
- Implemented logic to handle overflow scenarios, ensuring non-flagged messages are either retried or sent to the errors channel, while flagged messages are re-queued.

This enhancement provides a safeguard against OOM errors in high-throughput or unstable environments while maintaining backward compatibility (unlimited buffer by default).

Signed-off-by: Wenli Wan <[email protected]>
@wanwenli wanwenli force-pushed the prevent-producer-retry-buffer-oom branch from cdba19d to fbf1d5b Compare December 6, 2024 08:33
Copy link
Contributor

@puellanivis puellanivis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t see any concerns. Though, I don’t know how appropriate a minimum functional retry buffer of 4 KiB would be. (I just knew it should be a power of 2 for being more allocator friendly.)

FYI: I cannot actually “approve” or even “resolve conversation”.

@dnwe dnwe added the needs-investigation Issues that require followup from maintainers label Dec 10, 2024
Copy link
Collaborator

@dnwe dnwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wanwenli thanks for submitting your PR and thanks @puellanivis for the review and giving useful early feedback. Looking over the changes I'm happy with the implementation and it provides a useful tuning mechanism for backpressure. Thanks again!

@dnwe dnwe added the feat label Dec 19, 2024
@dnwe dnwe merged commit 7c75cc4 into IBM:main Dec 19, 2024
14 checks passed
@dnwe dnwe changed the title feat: Prevent retry buffer in AsyncProducer to go OOM feat(producer): add retry buffer tuning option to prevent OOM Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat needs-investigation Issues that require followup from maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants