AsyncProducer retries causing OOM #1358
Comments
me too!!!
Thanks for the error report @hagen1778
Same error was also reported in #1372
Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
@hagen1778 @qiangmzsx @d1egoaz any progress or updates on this issue? I've made #3026 to limit the buffer size as an option; the commit message is quoted below:
This commit adds an optional configuration to Sarama's retry mechanism to limit the size of the retry buffer. The change addresses issues #1358 and #1372 by preventing unbounded memory growth when retries are backlogged or brokers are unresponsive.

Key updates:
- Added `Producer.Retry.MaxBufferLength` configuration to control the maximum number of messages stored in the retry buffer.
- Implemented logic to handle overflow scenarios, ensuring non-flagged messages are either retried or sent to the errors channel, while flagged messages are re-queued.

This enhancement provides a safeguard against OOM errors in high-throughput or unstable environments while maintaining backward compatibility (unlimited buffer by default).

Signed-off-by: Wenli Wan <[email protected]>
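For illustration, a minimal sketch of how an application might opt into such a cap, assuming a Sarama release that includes the `Producer.Retry.MaxBufferLength` option described above; the broker address and the cap value are placeholders:

```go
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	// Assumed field from the change described above: cap how many messages
	// the retry buffer may hold so a backlogged broker cannot grow it
	// without bound. Leaving it at 0 keeps the old unlimited behaviour.
	cfg.Producer.Retry.MaxBufferLength = 100_000 // illustrative cap

	producer, err := sarama.NewAsyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatalf("failed to create async producer: %v", err)
	}
	defer producer.AsyncClose()
}
```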
Versions
Configuration
Logs
Sarama logs recorded right before memory usage goes up (log excerpt omitted here).
Problem Description
The Go application writes messages into `AsyncProducer`. It also reads from the `Successes` and `Errors` channels. The average memory usage of the application is about 8 GB. The problem is that during peak load the application's memory consumption can rise up to 4x, which causes OOM.
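For context, a minimal sketch of that pattern, assuming a producer created with `Producer.Return.Successes` and `Producer.Return.Errors` enabled; the topic name and payload channel are placeholders, not taken from the issue:

```go
package producerexample

import (
	"log"

	"github.com/IBM/sarama"
)

// produceLoop feeds messages into the async producer while separate
// goroutines drain the Successes and Errors channels, so results are
// always consumed as described in the issue.
func produceLoop(producer sarama.AsyncProducer, payloads <-chan []byte) {
	go func() {
		for range producer.Successes() {
			// e.g. increment a "delivered" metric
		}
	}()
	go func() {
		for err := range producer.Errors() {
			log.Printf("produce failed: %v", err)
		}
	}()

	for p := range payloads {
		producer.Input() <- &sarama.ProducerMessage{
			Topic: "events", // placeholder topic
			Value: sarama.ByteEncoder(p),
		}
	}
}
```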
The log messages above were recorded during the incident. Messages with the content `maximum request accumulated, waiting for space` were omitted. At first glance, it looks like there are some issues with inserting into Kafka and the producer starts retrying messages. Digging into `AsyncProducer` showed that `retryHandler` uses a buffer with no maximum-size limit, so I created an additional metric to track the buffer `Length`:

`// Length returns the number of elements currently stored in the queue.`
And it showed a perfect correlation between memory consumption and queue size.
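The buffer in question appears to be a plain in-memory FIFO: the quoted `Length` comment matches `github.com/eapache/queue`, which Sarama depends on. A tiny sketch of the unbounded growth, with the element count purely illustrative:

```go
package main

import (
	"fmt"

	"github.com/eapache/queue"
)

func main() {
	// The queue has no built-in cap: every Add grows the backing buffer,
	// so a long retry backlog grows the process heap along with it.
	buf := queue.New()
	for i := 0; i < 1_000_000; i++ { // illustrative backlog size
		buf.Add(i)
	}
	fmt.Println("buffered elements:", buf.Length())
}
```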
This makes me think that `AsyncProducer` continues to accept incoming messages without any control over retry-queue growth, which leads to uncontrolled memory consumption. I wasn't able to reproduce the issue in a local environment.

Expected behaviour
If `AsyncProducer` is unable to flush the retry queue, it should block on receiving new messages.
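Until the library itself blocks, one possible application-side mitigation is to bound the number of unacknowledged messages so sends block once the producer stops making progress. This is a sketch only, not Sarama behaviour; the in-flight cap is a made-up number, and it assumes `Producer.Return.Successes`/`Errors` are enabled:

```go
package producerexample

import (
	"log"

	"github.com/IBM/sarama"
)

// boundedSend only feeds Input() while fewer than cap(inflight) messages are
// awaiting a success or error, so a stalled or retrying producer applies
// backpressure to callers instead of buffering without limit.
func boundedSend(producer sarama.AsyncProducer, msgs <-chan *sarama.ProducerMessage) {
	inflight := make(chan struct{}, 50_000) // hypothetical cap

	go func() {
		for range producer.Successes() {
			<-inflight
		}
	}()
	go func() {
		for err := range producer.Errors() {
			log.Printf("produce failed: %v", err)
			<-inflight
		}
	}()

	for msg := range msgs {
		inflight <- struct{}{} // blocks once the cap is reached
		producer.Input() <- msg
	}
}
```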