-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Buffer: Fix NoMethodError with empty unstaged chunk arrays #4303
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution.
When the write function encounters a chunk that is defined as a Nil object, the call to synchronize fails.
Is it reproducible?
Could you tell us how to reproduce it?
The error will occur if unstaged_chunks[m]
is an empty list or contains a nil
object.
However, it does not seem possible.
fluentd/lib/fluent/plugin/buffer.rb
Lines 356 to 359 in 78c9146
elsif chunk.unstaged? | |
unstaged_chunks[metadata] ||= [] | |
unstaged_chunks[metadata] << chunk | |
end |
Tell that to my constant 100MB/sec data flow that throws this error frequently. :) I'll see if I can forcefully reproduce it outside that environment. Here's the stack trace, for clarification:
It's not the entire chunk array unstaged_chunks[m] that is nil, just one of the values being popped off the end of the array itself. The conditional on line 420 succeeds, so there is value in unstaged_chunks[m] overall, but the value being parsed is nil. |
The issue is that unstaged_chunks[m] is an empty array. Once we've popped all the values off the array, it'll still evaluate truthy while empty (thanks Ruby), so we could end up attempting to pop another value off the end of an empty array, resulting in a nil return to "u". A cleaner fix might be to adjust the conditional on line 420 to also include a check for empty. Ruby doesn't evaluate an empty array to false, so the existing conditional can/does allow empty values to be parsed. I updated the PR with this adjustment, as it's cleaner than the additional nil check instead. |
Signed-off-by: Alex Doolittle <[email protected]>
Signed-off-by: Alex Doolittle <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if I can forcefully reproduce it outside that environment.
Thanks!
Once we've popped all the values off the array, it'll still evaluate truthy while empty (thanks Ruby), so we could end up attempting to pop another value off the end of an empty array, resulting in a nil return to "u".
I see!
Certainly, it seems possible to execute pop
against an empty list because this process can be repeated.
fluentd/lib/fluent/plugin/buffer.rb
Lines 416 to 421 in 78c9146
chunks_to_enqueue.each do |c| | |
if c.staged? && (enqueue || chunk_size_full?(c)) | |
m = c.metadata | |
enqueue_chunk(m) | |
if unstaged_chunks[m] | |
u = unstaged_chunks[m].pop |
If so, this fix would be reasonable.
I want to confirm the reproduction conditions a little more.
Sorry for the delay. |
I don't yet fully understand the role of So, I merge this now for the next release. |
Signed-off-by: Alex Doolittle <[email protected]>
Seeing #4342, I understand now some mechanics of this bug. I believe this duplication of Anyway, these unstaged chunks will be enqueued immediately in the |
When the write function encounters a chunk that is defined as a Nil object, the call to synchronize fails. This failure causes an unexpected error that is caught and handled like a failure to connect, with a 30 second retry delay.
In an environment with a large amount of data, and this Nil scenario occurring frequently, it introduces significant processing delays since all processing activity stops during the 30 second window. The topic being queried continues to fill but no agents parse the data, leading to an ever-increasing backlog.