
producer panic when retrying send the message #2322

Closed
wrfly opened this issue Aug 24, 2022 · 6 comments


@wrfly

wrfly commented Aug 24, 2022

Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

Sarama: v1.35.0 (Kafka and Go versions not specified)
Configuration

What configuration values are you using for Sarama and Kafka?

Producer.Retry.Max = 3 (default)
Logs

When filing an issue please provide logs from Sarama and Kafka if at all
possible. You can set sarama.Logger to a log.Logger to capture Sarama debug
output.


2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_ph_log_live: circuit breaker is open
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_ph_log_live: circuit breaker is open
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_ph_log_live: circuit breaker is open
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_ph_log_live: circuit breaker is open
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_log_live: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_log_live: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_log_live: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_log_live: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_log_live: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes
2022/08/24 11:49:01 kafka: Failed to produce message to topic xxxx_kwad_log_live: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x936fbd]

goroutine 127982 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc93a0bc480, 0x1)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:620 +0x19d
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc93a0bc480)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:564 +0x50a
github.com/Shopify/sarama.withRecover(0xc93a0ac510?)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/utils.go:43 +0x3e
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:515 +0x1f6
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x936fbd]

goroutine 216212 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xca56562ea0, 0x2)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:620 +0x19d
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xca56562ea0)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:564 +0x50a
github.com/Shopify/sarama.withRecover(0xca55e1c6c0?)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/utils.go:43 +0x3e
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:515 +0x1f6
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x936fbd]

goroutine 168736 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc9ce48f8c0, 0x1)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:620 +0x19d
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc9ce48f8c0)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:564 +0x50a
github.com/Shopify/sarama.withRecover(0xc9c7e7b1d0?)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/utils.go:43 +0x3e
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer

Problem Description

The producer panics when a partition in the Kafka cluster has no leader (for example, during a leader election, as in the logs above):

goroutine 168736 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc9ce48f8c0, 0x1)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:620 +0x19d
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc9ce48f8c0)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/async_producer.go:564 +0x50a
github.com/Shopify/sarama.withRecover(0xc9c7e7b1d0?)
    /root/go/pkg/mod/github.com/!shopify/sarama@v1.35.0/utils.go:43 +0x3e
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
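To make the suspected failure sequence concrete, here is a heavily simplified, stdlib-only Go model of the code path in the traces. The names `partitionProducer`, `brokerProducer`, `highWatermark`, `newHighWatermark` and `dispatch` mirror names visible in the traces, but the types, the `message` struct and `lookupLeader` are hypothetical. The model assumes (from the code linked in this issue, async_producer.go around lines 612-626) that `newHighWatermark` sends a fin marker through `pp.brokerProducer` and then sets it to `nil`. If re-resolving the leader then fails, as it would during a leader election, the field stays `nil`, and the next message at a higher retry level dereferences it.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, heavily simplified stand-ins for sarama's internal types.
type message struct {
	retries int
	fin     bool
}

type brokerProducer struct {
	input chan message
}

type partitionProducer struct {
	highWatermark  int
	brokerProducer *brokerProducer
}

// errNoLeader models the "leadership election" error seen in the logs.
var errNoLeader = errors.New("no leader for this partition")

// lookupLeader always fails, simulating an ongoing leader election.
func lookupLeader() (*brokerProducer, error) { return nil, errNoLeader }

// newHighWatermark follows the assumed shape of the affected code: send a fin
// marker through the current broker producer, then abandon that broker.
func (pp *partitionProducer) newHighWatermark(hwm int) {
	pp.highWatermark = hwm
	// Dereferences pp.brokerProducer: panics if it is nil.
	pp.brokerProducer.input <- message{retries: hwm - 1, fin: true}
	pp.brokerProducer = nil
}

// dispatch handles messages in order and reports whether a panic occurred.
func dispatch(pp *partitionProducer, msgs []message) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	for _, msg := range msgs {
		if msg.retries > pp.highWatermark {
			pp.newHighWatermark(msg.retries) // no nil check before this call
		}
		if pp.brokerProducer == nil {
			bp, err := lookupLeader()
			if err != nil {
				continue // msg is failed back to the caller; brokerProducer stays nil
			}
			pp.brokerProducer = bp
		}
		pp.brokerProducer.input <- msg
	}
	return false
}

func main() {
	pp := &partitionProducer{brokerProducer: &brokerProducer{input: make(chan message, 8)}}
	// Retry level 1 abandons the broker; retry level 2 then hits the nil pointer.
	fmt.Println("panicked:", dispatch(pp, []message{{retries: 1}, {retries: 2}}))
}
```

A single retry level is harmless in this model; it takes a second, higher retry level arriving while no leader is known to reach the `nil` dereference, which matches the `newHighWatermark(..., 0x1)` and `newHighWatermark(..., 0x2)` frames and the "race condition" feel reported later in the thread.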

related code:

https://github.com/Shopify/sarama/blob/main/async_producer.go#L564

https://github.com/Shopify/sarama/blob/3083a9b96a628fcb0882de334507af5e520ca1cb/async_producer.go#L620

The brokerProducer needs to be checked for nil before a message is sent, for both normal sends and retries:

https://github.com/Shopify/sarama/blob/3083a9b96a628fcb0882de334507af5e520ca1cb/async_producer.go#L612-L626

@BarbarossaTM

Just hit the same issue. It feels like a race condition from what we are seeing, but we didn't look very deeply into it. Has anyone already debugged this? :-)

@3AceShowHand

The same problem also happens in v1.36.0:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x11220c9]

goroutine 217306 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc410f81d40, 0x1)
	github.com/Shopify/sarama@v1.36.0/async_producer.go:620 +0x1a9
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc410f81d40)
	github.com/Shopify/sarama@v1.36.0/async_producer.go:564 +0x537
github.com/Shopify/sarama.withRecover(0xc27e8bef98?)
	github.com/Shopify/sarama@v1.36.0/utils.go:43 +0x3e
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
	github.com/Shopify/sarama@v1.36.0/async_producer.go:515 +0x1ea

hsweif added a commit to hsweif/sarama that referenced this issue Nov 11, 2022
Check nil and update the leader, if needed, before updating the new
watermark

Refs: IBM#2322
@edoger

edoger commented Nov 14, 2022

This problem still exists in the latest released version. A more serious consequence is that the producer can stop working entirely without reporting any further errors: the application typically blocks while producing and has no way to tell that the producer has failed.

@edoger

edoger commented Nov 14, 2022

@dnwe Do you have better suggestions or patches?

dnwe pushed a commit that referenced this issue Dec 21, 2022
Check nil and update the leader, if needed, before updating the new
watermark

Refs: #2322
jayshrivastava added a commit to jayshrivastava/cockroach that referenced this issue Feb 23, 2023
A previous update (cockroachdb#95544) which updated sarama
to 1.35.0 introduced a bug which resulted in nodes crashing. These failures are shown by
cockroachdb#96419. The bug is described in detail in
IBM/sarama#2322 and fixed by IBM/sarama@2379257,
which is included in version 1.38.1.

Fixes: cockroachdb#96419
Release note: None
Epic: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Mar 1, 2023
97571: cdc: update sarama from 1.35.0 to 1.38.1 r=miretskiy a=jayshrivastava

A previous update (#95544) which updated sarama to 1.35.0 introduced a bug which resulted in nodes crashing. These failures are shown by #96419. The bug is described in detail in IBM/sarama#2322 and fixed by IBM/sarama@2379257, which is included in version 1.38.1.

Fixes: #96419
Release note: None
Epic: None


Co-authored-by: Jayant Shrivastava <[email protected]>

@github-actions github-actions bot added the stale Issues and pull requests without any recent activity label Aug 17, 2023
@dnwe
Collaborator

dnwe commented Aug 17, 2023

Fixed in v1.38.0
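To check whether a given binary still links an affected release, one option is to read its embedded module information with `runtime/debug.ReadBuildInfo` from the Go standard library. This is a minimal sketch: the version comparison is deliberately simple and only handles `vMAJOR.MINOR.PATCH` tags (pre-release suffixes are dropped), the helper names are hypothetical, and the threshold is taken from the comment above. Both `github.com/Shopify/sarama` and `github.com/IBM/sarama` are checked because the thread refers to the library under both names.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"strconv"
	"strings"
)

// fixedIn is the release reported in this thread as containing the fix.
const fixedIn = "v1.38.0"

// parseVersion turns "v1.35.0" into [1 35 0]; suffixes after '-' or '+' are dropped.
func parseVersion(v string) ([3]int, bool) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// atLeast reports whether version v >= version minimum; unparseable input yields false.
func atLeast(v, minimum string) bool {
	a, okA := parseVersion(v)
	b, okB := parseVersion(minimum)
	if !okA || !okB {
		return false
	}
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

// saramaVersion returns the sarama version linked into this binary, if any.
func saramaVersion() (string, bool) {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return "", false
	}
	for _, dep := range info.Deps {
		if dep.Path == "github.com/Shopify/sarama" || dep.Path == "github.com/IBM/sarama" {
			if dep.Replace != nil && dep.Replace.Version != "" {
				return dep.Replace.Version, true
			}
			return dep.Version, true
		}
	}
	return "", false
}

func main() {
	v, ok := saramaVersion()
	switch {
	case !ok:
		fmt.Println("sarama not found in build info")
	case atLeast(v, fixedIn):
		fmt.Println("sarama", v, "includes the fix")
	default:
		fmt.Println("sarama", v, "is older than", fixedIn, "and may hit this panic")
	}
}
```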

@dnwe dnwe closed this as completed Aug 17, 2023
@dnwe dnwe removed the stale Issues and pull requests without any recent activity label Aug 17, 2023