bug: kafka sensors stop working quietly #2958

Closed
MenD32 opened this issue Dec 29, 2023 · 4 comments
Labels
bug, stale

Comments

Contributor

MenD32 commented Dec 29, 2023

Describe the bug
Sensors connected to a Kafka eventbus silently stop working some time after being deployed.

To Reproduce
Steps to reproduce the behavior:

  1. Create a kafka eventbus
  2. Create a basic HTTP webhook
  3. Create a kafka sensor (doesn't matter what it triggers)
  4. Wait a couple of weeks
  5. Sensor will stop receiving events

Expected behavior
The sensor should receive the events, or at least the pod

Environment (please complete the following information):

  • Kubernetes: v1.28
  • Argo Events: v1.8.0
  • Strimzi: v0.39.0

Screenshots
Unfortunately, I cannot supply any screenshots or logs, since this happened in an offline environment. I will try to describe the logs instead.

  • The eventsource logs indicated that the event was created.
  • AKHQ showed that events were created in the Kafka topic, but from some point in the middle of the topic onward they are not being consumed.
  • Logs on the sensor pod show the start of a Kafka transaction but not the end of it.

Additional context

  • This does not happen instantly; it usually follows dozens of days during which the sensor works fine.
  • When investigating this, I verified that the event is being created, and I could see the message being added to the eventbus's topic.
  • This happens seemingly at random: some sensors fail after 2 weeks, some after a month, and some have yet to fail.

Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@MenD32 MenD32 added the bug Something isn't working label Dec 29, 2023
Contributor Author

MenD32 commented Dec 29, 2023

After looking at the source code, I suspect that this is caused by the producer's Errors() channel not being read. I created a pull request that I believe fixes this: #2959
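
For readers following along, here is a minimal sketch of the failure mode suspected above. It uses only plain Go channels and a hypothetical `fakeProducer` type; it is not the Argo Events code or the real Kafka client API, just an illustration of why a producer that reports failures on a channel can stall silently if nothing reads that channel, and how draining it avoids the stall.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// fakeProducer is a hypothetical stand-in for an async producer.
// It only mimics the pattern of reporting failed sends on a channel
// that the caller is expected to read.
type fakeProducer struct {
	input chan string
	errs  chan error
}

func newFakeProducer() *fakeProducer {
	p := &fakeProducer{
		input: make(chan string),
		errs:  make(chan error), // unbuffered: a send blocks until someone receives
	}
	go p.run()
	return p
}

// run handles messages one at a time. In this sketch every delivery fails,
// so each message is reported on errs. If nobody reads errs, run blocks on
// the first failure and never picks up another message.
func (p *fakeProducer) run() {
	for msg := range p.input {
		p.errs <- errors.New("failed to deliver " + msg)
	}
}

// trySend reports whether the producer accepted msg within the timeout.
func (p *fakeProducer) trySend(msg string, timeout time.Duration) bool {
	select {
	case p.input <- msg:
		return true
	case <-time.After(timeout):
		return false
	}
}

// sendAll sends n messages and returns how many were accepted.
// When drain is true, a goroutine continuously reads the error channel.
func sendAll(n int, drain bool) int {
	p := newFakeProducer()
	if drain {
		go func() {
			for err := range p.errs {
				_ = err // real code would log or surface the error
			}
		}()
	}
	accepted := 0
	for i := 0; i < n; i++ {
		if p.trySend(fmt.Sprintf("msg-%d", i), 100*time.Millisecond) {
			accepted++
		}
	}
	return accepted
}

func main() {
	fmt.Println("accepted without draining:", sendAll(5, false))
	fmt.Println("accepted with draining:", sendAll(5, true))
}
```

Without the draining goroutine, only the first message is accepted and every later send times out, with no crash and no log line, which matches the "quiet" symptom described in this issue. With the draining goroutine, all five messages are accepted. Whether this is the actual root cause here is still only a suspicion.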

@James-Derune

We have also experienced this behavior. Sensors stop writing logs and no longer process events from Kafka, even though the events are being created properly there (the consumer offset lag keeps growing).

We run using OpenShift and this happens both on version 4.10 and 4.12 of OCP (k8s versions are 1.23.5 and 1.25.14).

Contributor

github-actions bot commented Mar 1, 2024

This issue has been automatically marked as stale because it has not had
any activity in the last 60 days. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 1, 2024
@github-actions github-actions bot closed this as not planned Mar 9, 2024

piby180 commented Sep 5, 2024

I still experience this with the latest version (1.9.2). Sensors stop processing messages after a couple of days and have to be restarted to make them work again.
