Argo Sensors pods die when the EventBus leader pod is killed #2376
Comments
What is the version of Argo Events (sorry the helm chart is maintained by community users)?
The latest I've tried is 1.7.4
Did you use
Yes we do
Yes it does.
This issue has been automatically marked as stale because it has not had
Looks like the bug is in this line: https://github.com/argoproj/argo-events/blob/master/sensors/listener.go#L301
if conn == nil || conn.IsClosed() {
here conn is of type
Normally methods can be nil safe, but the TriggerConnections embed a pointer to a connection struct, and it is that connection struct which actually implements the IsClosed and Close methods of the interface. The connection struct has a nil-safe implementation of the interface, but the embedding struct TriggerConnections isn't nil safe. E.g., the Jetstream trigger connection:
// IsClosed and Close are nil safe for JetstreamConnection
type JetstreamConnection struct {
	NATSConn      *nats.Conn
	JSContext     nats.JetStreamContext
	NATSConnected bool
	Logger        *zap.SugaredLogger
}
func (jsc *JetstreamConnection) Close() error {
	if jsc == nil {
		return fmt.Errorf("can't close Jetstream connection, JetstreamConnection is nil")
	}
	if jsc.NATSConn != nil && jsc.NATSConn.IsConnected() {
		jsc.NATSConn.Close()
	}
	return nil
}
func (jsc *JetstreamConnection) IsClosed() bool {
	return jsc == nil || jsc.NATSConn == nil || !jsc.NATSConnected || jsc.NATSConn.IsClosed()
}
// JetstreamTriggerConn will panic on IsClosed() and Close()
type JetstreamTriggerConn struct {
	*jetstreambase.JetstreamConnection
	sensorName           string
	triggerName          string
	keyValueStore        nats.KeyValue
	dependencyExpression string
	requiresANDLogic     bool
	evaluableExpression  *govaluate.EvaluableExpression
	deps                 []eventbuscommon.Dependency
	sourceDepMap         map[string][]string // maps EventSource and EventName to dependency name
	recentMsgsByID       map[string]*msg     // prevent re-processing the same message as before (map of msg ID to time)
	recentMsgsByTime     []*msg
}
To fix this issue, I think there are three options.
I think the nil safe option is the better choice.
func (conn *JetstreamTriggerConn) IsClosed() bool {
	return conn == nil || conn.JetstreamConnection.IsClosed()
}
func (conn *JetstreamTriggerConn) Close() error {
	if conn == nil {
		return fmt.Errorf("can't close Jetstream trigger connection, JetstreamTriggerConn is nil")
	}
	return conn.JetstreamConnection.Close()
}
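To illustrate why the embedding struct is not nil safe even though the embedded one is, here is a minimal, self-contained sketch. The names baseConn, unsafeTriggerConn, safeTriggerConn, closer, and probe are hypothetical stand-ins, not Argo Events types. The sketch also assumes the nil pointer reaches the check wrapped in an interface value; in that case a plain == nil comparison on the interface does not catch it, and the promoted method has to dereference the nil outer pointer to reach the embedded field.

```go
package main

import "fmt"

// baseConn is a hypothetical stand-in for the embedded connection struct.
// Its method checks the receiver, so it is nil safe.
type baseConn struct{ closed bool }

func (b *baseConn) IsClosed() bool { return b == nil || b.closed }

// unsafeTriggerConn relies on the IsClosed method promoted from *baseConn.
// Calling it on a nil *unsafeTriggerConn must first load the embedded
// field, which dereferences the nil outer pointer and panics.
type unsafeTriggerConn struct {
	*baseConn
	triggerName string
}

// safeTriggerConn defines its own IsClosed with a nil check on the outer
// pointer before delegating, mirroring the fix proposed above.
type safeTriggerConn struct {
	*baseConn
	triggerName string
}

func (c *safeTriggerConn) IsClosed() bool {
	return c == nil || c.baseConn.IsClosed()
}

// closer is a hypothetical interface with the one method under test.
type closer interface{ IsClosed() bool }

// probe calls IsClosed and reports whether the call panicked.
func probe(c closer) (closed bool, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return c.IsClosed(), false
}

func main() {
	var u *unsafeTriggerConn // nil pointer
	var s *safeTriggerConn   // nil pointer

	// An interface holding a typed nil pointer is itself not nil.
	var c closer = u
	fmt.Println(c == nil) // false

	fmt.Println(probe(u)) // false true  (nil pointer dereference, recovered)
	fmt.Println(probe(s)) // true false  (nil check on the outer pointer)
}
```

With the explicit method on the outer type, a nil trigger connection reports itself as closed instead of crashing the process.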
We encounter some instability with Argo Events.
Versions and setup
When the EventBus leader is killed, our sensor dies with errors like:
2022-12-22T15:30:37.963Z INFO argo-events.sensor base/jetstream.go:102 Connected to NATS Jetstream server. {"sensorName": "nats-sensor-infra-present"}
2022-12-22T15:30:42.940Z ERROR argo-events.sensor sensor/trigger_conn.go:78 failed to get K/V store for sensor nats-sensor-infra-present: context deadline exceeded {"sensorName": "nats-sensor-infra-present", "triggerName": "app-trigger"}
github.com/argoproj/argo-events/eventbus/jetstream/sensor.NewJetstreamTriggerConn
	/home/runner/work/argo-events/argo-events/eventbus/jetstream/sensor/trigger_conn.go:78
github.com/argoproj/argo-events/eventbus/jetstream/sensor.(*SensorJetstream).Connect
	/home/runner/work/argo-events/argo-events/eventbus/jetstream/sensor/sensor_jetstream.go:77
github.com/argoproj/argo-events/sensors.(*SensorContext).listenEvents.func2
	/home/runner/work/argo-events/argo-events/sensors/listener.go:292
2022-12-22T15:30:42.940Z ERROR argo-events.sensor sensors/listener.go:294 failed to reconnect to eventbus {"sensorName": "nats-sensor-infra-present", "triggerName": "app-trigger", "connection": "", "error": "failed to get K/V store for sensor nats-sensor-infra-present: context deadline exceeded"}
github.com/argoproj/argo-events/sensors.(*SensorContext).listenEvents.func2
	/home/runner/work/argo-events/argo-events/sensors/listener.go:294
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a94360]
I saw the somewhat similar issue #2086 and the related change, but I'm not certain it will fix our issue.