Commit
fix logging on config watcher setup failure
We have experienced odd behaviour where argo-events would start the eventsource pod, then exit just a few seconds later without producing any meaningful error message. The process exit code was set to 1.

After some investigation and tracing of the system calls, I found that the code responsible is related to the use of the `viper` config parsing library. Specifically, argo-events guards against a number of possible errors while initially reading in the configuration, but does not have the necessary error checking for setting up the file watchers. An example of such code can be found here:
https://github.com/argoproj/argo-events/blob/78d47a2b6e948b9a3fa3572f0c95d8dcf5d7d8ff/eventbus/driver.go#L151

viper's `WatchConfig()` method attempts to set up a watcher and, if it is unsuccessful, logs the error message and exits the process with code 1. The trouble is that by default viper uses a discard logger, so the log message is never actually produced.

ref: https://github.com/spf13/viper/blob/8ac644165cf967d7d5be0cb149eb321c4c8ecfcf/viper.go#L446

Such a failure is not particularly easy to troubleshoot from the Pod log files.

Before the change:

```
$ ./argo-events-linux-arm64 eventsource-service
{"level":"info","ts":1710844334.7253304,"logger":"argo-events.eventsource","caller":"cmd/start.go:63","msg":"starting eventsource server","eventSourceName":"nautobot-webhook","version":"latest+78d47a2.dirty"}
{"level":"info","ts":1710844334.725548,"logger":"argo-events.eventsource","caller":"eventsources/eventing.go:454","msg":"Starting event source server...","eventSourceName":"nautobot-webhook"}
$
$ echo $?
1
$
```

After the change:

```
$ ./argo-events-linux-arm64 eventsource-service
{"level":"info","ts":1710844214.6256192,"logger":"argo-events.eventsource","caller":"cmd/start.go:63","msg":"starting eventsource server","eventSourceName":"nautobot-webhook","version":"latest+78d47a2.dirty"}
{"level":"info","ts":1710844214.6260495,"logger":"argo-events.eventsource","caller":"eventsources/eventing.go:454","msg":"Starting event source server...","eventSourceName":"nautobot-webhook"}
...
{"time":"2024-03-19T10:30:14.626883973Z","level":"ERROR","msg":"failed to create watcher: too many open files"}
$
```

This bug can be easily reproduced, ideally in a separate VM, by artificially lowering the number of allowed `inotify` instances:

```
$ sudo sysctl fs.inotify.max_user_instances=0
$ ./argo-events-linux-arm64 eventsource-service
...
```