fix logging on config watcher setup failure #3074
Conversation
Force-pushed from 5735055 to e443d29
Just realised that
Force-pushed from 27c4056 to 0c08e74
Force-pushed from 0c08e74 to 128cb18
Could you fix the conflict?
"golang.org/x/exp/slog" | ||
) | ||
|
||
func ViperWithLogging() *viper.Viper { |
This is great, thanks!
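For reference, a helper along these lines is presumably what the excerpt above introduces. This is a minimal sketch, not the PR's actual code: the package name, the JSON handler choice, and the assumption that this viper release exposes a `WithLogger` option accepting an `*slog.Logger` (suggested by the `golang.org/x/exp/slog` import) are all mine.

```go
package common // hypothetical package; the PR's actual file placement may differ

import (
	"os"

	"github.com/spf13/viper"
	"golang.org/x/exp/slog"
)

// ViperWithLogging returns a viper instance wired to a real logger, so that
// errors viper only reports through its internal logger (such as the
// WatchConfig watcher-setup failure) actually reach stderr instead of being
// discarded.
func ViperWithLogging() *viper.Viper {
	logger := slog.New(slog.NewJSONHandler(os.Stderr, nil))
	// Assumption: viper.WithLogger accepts an *slog.Logger in the viper
	// version in use here; older releases take viper's own Logger interface.
	return viper.NewWithOptions(viper.WithLogger(logger))
}
```

Call sites that previously built their instance with `viper.New()` before calling `WatchConfig()` would then construct it through a helper like this instead.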
Force-pushed from 128cb18 to eeed829
We have experienced odd behaviour where argo-events would attempt to start the eventsource pod, then exit just a few seconds later without producing any meaningful error message. The process exit code was set to 1.

After some investigation and tracing the system calls, I found that the code responsible for this is related to usage of the `viper` config parsing library. Specifically, argo-events guards against a bunch of possible errors while initially reading in the configuration, but does not have the necessary error checking for setting up the file watchers. An example of such code can be found here: https://github.com/argoproj/argo-events/blob/78d47a2b6e948b9a3fa3572f0c95d8dcf5d7d8ff/eventbus/driver.go#L151

viper's `WatchConfig()` method will attempt to set up a watcher and, if it is unsuccessful, will log the error message and exit the process with code 1. The trouble is that by default viper uses a discard logger, so the log message is effectively never produced. ref: https://github.com/spf13/viper/blob/8ac644165cf967d7d5be0cb149eb321c4c8ecfcf/viper.go#L446

Such an execution is not particularly easy to troubleshoot from the Pod log files.

Before the change:

```
$ ./argo-events-linux-arm64 eventsource-service
{"level":"info","ts":1710844334.7253304,"logger":"argo-events.eventsource","caller":"cmd/start.go:63","msg":"starting eventsource server","eventSourceName":"nautobot-webhook","version":"latest+78d47a2.dirty"}
{"level":"info","ts":1710844334.725548,"logger":"argo-events.eventsource","caller":"eventsources/eventing.go:454","msg":"Starting event source server...","eventSourceName":"nautobot-webhook"}
$
$ echo $?
1
$
```

After the change:

```
$ ./argo-events-linux-arm64 eventsource-service
{"level":"info","ts":1710844214.6256192,"logger":"argo-events.eventsource","caller":"cmd/start.go:63","msg":"starting eventsource server","eventSourceName":"nautobot-webhook","version":"latest+78d47a2.dirty"}
{"level":"info","ts":1710844214.6260495,"logger":"argo-events.eventsource","caller":"eventsources/eventing.go:454","msg":"Starting event source server...","eventSourceName":"nautobot-webhook"}
...
{"time":"2024-03-19T10:30:14.626883973Z","level":"ERROR","msg":"failed to create watcher: too many open files"}
$
```

This bug can be easily reproduced, ideally in a separate VM, by artificially lowering the number of allowed `inotify` instances:

```
$ sudo sysctl fs.inotify.max_user_instances=0
$ ./argo-events-linux-arm64 eventsource-service
...
```

Signed-off-by: Marek Skrobacki <[email protected]>
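To make the failure mode described above concrete, here is a small self-contained sketch of the pattern in question. It is not argo-events code, and `/tmp/example-config.yaml` is a made-up path; it only illustrates why the exit is silent with viper's defaults.

```go
package main

import (
	"fmt"
	"time"

	"github.com/spf13/viper"
)

func main() {
	v := viper.New()                            // default instance: viper's internal logger discards everything
	v.SetConfigFile("/tmp/example-config.yaml") // hypothetical config path

	// Errors from the initial read ARE returned to the caller and can be handled.
	if err := v.ReadInConfig(); err != nil {
		fmt.Println("read error:", err)
	}

	// Watcher-setup errors are NOT returned: if creating the fsnotify watcher
	// fails here (e.g. after `sudo sysctl fs.inotify.max_user_instances=0`),
	// viper logs the error through its discard logger and exits the process
	// with code 1, so nothing ever shows up in the pod log.
	v.WatchConfig()

	time.Sleep(2 * time.Second) // keep this demo alive long enough for the watcher goroutine to run
}
```

Swapping `viper.New()` for a logging-enabled constructor, as discussed in the review thread above, is consistent with the `failed to create watcher: too many open files` line appearing in the "after" log.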
Force-pushed from eeed829 to d8d25c3
All done, thank you!