Skip to content

Commit

Permalink
fix logging on config watcher setup failure
Browse files Browse the repository at this point in the history
We have experienced an odd behaviour where argo-events would attempt to start
the eventsource pod, then exit just a few seconds later without producing any
meaningful error message. The process exit code was set to 1.

After some investigation and tracing the system calls, I found that the code
responsible for this is related to usage of the `viper` config parsing library.
Specifically, argo-events guards against bunch of possible errors while
initially reading in the configuration, but does not have necessary error
checking for setting up the file watchers. An example of such code can be found
here:

https://github.com/argoproj/argo-events/blob/78d47a2b6e948b9a3fa3572f0c95d8dcf5d7d8ff/eventbus/driver.go#L151

viper's `WatchConfig()` method will attempt to setup a watcher and if it's
unsuccesful, will log the error message and exit the process with code 1. The
trouble is that, by default the viper uses a discard logger so effectively log
message is never actually produced.

ref: https://github.com/spf13/viper/blob/8ac644165cf967d7d5be0cb149eb321c4c8ecfcf/viper.go#L446

An example of such execution in the Pod log files is not particularly easy to
troubleshoot.

Before the change:
```
$ ./argo-events-linux-arm64 eventsource-service
{"level":"info","ts":1710844334.7253304,"logger":"argo-events.eventsource","caller":"cmd/start.go:63","msg":"starting eventsource server","eventSourceName":"nautobot-webhook","version":"latest+78d47a2.dirty"}
{"level":"info","ts":1710844334.725548,"logger":"argo-events.eventsource","caller":"eventsources/eventing.go:454","msg":"Starting event source server...","eventSourceName":"nautobot-webhook"}
$
$ echo $?
1
$
```

After the change:

```
$ ./argo-events-linux-arm64 eventsource-service
{"level":"info","ts":1710844214.6256192,"logger":"argo-events.eventsource","caller":"cmd/start.go:63","msg":"starting eventsource server","eventSourceName":"nautobot-webhook","version":"latest+78d47a2.dirty"}
{"level":"info","ts":1710844214.6260495,"logger":"argo-events.eventsource","caller":"eventsources/eventing.go:454","msg":"Starting event source server...","eventSourceName":"nautobot-webhook"}
...
{"time":"2024-03-19T10:30:14.626883973Z","level":"ERROR","msg":"failed to create watcher: too many open files"}
$
```

This bug can be easily reproduced, ideally in separate VM by artificially
lowering the number of allowed `inotify` instances:

```
$ sudo sysctl fs.inotify.max_user_instances=0
$ ./argo-events-linux-arm64 eventsource-service
...
```
  • Loading branch information
skrobul committed Mar 19, 2024
1 parent 78d47a2 commit e443d29
Show file tree
Hide file tree
Showing 4 changed files with 32 additions and 6 deletions.
3 changes: 1 addition & 2 deletions common/leaderelection/leaderelection.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,6 @@ import (
"github.com/fsnotify/fsnotify"
"github.com/nats-io/graft"
nats "github.com/nats-io/nats.go"
"github.com/spf13/viper"
"go.uber.org/zap"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/kubernetes"
Expand Down Expand Up @@ -79,7 +78,7 @@ func getEventBusAuth(ctx context.Context, authStrategy *eventbusv1alpha1.AuthStr
Strategy: eventbusv1alpha1.AuthStrategyNone,
}
} else {
v := viper.New()
v := common.ViperWithLogging()
v.SetConfigName("auth")
v.SetConfigType("yaml")
v.AddConfigPath(eventBusAuthFileMountPath)
Expand Down
28 changes: 28 additions & 0 deletions common/viper.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
/*
Copyright 2024 BlackRock, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package common

import (
"github.com/spf13/viper"
"log/slog"

Check failure on line 21 in common/viper.go

View workflow job for this annotation

GitHub Actions / E2E Tests (stan)

package log/slog is not in GOROOT (/opt/hostedtoolcache/go/1.20.14/x64/src/log/slog)

Check failure on line 21 in common/viper.go

View workflow job for this annotation

GitHub Actions / E2E Tests (jetstream)

package log/slog is not in GOROOT (/opt/hostedtoolcache/go/1.20.14/x64/src/log/slog)

Check failure on line 21 in common/viper.go

View workflow job for this annotation

GitHub Actions / Unit Tests

package log/slog is not in GOROOT (/opt/hostedtoolcache/go/1.20.14/x64/src/log/slog)

Check failure on line 21 in common/viper.go

View workflow job for this annotation

GitHub Actions / E2E Tests (kafka)

package log/slog is not in GOROOT (/opt/hostedtoolcache/go/1.20.14/x64/src/log/slog)
"os"
)

func ViperWithLogging() *viper.Viper {
v := viper.NewWithOptions(viper.WithLogger(slog.New(slog.NewJSONHandler(os.Stdout, nil))))
return v
}
4 changes: 2 additions & 2 deletions controllers/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@ import (
"fmt"
"strings"

"github.com/argoproj/argo-events/common"
"github.com/fsnotify/fsnotify"
"github.com/spf13/viper"
)

type GlobalConfig struct {
Expand Down Expand Up @@ -94,7 +94,7 @@ func (g *GlobalConfig) GetJetStreamVersion(version string) (*JetStreamVersion, e
}

func LoadConfig(onErrorReloading func(error)) (*GlobalConfig, error) {
v := viper.New()
v := common.ViperWithLogging()
v.SetConfigName("controller-config")
v.SetConfigType("yaml")
v.AddConfigPath("/etc/argo-events")
Expand Down
3 changes: 1 addition & 2 deletions eventbus/driver.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ import (
stansensor "github.com/argoproj/argo-events/eventbus/stan/sensor"
eventbusv1alpha1 "github.com/argoproj/argo-events/pkg/apis/eventbus/v1alpha1"
"github.com/argoproj/argo-events/pkg/apis/sensor/v1alpha1"
"github.com/spf13/viper"
)

func GetEventSourceDriver(ctx context.Context, eventBusConfig eventbusv1alpha1.BusConfig, eventSourceName string, defaultSubject string) (eventbuscommon.EventSourceDriver, error) {
Expand Down Expand Up @@ -134,7 +133,7 @@ func GetAuth(ctx context.Context, eventBusConfig eventbusv1alpha1.BusConfig) (*e
Strategy: eventbusv1alpha1.AuthStrategyNone,
}
} else {
v := viper.New()
v := common.ViperWithLogging()
v.SetConfigName("auth")
v.SetConfigType("yaml")
v.AddConfigPath(common.EventBusAuthFileMountPath)
Expand Down

0 comments on commit e443d29

Please sign in to comment.