feat: EventSource and Sensor HA without extra RBAC #1163

whynowy · 2021-04-07T00:11:46Z

Signed-off-by: Derek Wang [email protected]

A new approach to Active-Passive HA for Sensors and some of the EventSources that does not require configuring extra RBAC. It uses the Raft leader election algorithm implemented over NATS.

See https://github.com/nats-io/graft.

Checklist:

My organization is added to USERS.md.

whynowy · 2021-04-07T00:13:33Z

controllers/eventsource/resource.go

@@ -332,26 +335,6 @@ func buildDeploymentSpec(args *AdaptorArgs) (*appv1.DeploymentSpec, error) {
 		spec.Template.Spec.PriorityClassName = args.EventSource.Spec.Template.PriorityClassName
 		spec.Template.Spec.Priority = args.EventSource.Spec.Template.Priority
 	}
-	allEventTypes := eventsources.GetEventingServers(args.EventSource, nil)


We don't need this any more. With leader election, all the event source deployments can run with the rolling update strategy.

whynowy · 2021-04-07T00:14:26Z

controllers/sensor/resource.go

@@ -270,10 +275,6 @@ func buildDeploymentSpec(args *AdaptorArgs) (*appv1.DeploymentSpec, error) {
 			MatchLabels: args.Labels,
 		},
 		Replicas: &replicas,
-		Strategy: appv1.DeploymentStrategy{


With leader election, sensor deployments can also run with the rolling update strategy.

whynowy · 2021-04-07T00:15:17Z

eventsources/eventing.go

@@ -284,54 +280,24 @@ func (e *EventSourceAdaptor) Start(ctx context.Context) error {
 		// EventSource object use the same type of deployment strategy
 		break
 	}
-	if !isRecreatType || e.eventSource.Spec.GetReplicas() == 1 {
+	if !isRecreatType {


Event sources like webhook should not run with leader election.

whynowy · 2021-04-07T00:16:47Z

test/util/util.go

+	return PodsLogContains(ctx, kubeClient, namespace, regex, podList, timeout), nil
+}
+
+func PodsLogContains(ctx context.Context, kubeClient kubernetes.Interface, namespace, regex string, podList *corev1.PodList, timeout time.Duration) bool {


Watch logs from multiple Pods.
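To illustrate the idea of watching several log streams at once, here is a minimal sketch using only the Go standard library. It is not the code from test/util/util.go: the names `streamContains`, `anyStreamContains`, and `demoMatch` are hypothetical, and plain `io.Reader` values stand in for the per-Pod log streams that the real helper would obtain from the Kubernetes client.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"regexp"
	"strings"
	"sync"
	"time"
)

// streamContains scans one log stream line by line and reports whether any
// line matches re. It gives up once ctx is cancelled or has timed out.
func streamContains(ctx context.Context, r io.Reader, re *regexp.Regexp) bool {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		if ctx.Err() != nil {
			return false
		}
		if re.Match(scanner.Bytes()) {
			return true
		}
	}
	return false
}

// anyStreamContains starts one goroutine per stream and returns true as soon
// as any stream yields a matching line. It returns false when every stream
// ends without a match or when the timeout expires.
// Assumption: a stream whose Read blocks forever would outlive the timeout;
// a real implementation would also close the stream on cancellation.
func anyStreamContains(ctx context.Context, pattern string, streams []io.Reader, timeout time.Duration) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	cctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	found := make(chan struct{}, len(streams)) // buffered so senders never block
	var wg sync.WaitGroup
	for _, s := range streams {
		wg.Add(1)
		go func(r io.Reader) {
			defer wg.Done()
			if streamContains(cctx, r, re) {
				found <- struct{}{}
			}
		}(s)
	}
	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()

	select {
	case <-found:
		return true, nil
	case <-done:
	case <-cctx.Done():
	}
	// A match may have raced with the other cases; check once without blocking.
	select {
	case <-found:
		return true, nil
	default:
		return false, nil
	}
}

// demoMatch wraps in-memory strings as streams (stand-ins for Pod logs).
func demoMatch(pattern string, logs ...string) bool {
	streams := make([]io.Reader, 0, len(logs))
	for _, l := range logs {
		streams = append(streams, strings.NewReader(l))
	}
	ok, err := anyStreamContains(context.Background(), pattern, streams, 2*time.Second)
	return err == nil && ok
}

func main() {
	fmt.Println(demoMatch("elected leader", "starting\nwaiting", "starting\nelected leader"))
	fmt.Println(demoMatch("elected leader", "starting", "waiting"))
}
```

In an HA test, only one replica is expected to log the leader message, so checking all Pods together avoids guessing which one won the election.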

alexec · 2021-04-07T23:03:37Z

I'm in two minds about this.

Leadership-election is a core cloud-native feature that is well understood by the Argo Team and community in general. We're replacing it with something that makes us more reliant on NATS. Given that a user needs to make changes to use the feature either way, would it be better to make using leadership elections straightforward?

whynowy · 2021-04-07T23:58:39Z

> I'm in two minds about this.
>
> Leadership-election is a core cloud-native feature that is well understood by the Argo Team and community in general. We're replacing it with something that makes us more reliant on NATS. Given that a user needs to make changes to use the feature either way, would it be better to make using leadership elections straightforward?

If NATS were not already part of the architecture, or if the HA were for controllers, k8s leader election would no doubt be the first choice. However, the HA here is for a dynamic service (EventSource or Sensor), and NATS happens to be there already. I think we should use it, because it saves a lot of extra configuration.

This approach requires a minimal spec change: the user only needs to specify spec.replicas. With k8s leader election, besides the replicas, the user needs to make a series of RBAC changes:

Service Account
Role and RoleBinding
Specify spec.template.serviceAccountName

These extra changes are required at least once per namespace. A security-conscious user will probably also specify resourceNames in the Role definition (which restricts the service account used by the EventSource/Sensor to operating only on the Lease object dedicated to it), and that makes the RBAC settings per EventSource (or Sensor).

I understand the concern about the reliability of leader election implemented over NATS. Even though I have done all kinds of testing on it without seeing any issue, I still cannot say it's reliable enough. How about this: we hold off on using leader election when replicas = 1, wait for feedback from the community, and then decide whether to use it by default (even when replicas = 1)?

@alexec

alexec · 2021-04-08T00:51:53Z

I think we should go with NATS now. It's a big ask on our users to add a lot of extra RBAC, which people struggle with.

alexec · 2021-04-08T00:53:05Z

common/leaderelection/leaderelection.go

+		select {
+		case <-ctx.Done():
+			log.Info("exiting...")
+			cancel()


normally defer a cancel?

When the current node changes from leader to non-leader (lines 128-133), cancel() needs to be called to terminate the running service, and cctx and cancel need to be re-created. I'm not sure whether defer cancel() still works in that case; let me do more testing.

Using defer works, updated.
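The cancel-and-re-create pattern discussed in this thread can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in common/leaderelection/leaderelection.go: `runWhenLeader` and `demoTransitions` are hypothetical names, and a plain `chan bool` stands in for the state-change notifications a Raft library would deliver.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// runWhenLeader consumes leadership changes (true = became leader, false =
// lost leadership) and keeps service running only while this node leads.
func runWhenLeader(ctx context.Context, isLeader <-chan bool, service func(context.Context)) {
	var (
		cancel context.CancelFunc = func() {} // no-op until we first lead
		wg     sync.WaitGroup
	)
	// Deferring a closure (rather than a bare `defer cancel()`) means the
	// deferred call uses whatever cancel holds at return time, even after
	// it has been re-created for a later leadership term.
	defer func() {
		cancel()
		wg.Wait()
	}()

	leading := false
	for {
		select {
		case <-ctx.Done():
			return
		case lead, ok := <-isLeader:
			if !ok {
				return // no more state changes will arrive
			}
			switch {
			case lead && !leading:
				// New leadership term: fresh child context and cancel func.
				var cctx context.Context
				cctx, cancel = context.WithCancel(ctx)
				wg.Add(1)
				go func() {
					defer wg.Done()
					service(cctx)
				}()
			case !lead && leading:
				// Lost leadership: stop the service and wait for it to exit.
				cancel()
				wg.Wait()
			}
			leading = lead
		}
	}
}

// demoTransitions replays leader -> follower -> leader and records what the
// service did during each term.
func demoTransitions() []string {
	var (
		mu     sync.Mutex
		events []string
	)
	record := func(e string) {
		mu.Lock()
		defer mu.Unlock()
		events = append(events, e)
	}
	service := func(ctx context.Context) {
		record("start")
		<-ctx.Done()
		record("stop")
	}
	changes := make(chan bool, 3)
	changes <- true
	changes <- false
	changes <- true
	close(changes)
	runWhenLeader(context.Background(), changes, service)
	return events
}

func main() {
	fmt.Println(demoTransitions())
}
```

The explicit cancel() on losing leadership stops the service mid-run, while the deferred closure covers every exit path, which is one way both concerns raised above can be satisfied together.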

alexec · 2021-04-08T00:54:59Z

controllers/sensor/resource.go

@@ -185,9 +185,14 @@ func buildDeployment(args *AdaptorArgs, eventBus *eventbusv1alpha1.EventBus) (*a
 					},
 				},
 			})
+			emptyDirVolName := "tmp-volume"


minor - maybe just call this tmp

feat: EventSource and Sensor HA without extra RBAC

4c327a9

Signed-off-by: Derek Wang <[email protected]>

whynowy force-pushed the natsleaderelect branch from f27f130 to 4c327a9 Compare April 7, 2021 00:30

whynowy requested review from alexec and VaibhavPage April 7, 2021 00:39

alexec approved these changes Apr 8, 2021

whynowy added 3 commits April 7, 2021 18:00

minor

a6bb9cc

Signed-off-by: Derek Wang <[email protected]>

Merge branch 'master' into natsleaderelect

9e39ab8

minor change

19d52ed

Signed-off-by: Derek Wang <[email protected]>

whynowy merged commit 5cd535b into argoproj:master Apr 8, 2021

whynowy deleted the natsleaderelect branch April 8, 2021 01:41

whynowy added a commit that referenced this pull request Apr 8, 2021

feat: EventSource and Sensor HA without extra RBAC (#1163)

0435122

* feat: EventSource and Sensor HA without extra RBAC Signed-off-by: Derek Wang <[email protected]>

0xgj mentioned this pull request Feb 28, 2022

Add kafka support for eventBus #1682

Closed

juliev0 pushed a commit to juliev0/argo-events that referenced this pull request Mar 29, 2022

feat: EventSource and Sensor HA without extra RBAC (argoproj#1163)

6d6a2a7

* feat: EventSource and Sensor HA without extra RBAC Signed-off-by: Derek Wang <[email protected]>
