Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: EventSource and Sensor HA without extra RBAC #1163

Merged
merged 4 commits into from
Apr 8, 2021

Conversation

whynowy
Copy link
Member

@whynowy whynowy commented Apr 7, 2021

Signed-off-by: Derek Wang [email protected]

A new approach to do Active-Passive HA for Sensors and some of the EventSources, which does not require configuring extra RBAC. It utilizes raft leader election algorithm implemented over NATS.

See https://github.com/nats-io/graft.

Checklist:

@@ -332,26 +335,6 @@ func buildDeploymentSpec(args *AdaptorArgs) (*appv1.DeploymentSpec, error) {
spec.Template.Spec.PriorityClassName = args.EventSource.Spec.Template.PriorityClassName
spec.Template.Spec.Priority = args.EventSource.Spec.Template.Priority
}
allEventTypes := eventsources.GetEventingServers(args.EventSource, nil)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this any more, with leader election, all the event source deployments can run with rolling update strategy.

@@ -270,10 +275,6 @@ func buildDeploymentSpec(args *AdaptorArgs) (*appv1.DeploymentSpec, error) {
MatchLabels: args.Labels,
},
Replicas: &replicas,
Strategy: appv1.DeploymentStrategy{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With leader election, sensor deployments can also run with rolling update strategy.

@@ -284,54 +280,24 @@ func (e *EventSourceAdaptor) Start(ctx context.Context) error {
// EventSource object use the same type of deployment strategy
break
}
if !isRecreatType || e.eventSource.Spec.GetReplicas() == 1 {
if !isRecreatType {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Event sources like webhook should not run with leader election.

return PodsLogContains(ctx, kubeClient, namespace, regex, podList, timeout), nil
}

func PodsLogContains(ctx context.Context, kubeClient kubernetes.Interface, namespace, regex string, podList *corev1.PodList, timeout time.Duration) bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Watch logs from multiple Pods.

@whynowy whynowy requested review from alexec and VaibhavPage April 7, 2021 00:39
@alexec
Copy link
Contributor

alexec commented Apr 7, 2021

I'm in two minds about this.

Leadership-election is a core cloud-native feature that is well understood by the Argo Team and community in general. We're replacing it with something that makes us more reliant on NATS. Given that a user needs to make changes to use the feature either way, would it be better to make using leadership elections straight forward?

@whynowy
Copy link
Member Author

whynowy commented Apr 7, 2021

I'm in two minds about this.

Leadership-election is a core cloud-native feature that is well understood by the Argo Team and community in general. We're replacing it with something that makes us more reliant on NATS. Given that a user needs to make changes to use the feature either way, would it be better to make using leadership elections straight forward?

I'm in two minds about this.

Leadership-election is a core cloud-native feature that is well understood by the Argo Team and community in general. We're replacing it with something that makes us more reliant on NATS. Given that a user needs to make changes to use the feature either way, would it be better to make using leadership elections straight forward?

If we don't have NATS existing in the architecture, or the HA is for controllers, there's no doubt k8s leader election is the first choice. However the HA is for a dynamic service (EventSource or Sensor), and it so happened NATS is already there, I think we should utilize it, because that benefits much - it saves lots of extra configuration.

This approach requires the minimal spec change - only needs to specify spec.replicas. However using k8s leader elections, besides the replicas, user need to make series of RBAC changes:

  1. Service Account
  2. Role and RoleBinding
  3. Specify spec.template.serviceAccountName

And these extra changes are required at least per namespace, if the user has a strong sense of security, he probably will specify resourceName in the Role definition (which limits the service account used by the EventSource/Sensor only can operate on the Lease object dedicated for it), this makes the RBAC settings even per EventSource (or Sensor).

I understand the concern of reliability of using leader election implemented over NATS, even I have done all kinds of testing upon it without seeing any issue, I still can not say it's reliable enough. How about this, we can defer using leader election when replicas = 1, and let's see the feedback from the community, and then make a decision if we use it by default (even when replicas=1)?

@alexec

@alexec
Copy link
Contributor

alexec commented Apr 8, 2021

I think we should go with NATS now. It's a big ask on our users to add a lot of extra RBAC, which people struggle with.

select {
case <-ctx.Done():
log.Info("exiting...")
cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normally defer a cancel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When current node is changed from leader to non-leader (line 128-133), cancel() need to be called to terminate the running service, and re-initiate a cctx and cancel. Not quite sure if defer cancel() still works in that case, let me do more testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using defer works, updated.

@@ -185,9 +185,14 @@ func buildDeployment(args *AdaptorArgs, eventBus *eventbusv1alpha1.EventBus) (*a
},
},
})
emptyDirVolName := "tmp-volume"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor - maybe just call this tmp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@whynowy whynowy merged commit 5cd535b into argoproj:master Apr 8, 2021
@whynowy whynowy deleted the natsleaderelect branch April 8, 2021 01:41
whynowy added a commit that referenced this pull request Apr 8, 2021
* feat: EventSource and Sensor HA without extra RBAC

Signed-off-by: Derek Wang <[email protected]>
juliev0 pushed a commit to juliev0/argo-events that referenced this pull request Mar 29, 2022
* feat: EventSource and Sensor HA without extra RBAC

Signed-off-by: Derek Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants