implementing signals as microservice deployments #49
Conversation
closes #17
Is it fair to say that, after this change:
@shrinandj just to clarify your understanding:
@magaldima how are we handling signal pod errors/failures here? A signal pod would notify the controller of errors while it's running, but what would happen if the signal pod is in a crash loop? Would the controller detect the disconnect and make a decision based on that?
@spk83 When a signal pod dies, all the bi-directional streams become disconnected from the sensor controller. When this happens, the controller's goroutines listening on the streams receive an error notifying them that the context has failed or the connection was closed. The signal nodes are then updated with an error. During the next processing loop of the affected sensor, the controller will attempt to re-establish a connection stream with the signal pod (up to the number of retries, currently 3, for a sensor in an error phase). Therefore, if the signal pod is in a crash loop, the controller would try 3 times to make a connection, and after failing would not re-queue the sensor. The sensor resource and the signal node would remain in an error phase. Assuming the signal pod requires user intervention to fix the issue, once it becomes stable the user would have to re-queue or re-create the sensors so that the controller can operate on them.
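For illustration, a minimal sketch of that listen-and-retry behavior could look like the Go snippet below. All of the names here (`signalStream`, `listen`, `reconnect`, `maxRetries`) are hypothetical and only mirror the description above; they are not the actual argo-events controller code.

```go
// Hypothetical sketch: a controller goroutine consumes a signal stream,
// surfaces stream errors so the signal node can be marked failed, and a
// bounded retry loop re-establishes the connection on later processing loops.
package controller

import (
	"context"
	"fmt"
	"time"
)

// signalStream abstracts the bi-directional stream to a signal pod.
type signalStream interface {
	Recv() (event string, err error)
}

// maxRetries mirrors the limit described above: a sensor in an error
// phase is retried at most 3 times before it stops being re-queued.
const maxRetries = 3

// listen consumes events until the stream fails (pod death, closed
// connection, cancelled context) and returns the error to the caller,
// which then marks the signal node as errored.
func listen(ctx context.Context, stream signalStream, events chan<- string) error {
	for {
		ev, err := stream.Recv()
		if err != nil {
			return fmt.Errorf("signal stream closed: %w", err)
		}
		select {
		case events <- ev:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// reconnect tries to re-establish the stream on each processing loop and
// gives up after maxRetries, leaving the sensor in an error phase.
func reconnect(dial func() (signalStream, error)) (signalStream, error) {
	var lastErr error
	for i := 0; i < maxRetries; i++ {
		s, err := dial()
		if err == nil {
			return s, nil
		}
		lastErr = err
		time.Sleep(time.Second) // back off before the next attempt
	}
	return nil, fmt.Errorf("giving up after %d retries: %w", maxRetries, lastErr)
}
```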
Force-pushed the branch from a6a46d6 to b2b8b80 (making signals resilient to pod failures and moving escalation logic after requeue failures).
After this commit, does each of the signal Docker images need to be deployed in the cluster individually?
* implementing signals as microservice deployments, making signals resilient to pod failures and moving escalation logic after requeue failures
* updating Makefile and CONTRIBUTING guide
* fixing failing tests
This PR introduces a major shift in the architecture behind the `argo-events` signals. I know this PR has many changes, but I'll do my best to explain what is changing and why. I'll start with why.

Why?
Signals can now be built and deployed as standalone `micro` signal services without changing the `argo-events` code base. Users can deploy their own combination of signal support depending on their needs.

What is this?
To get a better idea of what this change introduces, it would be helpful to understand the history of the `argo-events` signal functionality.

Signals started out (as of the `v0.5-alpha1` release) as separate goroutines run inside individual "executor" pods; for each active sensor, there was a running pod.

A couple weeks ago, I merged in a change to make certain signals (artifact, calendar, resource, webhook) run as separate goroutines within the sensor-controller pod, while the stream signals (nats, kafka, mqtt, amqp) run as separate processes "plugged in" to the controller binary via go-plugin. All of the signals implemented the same interface, so calling every signal happened the same way.
Now, this PR introduces separate signal deployments which register themselves as `micro` services to any other `micro` clients. These services were also enhanced to be stateless. In our case, the sensor-controller becomes the sole `micro` client and listens to signals via the gRPC `Listen()` stream.
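As a rough illustration of that client/server split, the controller-side call might look like the sketch below, assuming a generated client that exposes a server-streaming `Listen` RPC; `SignalClient`, `SignalSpec`, and `Event` are hypothetical stand-ins, not the project's actual generated types.

```go
// Hypothetical sketch: the sensor-controller, acting as the sole client of a
// signal microservice, opens a Listen stream and consumes events until the
// stream ends. Type names are illustrative stand-ins for generated gRPC code.
package controller

import (
	"context"
	"log"

	"google.golang.org/grpc"
)

// SignalSpec and Event stand in for the request/response messages a
// generated protobuf client would provide.
type SignalSpec struct{ Name string }
type Event struct{ Payload []byte }

// EventStream is the receive side of the Listen stream.
type EventStream interface {
	Recv() (*Event, error)
}

// SignalClient stands in for a generated gRPC client whose Listen RPC
// streams events from the signal deployment back to the controller.
type SignalClient interface {
	Listen(ctx context.Context, spec *SignalSpec, opts ...grpc.CallOption) (EventStream, error)
}

// listenToSignal opens the stream and hands each event to the sensor's
// executor; any error returned here feeds the retry/error handling
// described earlier in this thread.
func listenToSignal(ctx context.Context, client SignalClient, spec *SignalSpec, handle func(*Event)) error {
	stream, err := client.Listen(ctx, spec)
	if err != nil {
		return err // connection could not be established
	}
	for {
		ev, err := stream.Recv()
		if err != nil {
			log.Printf("signal %q stream ended: %v", spec.Name, err)
			return err
		}
		handle(ev)
	}
}
```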