Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pyzeebe will never time out a connection to a gateway with problems #178

Closed
kbakk opened this issue Jun 25, 2021 · 6 comments · Fixed by #180 or #181
Closed

Pyzeebe will never time out a connection to a gateway with problems #178

kbakk opened this issue Jun 25, 2021 · 6 comments · Fixed by #180 or #181
Labels
bug Something isn't working

Comments

@kbakk
Copy link
Collaborator

kbakk commented Jun 25, 2021

Describe the bug
If the gateway fails in a certain way, while a connection has been opened to it, Pyzeebe will not time out.

The issue applies both to the client, and the worker.

For the worker, the issue is noticed by it hanging, if it experience loss of connectivity to gateway.

To Reproduce

Start the gateway and brokes (use https://github.com/camunda-community-hub/zeebe-docker-compose/blob/0.26.0/standalone-gateway/docker-compose.yml) – don't run any workers against this cluster.

docker-compose up -d

In a second terminal, run the following code:

import logging
from pyzeebe import ZeebeClient
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(levelname)s [%(thread)d|%(threadName)s] %(filename)s:%(lineno)d: %(message)s")


zeebe_client = ZeebeClient(hostname="localhost", port=26500)
zeebe_client.deploy_workflow("tests/integration/test.bpmn")
try:
    workflow_instance_key, workflow_result = zeebe_client.run_workflow_with_result(
        bpmn_process_id="test",
        timeout=10000
    )
    logging.info("Result: %s", workflow_result)
except Exception as e:
    logging.exception("Run failed")

After a few seconds, pause the broker, using

docker-compose pause gateway

What you'll see is that the client waits forever (actually 2 hours) since the broker has gone down:

2021-06-25 14:11:20,716 DEBUG [123145476907008|Thread-2] zeebe_adapter_base.py:51: Grpc channel connectivity changed to: ChannelConnectivity.IDLE
2021-06-25 14:11:20,716 DEBUG [123145476907008|Thread-2] zeebe_adapter_base.py:55: Connected to localhost:26500
2021-06-25 14:11:20,716 DEBUG [123145476907008|Thread-3] zeebe_adapter_base.py:51: Grpc channel connectivity changed to: ChannelConnectivity.CONNECTING
2021-06-25 14:11:20,716 DEBUG [123145476907008|Thread-3] zeebe_adapter_base.py:61: Connecting to localhost:26500.
2021-06-25 14:11:20,718 DEBUG [123145476907008|Thread-4] zeebe_adapter_base.py:51: Grpc channel connectivity changed to: ChannelConnectivity.READY
2021-06-25 14:11:20,718 DEBUG [123145476907008|Thread-4] zeebe_adapter_base.py:55: Connected to localhost:26500

The reason being that the timeout argument is for the server side, and not the client-side. ​

Expected behavior

If connection to the gateway is lost, Pyzeebe should detect this and throw an error (or retry connection).

Version

This has been solved in the official clients:

node.js:

C#:

@kbakk kbakk added the bug Something isn't working label Jun 25, 2021
@JonatanMartens
Copy link
Collaborator

After some experimenting it seems that this:

    def _create_channel(connection_uri: str, credentials: BaseCredentials = None,
                        secure_connection: bool = False) -> grpc.Channel:
        options = (
                ("grpc.keepalive_time_ms", 45000), 
                ("grpc.keepalive_timeout_ms", 120000),
                ("grpc.http2.min_time_between_pings_ms", 60000)
        )
        if credentials:
            return grpc.secure_channel(connection_uri, credentials.grpc_credentials, options=options)
        elif secure_connection:
            return grpc.secure_channel(connection_uri, grpc.ssl_channel_credentials(), options=options)
        else:
            return grpc.insecure_channel(connection_uri, options=options)

would fix it.

Do we want these to be changed? Both the java and go clients allow this, but the node client does not.

@kbakk
Copy link
Collaborator Author

kbakk commented Jul 1, 2021

What about allowing to monkeypatch it if the user wants to change it?

We could define GRPC_CHANNEL_OPTIONS as a dict (that we convert to tuples before using), so the user can do

from pyzeebe.grpc import GRPC_CHANNEL_OPTIONS
GRPC_CHANNEL_OPTIONS.update("grpc.keepalive_time_ms": 5000)

With regards to grpc.keepalive_timeout_ms, grpc.http2.min_time_between_pings_ms – I think we should avoid specifying them, and following the convention in go/java to let the underlying C implementation set the defaults.

@kbakk
Copy link
Collaborator Author

kbakk commented Jul 1, 2021

I can add this in a PR, though not sure how to test this (except testing that default values, monkeypatching works).

@kbakk kbakk mentioned this issue Jul 1, 2021
2 tasks
@kbakk
Copy link
Collaborator Author

kbakk commented Jul 1, 2021

@JonatanMartens See PR. Let me know if you would like this ported to 3.0 branch as well - I see that I can do it quite similarly.

grpc_channel_utils.py

def create_channel(
    connection_uri: str,
    credentials: Optional[BaseCredentials] = None,
    secure_connection: bool = False
) -> grpc.aio.Channel:
    options = get_channel_options()
    if credentials:
        return grpc.aio.secure_channel(connection_uri, credentials.grpc_credentials, options=options)
    if secure_connection:
        return grpc.aio.secure_channel(connection_uri, grpc.ssl_channel_credentials(), options=options)
    return grpc.aio.insecure_channel(connection_uri, options=options)

get_channel_options could perhaps be extended to accept a dict, that allows the user to override variables (using dict merge).

@JonatanMartens
Copy link
Collaborator

@JonatanMartens See PR. Let me know if you would like this ported to 3.0 branch as well - I see that I can do it quite similarly.

That would be appreciated.

get_channel_options could perhaps be extended to accept a dict, that allows the user to override variables (using dict merge).

Good idea!

@JonatanMartens JonatanMartens mentioned this issue Jul 2, 2021
2 tasks
@JonatanMartens
Copy link
Collaborator

I'll keep this issue open until we also fix the 3.0.0 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants