[Question] Several "Unable to poll" errors multiple times per day #291

Open
poblouin opened this issue Mar 19, 2024 · 2 comments

poblouin commented Mar 19, 2024

Hey folks,

We've been using this Ruby SDK for quite a while now, about a year I would say, and since then we've been seeing a lot of "Unable to poll" errors. For example, last week there were 700 of those for one of our services. This affects all of our different services using this SDK.

Unable to poll Workflow task queue

#<GRPC::Unavailable: 14:Socket closed. debug_error_string:{UNKNOWN:Error received from peer  {grpc_message:"Socket closed", grpc_status:14, created_time:"2024-03-19T14:52:38.963074635+00:00"}}>

And this

Unable to poll activity task queue

#<GRPC::Unavailable: 14:Socket closed. debug_error_string:{UNKNOWN:Error received from peer  {created_time:"2024-03-19T14:52:38.963058558+00:00", grpc_status:14, grpc_message:"Socket closed"}}>

Would someone have an idea of what might be causing these errors? Are they "normal" or expected from time to time within Temporal? We don't have a fancy configuration or anything; we mostly use the default config, with a thread pool size of 20 for both activities and workflows. I can share more relevant details if needed.

Note that this error doesn't cause any of our activities or workflows to fail; it just happens every now and then and recovers on its own.

temporal-ruby + grpc versions:

GIT
  remote: https://github.com/coinbase/temporal-ruby
  revision: 3e0dae708ec0e3eab8c44b57b64f5cd1881848e6
  
grpc (1.59.2)
@cdimitroulas

For what it's worth, I also see this on our Ruby apps that use Temporal. I would love to understand more about why this happens.

@michael-cybrid

We were running into this and ended up setting the following gRPC channel args. It's hacky, but it works. The issue seems to be that the socket is silently being torn down by the Temporal server; keeping the connection alive with keepalive pings avoids the issue.

updated_stub_class = Class.new(::Temporalio::Api::WorkflowService::V1::WorkflowService::Stub) do
  def initialize(host, creds, channel_args: {}, **options)
    # see https://github.com/grpc/grpc/blob/master/doc/keepalive.md

    # This channel argument controls the period (in milliseconds) after which a keepalive ping is
    # sent on the transport.
    unless channel_args.include?('grpc.keepalive_time_ms')
      channel_args['grpc.keepalive_time_ms'] = 30_000
    end

    # This channel argument controls the amount of time (in milliseconds) the sender of the keepalive
    # ping waits for an acknowledgement. If it does not receive an acknowledgement within this time,
    # it will close the connection.
    unless channel_args.include?('grpc.keepalive_timeout_ms')
      channel_args['grpc.keepalive_timeout_ms'] = 20_000
    end

    # This channel argument, if set to 1 (0: false; 1: true), allows keepalive pings to be sent even
    # if there are no calls in flight.
    unless channel_args.include?('grpc.keepalive_permit_without_calls')
      channel_args['grpc.keepalive_permit_without_calls'] = 1
    end

    # This channel argument controls the maximum number of pings that can be sent when there is no
    # data/header frame to be sent. gRPC Core will not continue sending pings if we run over the
    # limit. Setting it to 0 allows sending pings without such a restriction.
    unless channel_args.include?('grpc.http2.max_pings_without_data')
      channel_args['grpc.http2.max_pings_without_data'] = 0
    end

    super
  end
end

::Temporalio::Api::WorkflowService::V1::WorkflowService.const_set(:Stub, updated_stub_class)
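
For completeness, one place this override could go is somewhere that runs once at boot, before any Temporal client or worker opens its first gRPC connection. A rough sketch, assuming a standard Temporal.configure setup (the host/namespace/task queue values below are placeholders, not our actual config):

require 'temporal-ruby'

# ... define updated_stub_class and call const_set as in the snippet above ...

# Configure Temporal only after the Stub has been replaced, so the connection
# built for the client/worker picks up the keepalive channel args.
Temporal.configure do |config|
  config.host       = ENV.fetch('TEMPORAL_HOST', 'localhost')   # placeholder
  config.port       = Integer(ENV.fetch('TEMPORAL_PORT', '7233'))
  config.namespace  = 'my-namespace'    # placeholder
  config.task_queue = 'my-task-queue'   # placeholder
end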
