
Ruby 3.3.5 - Consumer Pub Sub - Crashing with Cancelled error recurrently #27448

Open
rmzoni opened this issue Oct 15, 2024 · 3 comments

rmzoni commented Oct 15, 2024

Environment details

OS: linux
Ruby version: 3.3.5
Gem name and version: google-cloud-pubsub v2.19.0

Steps to reproduce

  • Start the `ConsumeEvents.new(config).consume` method below using Ruby 3.3.5.
  • After a few minutes, the error occurs and the process stops.
  • The error is: `1:CANCELLED. debug_error_string:{UNKNOWN:Error received from peer {grpc_message:"CANCELLED", grpc_status:1, created_time:"2024-10-15T13:48:37.628426185+00:00"}} (GRPC::Cancelled)`

Note: the error occurs every 5 to 10 minutes, and the pod in our Kubernetes cluster gets restarted.

Code example

```ruby
class ConsumeEvents
  def initialize(config, overrides = {})
    @config = config
    @subscriber_service = overrides.fetch(:subscriber_service) { ::Events::PubSub::Subscriber.new }
    @subscribers = {}
    @logger = overrides.fetch(:logger) { GRPC.logger }
  end

  def consume
    # Gracefully shut down the subscriber on program exit, blocking until
    # all received messages have been processed or n seconds have passed
    at_exit { stop_subscribers }

    loop do
      subscribe_to_configured_topics
      # Block, letting processing threads continue in the background
      sleep(15.seconds)
    end
  end

  def subscribe_to_configured_topics
    @config.each do |consumer_config|
      next if @subscribers.key?(consumer_config.subscription)

      subscriber = nil
      begin
        subscriber = @subscriber_service
          .with_topic_id(consumer_config.topic)
          .with_subscription_name(consumer_config.subscription) # The error is raised here
          # with_subscription_name runs the following code:
          # @pubsub = overrides.fetch(:pubsub) { Google::Cloud::Pubsub.new(project_id: ENV.fetch("PUBSUB_PROJECT_ID"), emulator_host: ENV.fetch("PUBSUB_EMULATOR_HOST", nil)) }
          # @pubsub.subscription(subscriber_name) || topic.subscribe(subscriber_name, **opts)
          .listen do |received_message|
            process_message(received_message, consumer_config)
          end
      rescue Events::PubSub::Subscriber::TopicNotFound => e
        set_error_on_span("Could not subscribe to topic: `#{consumer_config.key}`. Reason: #{e.message}")
        next
      rescue => e
        message = "General error #{e.class}: `#{consumer_config.key}` - `#{consumer_config.topic}` - `#{consumer_config.subscription}`. Reason: #{e.message}"
        set_error_on_span(message)
        @logger.error(message)

        raise StandardError.new(message)
      end

      subscriber.on_error { |exception| handle_error(exception, consumer_config) }

      subscriber.start

      @logger.info("Subscribed to #{consumer_config.topic}")
      @subscribers[consumer_config.subscription] = subscriber
    end
  end

  ...

end
```
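In the code above, any error other than `TopicNotFound` raised while looking up the subscription is logged and re-raised, and nothing in `consume` rescues it, so a single CANCELLED from the lookup ends the `loop` and the process. Below is a minimal sketch of retrying the lookup with exponential backoff instead, assuming that a CANCELLED at this point is transient. `with_backoff` and its keyword arguments are hypothetical names, not part of google-cloud-pubsub. This only contains the symptom; it does not explain why the errors started appearing on Ruby 3.3.

```ruby
# Hypothetical helper (not part of google-cloud-pubsub): runs the block and
# retries when `retryable` returns true for the raised error. Between tries it
# sleeps base_delay * 2**(attempt - 1) seconds, capped at max_delay, and it
# re-raises the last error once `attempts` tries have been used.
def with_backoff(retryable:, attempts: 5, base_delay: 1.0, max_delay: 30.0)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    # Give up on the final attempt, or immediately for non-transient errors.
    raise if attempt >= attempts || !retryable.call(e)

    # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
    sleep([base_delay * (2**(attempt - 1)), max_delay].min)
    retry
  end
end

# Possible wiring inside subscribe_to_configured_topics (assumes the grpc gem
# is loaded so that GRPC::Cancelled is defined):
#
#   subscriber = with_backoff(retryable: ->(e) { e.is_a?(GRPC::Cancelled) }) do
#     @subscriber_service
#       .with_topic_id(consumer_config.topic)
#       .with_subscription_name(consumer_config.subscription)
#       .listen { |msg| process_message(msg, consumer_config) }
#   end
```

Errors the predicate rejects, such as `TopicNotFound`, propagate on the first attempt, so the existing `rescue` branches keep their current behavior.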

Full backtrace

```
/usr/local/bundle/gems/grpc-1.66.0-x86_64-linux/src/ruby/lib/grpc/generic/active_call.rb:29:in `check_status': 1:CANCELLED. debug_error_string:{UNKNOWN:Error received from peer {grpc_message:"CANCELLED", grpc_status:1, created_time:"2024-10-15T13:48:37.628426185+00:00"}} (GRPC::Cancelled)
    from /app/app/infra/events/pub_sub/subscription.rb:18:in `subscribe_to'
    from /app/app/infra/events/pub_sub/subscriber.rb:22:in `with_subscription_name'
    from /app/app/infra/events/pub_sub/consumer/consume_events.rb:32:in `block in subscribe_to_configured_topics'
    from /app/app/infra/events/pub_sub/consumer/config.rb:53:in `each'
    from /app/app/infra/events/pub_sub/consumer/config.rb:53:in `each'
    from /app/app/infra/events/pub_sub/consumer/consume_events.rb:25:in `subscribe_to_configured_topics'
```
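The reports in this thread surface the same gRPC status (code 1, CANCELLED) through different classes: `GRPC::Cancelled` in this backtrace and `Google::Cloud::CanceledError` in the search-api commit. Below is a minimal sketch of a check that recognizes either. It assumes that `GRPC::BadStatus#code` returns the numeric status and that the `Google::Cloud::Error` wrapper keeps the original gRPC error as its `cause`. The helper name `cancelled_status?` is hypothetical.

```ruby
# gRPC status code 1 is CANCELLED (grpc_status:1 in the logs in this thread).
GRPC_CANCELLED_CODE = 1

# Hypothetical helper: walks the exception and its `cause` chain and returns
# true if any link reports gRPC status CANCELLED through a #code method.
# Assumes GRPC::BadStatus#code returns the numeric status and that wrapper
# errors (e.g. Google::Cloud::CanceledError) keep the gRPC error as `cause`.
def cancelled_status?(error)
  while error
    return true if error.respond_to?(:code) && error.code == GRPC_CANCELLED_CODE

    error = error.cause
  end
  false
end
```

Such a check could sit in the `rescue => e` branch of `subscribe_to_configured_topics` to decide whether to retry or re-raise.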

@aandreassa
Contributor

@rmzoni or any others experiencing the problem, does the error still occur when downgrading below v2.19.0?

Please try downgrading the underlying GAPIC as well (google-cloud-pubsub-v1) for testing the previous versions. We haven't been able to reproduce the problem yet, so this would be useful info.
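A minimal Gemfile sketch for the downgrade test requested above, pinning `google-cloud-pubsub` to 2.15.1 (the version the reporter tested). The matching `google-cloud-pubsub-v1` requirement is left as a placeholder because no version is given in this thread; it should be chosen from that gem's releases published alongside the handwritten gem under test.

```ruby
# Gemfile (sketch): pin the handwritten client below the suspect release.
source "https://rubygems.org"

gem "google-cloud-pubsub", "2.15.1"

# Also pin the underlying GAPIC. The version is intentionally not filled in:
# pick a google-cloud-pubsub-v1 release from the same period as the
# google-cloud-pubsub version under test.
# gem "google-cloud-pubsub-v1", "<version>"
```

After editing, `bundle update google-cloud-pubsub google-cloud-pubsub-v1` refreshes `Gemfile.lock` with the pinned versions.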


rmzoni commented Nov 12, 2024

> @rmzoni or any others experiencing the problem, does the error still occur when downgrading below v2.19.0?
>
> Please try downgrading the underlying GAPIC as well (google-cloud-pubsub-v1) for testing the previous versions. We haven't been able to reproduce the problem yet, so this would be useful info.

Hi @aandreassa, we've already tested lower versions. The problem still happens on version 2.15.1.

davidgisbey added a commit to alphagov/search-api that referenced this issue Nov 21, 2024
We've had to revert our Ruby bump to 3.3.6 due to an issue with the load
traffic job. We were getting the following error:

```
Google::Cloud::CanceledError: 1:CANCELLED. debug_error_string:
{
  UNKNOWN:Error received from peer {
    created_time:"2024-11-20T22:00:03.584412449+00:00",
    grpc_status:1,
    grpc_message:"CANCELLED"
  }
} (Google::Cloud::CanceledError)
/usr/local/bundle/ruby/3.3.0/gems/google-analytics-data-v1beta-0.13.1/lib/google/analytics/data/v1beta/analytics_data/client.rb:369:
```

Looking at the gem's repo, we can see that someone else has opened an issue
for the same error: googleapis/google-cloud-ruby#27448

govuk-ruby-base (https://github.com/alphagov/govuk-ruby-images/blob/93c59fe31019cc0621f607ac6bbb0683b111e1b6/README.md?plain=1#L8)
only pins the minor version of Ruby, so we can't bump higher than 3.2;
otherwise the production apps would resolve to the latest stable Ruby
version, which would break the rake task.

This bumps Ruby to the latest 3.2 version, which is 3.2.6. Once the issue
has been resolved, we can bump this to 3.3.6 (or a later version
if one is released).

rmzoni commented Nov 28, 2024

Hi @aandreassa, any news about this issue? I've seen that other repos are experiencing the same issue.
