Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config option to allow retry count metrics segmented by queue #32

Merged
merged 4 commits into from
Feb 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .rubocop.yml
Original file line number Diff line number Diff line change
Expand Up @@ -74,3 +74,6 @@ Style/ExplicitBlockArgument:

Gemspec/RequiredRubyVersion:
Enabled: false

Metrics/ModuleLength:
Max: 200
11 changes: 6 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -99,14 +99,15 @@ end

Configuration is handled by [anyway_config] gem. With it you can load settings from environment variables (upcased and prefixed with `YABEDA_SIDEKIQ_`), YAML files, and other sources. See [anyway_config] docs for details.

Config key | Type | Default | Description |
------------------------- | -------- | ------------------------------------------------------- |----------------------------------------------------------------------------------------------------------------------------------------------------|
`collect_cluster_metrics` | boolean | Enabled in Sidekiq worker processes, disabled otherwise | Defines whether this Ruby process should collect and expose metrics representing state of the whole Sidekiq installation (queues, processes, etc). |
`declare_process_metrics` | boolean | Enabled in Sidekiq worker processes, disabled otherwise | Declare metrics that are only tracked inside worker process even outside of them. Useful for multiprocess metric collection. |
Config key | Type | Default | Description |
---------------------------- | -------- | ------------------------------------------------------- |----------------------------------------------------------------------------------------------------------------------------------------------------|
`collect_cluster_metrics` | boolean | Enabled in Sidekiq worker processes, disabled otherwise | Defines whether this Ruby process should collect and expose metrics representing state of the whole Sidekiq installation (queues, processes, etc). |
`declare_process_metrics` | boolean | Enabled in Sidekiq worker processes, disabled otherwise | Declare metrics that are only tracked inside worker process even outside of them. Useful for multiprocess metric collection. |
`retries_segmented_by_queue` | boolean | Disabled | Defines wheter retries are segemented by queue or reported as a single metric |

# Roadmap (TODO or Help wanted)

- Implement optional segmentation of retry/schedule/dead sets
- Implement optional segmentation of schedule/dead sets

It should be disabled by default as it requires to iterate over all jobs in sets and may be very slow on large sets.

Expand Down
38 changes: 21 additions & 17 deletions lib/yabeda/sidekiq.rb
Original file line number Diff line number Diff line change
Expand Up @@ -49,13 +49,16 @@ def self.config
# Metrics not specific for current Sidekiq process, but representing state of the whole Sidekiq installation (queues, processes, etc)
# You can opt-out from collecting these by setting YABEDA_SIDEKIQ_COLLECT_CLUSTER_METRICS to falsy value (+no+ or +false+)
if config.collect_cluster_metrics # defaults to +::Sidekiq.server?+
gauge :jobs_waiting_count, tags: %i[queue], aggregation: :most_recent, comment: "The number of jobs waiting to process in sidekiq."
gauge :active_workers_count, tags: [], aggregation: :most_recent, comment: "The number of currently running machines with sidekiq workers."
gauge :jobs_scheduled_count, tags: [], aggregation: :most_recent, comment: "The number of jobs scheduled for later execution."
gauge :jobs_retry_count, tags: [], aggregation: :most_recent, comment: "The number of failed jobs waiting to be retried"
gauge :jobs_dead_count, tags: [], aggregation: :most_recent, comment: "The number of jobs exceeded their retry count."
gauge :active_processes, tags: [], aggregation: :most_recent, comment: "The number of active Sidekiq worker processes."
gauge :queue_latency, tags: %i[queue], aggregation: :most_recent,
retry_count_tags = config.retries_segmented_by_queue ? %i[queue] : []

gauge :jobs_waiting_count, tags: %i[queue], aggregation: :most_recent, comment: "The number of jobs waiting to process in sidekiq."
gauge :active_workers_count, tags: [], aggregation: :most_recent,
comment: "The number of currently running machines with sidekiq workers."
gauge :jobs_scheduled_count, tags: [], aggregation: :most_recent, comment: "The number of jobs scheduled for later execution."
gauge :jobs_retry_count, tags: retry_count_tags, aggregation: :most_recent, comment: "The number of failed jobs waiting to be retried"
gauge :jobs_dead_count, tags: [], aggregation: :most_recent, comment: "The number of jobs exceeded their retry count."
gauge :active_processes, tags: [], aggregation: :most_recent, comment: "The number of active Sidekiq worker processes."
gauge :queue_latency, tags: %i[queue], aggregation: :most_recent,
comment: "The queue latency, the difference in seconds since the oldest job in the queue was enqueued"
end

Expand All @@ -73,21 +76,22 @@ def self.config
sidekiq_jobs_scheduled_count.set({}, stats.scheduled_size)
sidekiq_jobs_dead_count.set({}, stats.dead_size)
sidekiq_active_processes.set({}, stats.processes_size)
sidekiq_jobs_retry_count.set({}, stats.retry_size)

::Sidekiq::Queue.all.each do |queue|
sidekiq_queue_latency.set({ queue: queue.name }, queue.latency)
end

# That is quite slow if your retry set is large
# I don't want to enable it by default
# retries_by_queues =
# ::Sidekiq::RetrySet.new.each_with_object(Hash.new(0)) do |job, cntr|
# cntr[job["queue"]] += 1
# end
# retries_by_queues.each do |queue, count|
# sidekiq_jobs_retry_count.set({ queue: queue }, count)
# end
if config.retries_segmented_by_queue
retries_by_queues =
::Sidekiq::RetrySet.new.each_with_object(Hash.new(0)) do |job, cntr|
cntr[job["queue"]] += 1
end
retries_by_queues.each do |queue, count|
sidekiq_jobs_retry_count.set({ queue: queue }, count)
end
else
sidekiq_jobs_retry_count.set({}, stats.retry_size)
end
end
end

Expand Down
4 changes: 4 additions & 0 deletions lib/yabeda/sidekiq/config.rb
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,10 @@ class Config < ::Anyway::Config

# Declare metrics that are only tracked inside worker process even outside them
attr_config declare_process_metrics: ::Sidekiq.server?

# Retries are tracked by default as a single metric. If you want to track them separately for each queue, set this to +true+
# Disabled by default because it is quite slow if the retry set is large
attr_config retries_segmented_by_queue: false
end
end
end