Added a new metric to the workflow cache #6064

Merged
jakobht merged 3 commits into cadence-workflow:master from the warningMetric branch on May 30, 2024

Conversation

@jakobht jakobht commented May 27, 2024

What changed?
In the per-workflow-ID cache, we have added request counting for each workflow ID for the external rate limits. We track the number of requests in the current second for each workflow ID and emit this count as a timer metric.

The timer metric tracks different percentiles of the emitted counts, as well as the max count, so we do not need to do this percentile and max bookkeeping manually.

Why?
We use the tracked information above to emit a metric stating the request count per workflow ID in a domain. We have to do the per-workflow-ID counting ourselves, because emitting a metric for each workflow ID would have way too high cardinality and would overwhelm the metrics system.
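
To make the cardinality argument concrete, the following Go sketch compares how many distinct time series would exist if a metric were tagged by domain and workflow ID versus by domain only. The `request` type and `seriesCounts` function are hypothetical and exist only for this illustration.

```go
package main

// request identifies which workflow in which domain a request targeted.
type request struct {
	domain     string
	workflowID string
}

// seriesCounts returns the number of distinct time series a metric would
// produce when tagged by (domain, workflowID) versus by domain only.
func seriesCounts(reqs []request) (perWorkflow, perDomain int) {
	workflows := make(map[request]struct{})
	domains := make(map[string]struct{})
	for _, r := range reqs {
		workflows[r] = struct{}{}
		domains[r.domain] = struct{}{}
	}
	return len(workflows), len(domains)
}
```

The per-workflow number grows with every new workflow ID, while the per-domain number grows only with the number of domains, which is why the metric is emitted with a domain dimension only.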

How did you test it?
Unit tests

Potential risks
The change should only emit a metric, so it should be low risk.

Release notes

Documentation Changes

codecov bot commented May 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.14%. Comparing base (2f08de5) to head (73afab2).
Report is 5 commits behind head on master.

Current head 73afab2 differs from pull request most recent head 557e21c

Please upload reports for the commit 557e21c to get more accurate results.

Additional details and impacted files
| Files | Coverage Δ |
|---|---|
| service/history/workflowcache/cache.go | 91.95% <100.00%> (+0.18%) ⬆️ |
| service/history/workflowcache/metrics.go | 100.00% <100.00%> (ø) |

... and 13 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

This is to track the max number of requests to a single workflow per
domain

coveralls commented May 27, 2024

Pull Request Test Coverage Report for Build 018fc948-db5d-46cf-88b8-780d605911b9

Details

  • 18 of 18 (100.0%) changed or added relevant lines in 2 files are covered.
  • 299 unchanged lines in 21 files lost coverage.
  • Overall coverage decreased (-0.1%) to 69.523%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| common/task/weighted_round_robin_task_scheduler.go | 1 | 88.06% |
| common/task/parallel_task_processor.go | 2 | 93.06% |
| service/matching/tasklist/db.go | 2 | 73.23% |
| common/dynamicconfig/constants.go | 2 | 99.05% |
| service/matching/tasklist/task_list_manager.go | 2 | 76.48% |
| common/task/fifo_task_scheduler.go | 2 | 85.57% |
| common/persistence/sql/sqlplugin/mysql/task.go | 2 | 73.68% |
| common/membership/hashring.go | 2 | 84.69% |
| common/persistence/sql/sqlplugin/mysql/db.go | 2 | 79.49% |
| service/history/task/task.go | 3 | 84.81% |
Totals Coverage Status
Change from base Build 018fc614-f430-4db4-8939-669fc38230e0: -0.1%
Covered Lines: 102545
Relevant Lines: 147497

💛 - Coveralls

- Change from gauge to timer
- Only track for workflow IDs the timer will do the bookkeeping for max
  value

@3vilhamster 3vilhamster left a comment

We discussed this offline.
The metric reports the maximum per-workflow-ID request count, with a per-domain dimension.
This could be helpful: although it cannot tell us which workflow ID is the offender, it can at least help identify possible problems and inform customers about them.

@jakobht jakobht merged commit 2c83c16 into cadence-workflow:master May 30, 2024
18 checks passed
@jakobht jakobht deleted the warningMetric branch May 30, 2024 12:35
timl3136 pushed a commit to timl3136/cadence that referenced this pull request Jun 6, 2024
* Added a new metric to the workflow cache

This is to track the max number of requests to a single workflow per
domain

* Updated based on review

- Change from gauge to timer
- Only track for workflow IDs the timer will do the bookkeeping for max
  value

* Moved the method to workflowIDCountMetric as it modifies this only