Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics to measure the time a task waiting in history queue #6205

Merged
merged 1 commit into from
Aug 1, 2024

Conversation

Shaddoll
Copy link
Member

@Shaddoll Shaddoll commented Jul 31, 2024

What changed?
Add metrics to measure the time a task waiting in history queue, which is from the time the task is written to database to the time the task is pushed to matching service

Why?
improve observability

How did you test it?

Potential risks

Release notes

Documentation Changes

Copy link

codecov bot commented Jul 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.89%. Comparing base (38c295d) to head (b14e231).

Additional details and impacted files
Files Coverage Δ
common/util.go 78.85% <100.00%> (+0.38%) ⬆️
service/history/task/task.go 79.12% <ø> (ø)
...vice/history/task/transfer_active_task_executor.go 66.99% <100.00%> (+0.54%) ⬆️
service/matching/handler/context.go 46.15% <100.00%> (ø)
service/matching/tasklist/task_list_manager.go 63.65% <100.00%> (-0.42%) ⬇️

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38c295d...b14e231. Read the comment docs.

@Shaddoll Shaddoll force-pushed the history-metrics branch 2 times, most recently from 0370a7a to d2ead6c Compare July 31, 2024 22:59
common/util_test.go Show resolved Hide resolved
return t.pushActivity(ctx, task, timeout, mutableState.GetExecutionInfo().PartitionConfig)
err = t.pushActivity(ctx, task, timeout, mutableState.GetExecutionInfo().PartitionConfig)
if err == nil {
scope := common.NewPerTaskListScope(domainName, task.TaskList, types.TaskListKindNormal, t.metricsClient, metrics.TransferActiveTaskActivityScope)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it cheap to create these per tasklist scopes on the fly and discard? this will be done per shard so there will be 16k x num_tasklists of these across hosts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you planning to do the same for timer task executor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only activity retry timer pushes tasks to matching, but retry mostly happen after activity started event, it might be misleading to have this metric

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't consider retry case but for example a user timer firing with X seconds delay and hence creating the corresponding decision task with X seconds delay should be captured by our new "overhead breakdown metrics".
We might not need tasklist level granularity all the way though so existing timer task latencies might be fine. However if we are introducing tasklist level granularity for decision/timer task delays in history queues we can also consider the same for timer tasks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decision/activity tasks are from transfer queue.
only activity retry timer from timer queue pushes activity tasks to matching.
I'm not sure which timer tasks you're referring to.

@Shaddoll Shaddoll merged commit 95bc620 into cadence-workflow:master Aug 1, 2024
20 checks passed
@Shaddoll Shaddoll deleted the history-metrics branch August 1, 2024 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants