-
Notifications
You must be signed in to change notification settings - Fork 805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics to measure the time a task waiting in history queue #6205
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
... and 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
0370a7a
to
d2ead6c
Compare
return t.pushActivity(ctx, task, timeout, mutableState.GetExecutionInfo().PartitionConfig) | ||
err = t.pushActivity(ctx, task, timeout, mutableState.GetExecutionInfo().PartitionConfig) | ||
if err == nil { | ||
scope := common.NewPerTaskListScope(domainName, task.TaskList, types.TaskListKindNormal, t.metricsClient, metrics.TransferActiveTaskActivityScope) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it cheap to create these per tasklist scopes on the fly and discard? this will be done per shard so there will be 16k x num_tasklists of these across hosts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you planning to do the same for timer task executor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only activity retry timer pushes tasks to matching, but retry mostly happen after activity started event, it might be misleading to have this metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't consider retry case but for example a user timer firing with X seconds delay and hence creating the corresponding decision task with X seconds delay should be captured by our new "overhead breakdown metrics".
We might not need tasklist level granularity all the way though so existing timer task latencies might be fine. However if we are introducing tasklist level granularity for decision/timer task delays in history queues we can also consider the same for timer tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decision/activity tasks are from transfer queue.
only activity retry timer from timer queue pushes activity tasks to matching.
I'm not sure which timer tasks you're referring to.
d2ead6c
to
b14e231
Compare
What changed?
Add metrics to measure the time a task waiting in history queue, which is from the time the task is written to database to the time the task is pushed to matching service
Why?
improve observability
How did you test it?
Potential risks
Release notes
Documentation Changes