-
Notifications
You must be signed in to change notification settings - Fork 805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add emitWorkflowVersionMetrics for pinot #6190
Conversation
return fmt.Sprintf(` | ||
SELECT WorkflowType, COUNT(*) AS count | ||
FROM %s | ||
WHERE DomainID = '%s' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: %q
to replace '%s'
according to https://pkg.go.dev/fmt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For strings, %q returns a double-quoted string safely escaped with Go syntax, but in Pinot, Where DomainID = "" doesn't work. It has to be single quoted.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files
... and 14 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
domainWorkflowVersionCount.WorkflowTypes = append(domainWorkflowVersionCount.WorkflowTypes, WorkflowTypeCount{ | ||
EsAggregateCount: EsAggregateCount{ | ||
AggregateKey: workflowType, | ||
AggregateCount: int64(workflowCount), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
workflowCount is from first call; this will be different from the summation of counts from subsequent calls by workflowtypes. But you could instead use the summation to be at least self consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's one sample result from ES:
{ "key": "UpfrontChargeWorkflow::start", "doc_count": 182, "versions": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "waitForPSPCallback-1", "doc_count": 149 } ] } },
The count of workflow type is different from the summation of the counts of CadenceChangeVersions. I was thinking if this is designed on purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we group by WorkflowType and CadenceChangeVersion, so it can have the count per version and per type. I tried and it is working
SELECT JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY') AS CadenceChangeVersion, COUNT(*) AS count, workflowtype
FROM rta.rta.cadence_visibility_production
WHERE IsDeleted = false
AND CloseStatus = -1
AND StartTime > 0
AND JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY') IS NOT NULL
GROUP BY JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY'), workflowtype
ORDER BY count DESC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That query means to count all the workflowTypes which has CadenceChangeVersion. This is different from the ES result. For that ES query, it means to first, find the top 10 workflow types by count, and then, within these 10 workflow types, identify the top 10 CadenceChangeVersions count for each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, group by version and type will filter the records without CadenceChangeVersion. Need to verify if we need to emit that count, if not we can go with this approach.
return err | ||
} | ||
var domainWorkflowVersionCount DomainWorkflowVersionCount | ||
for _, row := range response { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10x latency might be an issue for metrics emission. Could you parallelize it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this in parallel with multiple threads, is there a risk when metrics are emitted, the workflow still doesn't have all the data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metrics doesn't care about the latency, since we run it every 5 or 10 minutes. But we can eliminate the calls when we aggregate by both version and type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this with Ender offline as well. We are going to keep this approach.
Co-authored-by: Shijie Sheng <[email protected]>
What changed?
Add emitWorkflowVersionMetrics for pinot. Because pinot doesn't support one aggr inside of anther like ES, I had to separate the query into 2.
Why?
To make ES analyzer becomes a generic visibility analyzer
How did you test it?
unit test
Potential risks
At worst, query time might be 10x.
But doesn't matter too much.
Release notes
Documentation Changes