Add emitWorkflowVersionMetrics for pinot #6190

Merged: 5 commits, Jul 30, 2024

Member

@bowenxia bowenxia commented Jul 25, 2024

What changed?
Add emitWorkflowVersionMetrics for Pinot. Because Pinot, unlike ES, doesn't support nesting one aggregation inside another, I had to split the query into two:

  1. Aggregate by workflowType (the top 10 workflowTypes by count).
  2. Aggregate by CadenceChangeVersion within a specific workflowType (the top 10 CadenceChangeVersions by count for that workflowType).
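
The two steps above can be sketched as a pair of query builders. This is only an illustration, not the code merged in this PR: the function names, the `topN` constant, and the table and domain values are hypothetical, and the `JSON_EXTRACT_SCALAR` expression is borrowed from a query posted later in this conversation. The values are interpolated without escaping, so the sketch assumes trusted inputs.

```go
package main

import "fmt"

// topN mirrors the "top 10" in the PR description (hypothetical constant).
const topN = 10

// workflowTypeCountQuery builds step 1: the most frequent workflow types
// in one domain. The table name is a caller-supplied placeholder.
func workflowTypeCountQuery(table, domainID string) string {
	return fmt.Sprintf(`SELECT WorkflowType, COUNT(*) AS count
FROM %s
WHERE DomainID = '%s'
GROUP BY WorkflowType
ORDER BY count DESC
LIMIT %d`, table, domainID, topN)
}

// versionCountQuery builds step 2: the most frequent CadenceChangeVersion
// values within a single workflow type returned by step 1.
func versionCountQuery(table, domainID, workflowType string) string {
	const version = `JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY')`
	return fmt.Sprintf(`SELECT %s AS CadenceChangeVersion, COUNT(*) AS count
FROM %s
WHERE DomainID = '%s'
  AND WorkflowType = '%s'
GROUP BY %s
ORDER BY count DESC
LIMIT %d`, version, table, domainID, workflowType, version, topN)
}

func main() {
	// Placeholder table, domain, and workflow type names.
	fmt.Println(workflowTypeCountQuery("visibility_table", "example-domain-id"))
	fmt.Println()
	fmt.Println(versionCountQuery("visibility_table", "example-domain-id", "ExampleWorkflow"))
}
```

Step 2 runs once per workflow type returned by step 1, which is where the extra queries come from.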

Why?
To turn the ES analyzer into a generic visibility analyzer.

How did you test it?
unit test

Potential risks
At worst, query time might be 10x, since each of the top 10 workflow types needs its own follow-up query.
But that doesn't matter too much.

Release notes

Documentation Changes

return fmt.Sprintf(`
SELECT WorkflowType, COUNT(*) AS count
FROM %s
WHERE DomainID = '%s'
Member

nit: use %q instead of '%s'; see https://pkg.go.dev/fmt

Member Author

For strings, %q returns a double-quoted string safely escaped with Go syntax, but in Pinot, WHERE DomainID = "" doesn't work. The value has to be single-quoted.
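
A minimal sketch of the difference between the two verbs; the helper names and the domain ID value are made up for illustration. In standard SQL, double quotes delimit identifiers rather than string literals, which is presumably why the double-quoted form is rejected.

```go
package main

import "fmt"

// filterWithQ formats the value with %q, which yields a Go-syntax,
// double-quoted string.
func filterWithQ(domainID string) string {
	return fmt.Sprintf("WHERE DomainID = %q", domainID)
}

// filterWithS wraps the value in single quotes, the form used in the PR.
func filterWithS(domainID string) string {
	return fmt.Sprintf("WHERE DomainID = '%s'", domainID)
}

func main() {
	fmt.Println(filterWithQ("3f1a")) // WHERE DomainID = "3f1a"
	fmt.Println(filterWithS("3f1a")) // WHERE DomainID = '3f1a'
}
```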

codecov bot commented Jul 25, 2024

Codecov Report

Attention: Patch coverage is 96.87500% with 4 lines in your changes missing coverage. Please review.

Project coverage is 73.12%. Comparing base (95ba44c) to head (4eed31a).
Report is 7 commits behind head on master.

Additional details and impacted files
Files Coverage Δ
...rker/esanalyzer/domainWorkflowTypeCountWorkflow.go 83.03% <100.00%> (ø)
service/worker/esanalyzer/workflow.go 89.91% <96.82%> (+8.35%) ⬆️

... and 14 files with indirect coverage changes

domainWorkflowVersionCount.WorkflowTypes = append(domainWorkflowVersionCount.WorkflowTypes, WorkflowTypeCount{
EsAggregateCount: EsAggregateCount{
AggregateKey: workflowType,
AggregateCount: int64(workflowCount),
Member

workflowCount comes from the first call, so it can differ from the sum of the counts returned by the subsequent per-workflow-type calls. You could use that sum instead so the numbers are at least self-consistent.

Member Author

Here's one sample result from ES:

{
  "key": "UpfrontChargeWorkflow::start",
  "doc_count": 182,
  "versions": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      { "key": "waitForPSPCallback-1", "doc_count": 149 }
    ]
  }
},

Here the workflow type's count (182) differs from the sum of its CadenceChangeVersion counts (149). I was wondering whether this is by design.

Member

How about grouping by both WorkflowType and CadenceChangeVersion, so that we get the count per version and per type in a single query? I tried it and it works:

SELECT JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY') AS CadenceChangeVersion, COUNT(*) AS count, workflowtype
FROM rta.rta.cadence_visibility_production
WHERE IsDeleted = false
  AND CloseStatus = -1
  AND StartTime > 0
  AND JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY') IS NOT NULL
GROUP BY JSON_EXTRACT_SCALAR(Attr, '$.CadenceChangeVersion', 'STRING_ARRAY'), workflowtype
ORDER BY count DESC

Member Author

@bowenxia bowenxia Jul 25, 2024

That query counts all workflow types that have a CadenceChangeVersion, which differs from the ES result. The ES query first finds the top 10 workflow types by count and then, within each of those 10 workflow types, finds the top 10 CadenceChangeVersions by count.

Member

Discussed offline: grouping by version and type filters out the records that have no CadenceChangeVersion. We need to verify whether that count has to be emitted; if not, we can go with this approach.

return err
}
var domainWorkflowVersionCount DomainWorkflowVersionCount
for _, row := range response {
Member

10x latency might be an issue for metrics emission. Could you parallelize it?

Member Author

If we run these queries in parallel across multiple threads, is there a risk that the workflow still doesn't have all the data when the metrics are emitted?

Member

This metric isn't latency-sensitive, since we run it every 5 or 10 minutes. But we can eliminate the extra calls if we aggregate by both version and type.

Member Author

@bowenxia bowenxia Jul 29, 2024

Discussed this with Ender offline as well. We are going to keep this approach.

@bowenxia bowenxia merged commit 9a7a8a4 into master Jul 30, 2024
21 checks passed
@bowenxia bowenxia deleted the xbowen_refactor_ESanalyzer_02 branch July 30, 2024 03:28