Metrics/GenevaActions for Clustersync #3785
base: master
3cdbf8c to 8d1a6e9
b5ac73b to 99fa8df
08835f8 to 41a6e7c
8d33911 to afbaf3b
Changes look good to me.
afbaf3b to 548e097
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
88d5029 to 99c51f8
99c51f8 to a04c6bb
Please rebase the pull request.
a04c6bb to 99c3f75
60aee73 to e884d16
merging 8659 and 9545 Metrics for SyncSet and SelectorSyncSets
e884d16 to 4522c6e
Mostly looks good, just a few questions about the metric we emit to make sure the metric works for us downstream (dashboarding/alerting).
if clusterSync.Status.SyncSets != nil {
	for _, s := range clusterSync.Status.SyncSets {
		mon.emitGauge("hive.clustersync", 1, map[string]string{
			"metric": "SyncSets",
nit: Having a dimension on the metric named "metric" might be a little confusing. Should we rename it to something else, for example syncType?
if clusterSync != nil {
	if clusterSync.Status.SyncSets != nil {
		for _, s := range clusterSync.Status.SyncSets {
			mon.emitGauge("hive.clustersync", 1, map[string]string{
question: Do we want to change the value we emit here based on the success/failure state of the syncset? For example, emit 1 for successful syncsets and 0 for failed syncsets?
This might make it easier for us to, for example, build downstream dashboards or alerts based off of this metric.
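To make the suggestion concrete, here is a minimal sketch of what a result-dependent gauge could look like. It is not the PR's code: `syncStatus`, `gaugeEmitter`, `gaugeValue`, and `emitSyncSetGauges` are hypothetical stand-ins, the "Success"/"Failure" result strings are an assumption about how the sync result is reported, and the `syncType` dimension name is taken from the nit above.

```go
package main

import "fmt"

// syncStatus is a hypothetical stand-in for one entry of
// clusterSync.Status.SyncSets; only the fields used here are modelled.
type syncStatus struct {
	Name   string
	Result string // assumed to be "Success" or "Failure"
}

// gaugeEmitter is a hypothetical stand-in for the monitor's emitGauge method.
type gaugeEmitter func(name string, value int64, dims map[string]string)

// gaugeValue maps a sync result to the gauge value:
// 1 for a successful sync, 0 for anything else.
func gaugeValue(result string) int64 {
	if result == "Success" {
		return 1
	}
	return 0
}

// emitSyncSetGauges emits one gauge per syncset. Ranging over a nil
// slice is a no-op in Go, so no explicit nil check is needed.
func emitSyncSetGauges(emit gaugeEmitter, statuses []syncStatus) {
	for _, s := range statuses {
		emit("hive.clustersync", gaugeValue(s.Result), map[string]string{
			"syncType": "SyncSets", // renamed from "metric", per the nit above
			"name":     s.Name,
			"result":   s.Result,
		})
	}
}

func main() {
	// Print each emission instead of sending it to a metrics backend.
	emit := func(name string, value int64, dims map[string]string) {
		fmt.Println(name, value, dims["syncType"], dims["name"], dims["result"])
	}
	emitSyncSetGauges(emit, []syncStatus{
		{Name: "example-a", Result: "Success"},
		{Name: "example-b", Result: "Failure"},
	})
}
```

With a shape like this, a downstream alert could fire on any series whose value is 0, rather than having to filter on a string dimension.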
@@ -23,7 +23,9 @@ var _ = Describe("Monitor", func() {
 	wg.Add(1)
 	mon, err := cluster.NewMonitor(log, clients.RestConfig, &api.OpenShiftCluster{
 		ID: resourceIDFromEnv(),
-	}, &noop.Noop{}, nil, true, &wg)
+	}, &api.OpenShiftClusterDocument{
I think it might be worth adding E2E tests for both the monitor and Geneva Actions functionality. We have various contexts in which E2E runs against an RP with Hive enabled (production/release E2E, PR E2E after we move to the containerized implementation).
If it is too difficult to add the E2E tests in this PR, though, this can become follow-up work.
 }

-func NewMonitor(log *logrus.Entry, restConfig *rest.Config, oc *api.OpenShiftCluster, m metrics.Emitter, hiveRestConfig *rest.Config, hourlyRun bool, wg *sync.WaitGroup) (*Monitor, error) {
+func NewMonitor(log *logrus.Entry, restConfig *rest.Config, oc *api.OpenShiftCluster, doc *api.OpenShiftClusterDocument, m metrics.Emitter, hiveRestConfig *rest.Config, hourlyRun bool, wg *sync.WaitGroup, hiveClusterManager hive.ClusterManager) (*Monitor, error) {
It's a little strange to me that we have both oc and doc here (as oc should be a subproperty of doc), and hiveRestConfig and hiveClusterManager.
I think there's an opportunity for us to deduplicate some of these dependencies, but that can be a follow-up refactor.
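As a rough illustration of the follow-up refactor, the sketch below has the constructor accept only the document and derive the cluster from it. All types and names here (`openShiftCluster`, `openShiftClusterDocument`, `monitor`, `newMonitor`) are hypothetical stand-ins rather than the real `api` types or the real `NewMonitor` signature, and it assumes the document exposes the cluster as a pointer field.

```go
package main

import (
	"errors"
	"fmt"
)

// openShiftCluster and openShiftClusterDocument are hypothetical stand-ins
// for the api types; it is assumed the document embeds the cluster.
type openShiftCluster struct {
	ID string
}

type openShiftClusterDocument struct {
	OpenShiftCluster *openShiftCluster
}

// monitor is a hypothetical, trimmed-down stand-in for the real Monitor.
type monitor struct {
	oc  *openShiftCluster
	doc *openShiftClusterDocument
}

// newMonitor takes only the document and derives oc from it, so callers
// cannot pass a cluster and a document that disagree with each other.
func newMonitor(doc *openShiftClusterDocument) (*monitor, error) {
	if doc == nil || doc.OpenShiftCluster == nil {
		return nil, errors.New("document does not contain a cluster")
	}
	return &monitor{oc: doc.OpenShiftCluster, doc: doc}, nil
}

func main() {
	mon, err := newMonitor(&openShiftClusterDocument{
		OpenShiftCluster: &openShiftCluster{ID: "example-resource-id"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mon.oc.ID)
}
```

The same idea could apply to hiveRestConfig and hiveClusterManager if one can be built from the other, but that depends on how the Hive client is constructed and is not shown here.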
Which issue this PR addresses:
ARO-9545 and ARO-8659. Both Jira issues share common code.
What this PR does / why we need it:
Test plan for issue:
Unit test cases added.
Need to create the respective metrics dashboards in Geneva.
Is there any documentation that needs to be updated for this PR?
Will create TSGs for respective metrics.
How do you know this will function as expected in production?
Monitor from Geneva Dashboard.