-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new module to convert cp stats into prometheus format suited for CMS monitoring #9940
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really a review of the code you proposed, but a couple of things caught my attention:
- please use lower camelCase for the variable and function names (prom_metrics --> promMetrics)
- given that this is going to be a common Utils module, please provide docstrings as completed as you can (there is no model defined yet, but it looks like we are adopting reStructured Text)
- can you also please create unit test(s) for this module
Can you please take care of the comments above? Thanks
Jenkins results:
|
29fa5d7
to
3984c29
Compare
Jenkins results:
|
Alan, I added requested changes, please review at your convenience. |
Jenkins results:
|
Jenkins results:
|
@todor-ivanov please review this PR which provides metrics. But as I explained in description once this PR is in place we'll need a new end-point similar to /stats for WMCore/REST code to serve these metrics (the code is shown in description and will be required for monitoring). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valya, I made few minor comments in the code. But, if you think those are too picky, just ignore them, we can definitely live with the code as it is.
All the rest looks good to me!
@todor-ivanov please review again as I resolved all requested suggestions. |
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Valya, I think it is looking good now. Just to clear my own understanding. From your comment in the PR I read that provided the code like that it is not referred anywhere outside. So we should make an explicit call to it from here: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/REST/Server.py#L700
@expose def metrics(self): name = self.__class__.__name__ return prom_metrics(name)
If I am correct it would be nice to have it called in this same PR so that we avoid making too many PRs for the same functionality. @amaltaro, Please take a look and tell me if you like to have the code merged with or without being referred in this PR.
Todor,
|
Valentin, just to let you know that this PR is on my todo list for the coming days. I still think there is either something missing or a misunderstanding with this proposal, so I'll have to test a few things myself first. |
@amaltaro , I incorporated your changes from #9961 over here. I removed cherrypy deps and TestMetricsServer from CPMetrics.py code. But it would be nice to keep TestMetricsServer some place that users can easily see how to construct required end-point and test it easily. I created this file but did not include it in this PR. If you think it is relevant let me know where to put it, otherwise we may just skip it.
|
@amaltaro , I found where stats and metrics can be plugged into WebTools server. Looking at DBS code it inherits WebTools RESTModel, which by itself inherits from WebAPI.py. The whole chain of dependencies is the following:
So, I added stats and metrics to WebAPI and exposed them. Doing this way all DBS servers (inherited from WebTools) will get these end-points. Please review. |
Jenkins results:
|
Jenkins results:
|
I had a look at pylint report and all of the failures related to WebAPI.py codebase which I didn't touched. I don't think it is my responsibility to go through them and include in this PR. Please let me know how you'll treat it. |
I had a look at Alan's VM and even though I can access provided end-points via https calls they are not accessible from localhost and plain http:
Such strict constrains make entire work a waste since prometheus does not care about cmsweb auth x509 schema and all metrics should be accessible through normal localhost request. Therefore, unless I miss something or unless you tell me where to change REST server code to allow http requests from localhost I don't see a point how it can be used because REST server requires cmsweb x509 authentication for all exposed methods. |
Jenkins results:
|
Jenkins results:
|
Alan, pylint score is 10 for new code and its unit test. All others are old code which I don't want to touch in this PR. The unit tests are unstable. Please review once you'll get a chance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
Before we get it merged, can you please deploy these changes to your test9 cluster and run a basic curl test of the new REST API?
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valentin, please see my partial review on your latest changes.
Alan, I made necessary changes, please let me know. |
Now commits are squashed too. |
Jenkins results:
|
Jenkins results:
|
The Jenkins CI report is your friend, please take advantage of it ;)
Once you are done with code changes, please run another test in your central services setup testing reqmgr2 endpoints like: stats, metrics and info. Please let me know how that goes. |
Jenkins results:
|
Alan, test is fixed (adding encodeUnicodetoBytes still requires a check) and code is tested on test9, e.g.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks!
From a first quick pick, this is beautiful! We had the idea of doing something similar and will likely steal from this in the future :) |
FYI @amaltaro , @mapellidario , @novicecpp I made corresponding monitoring ticket: https://its.cern.ch/jira/browse/CMSMONIT-514 to request CherryPy dashboard. Please follow up monitoring ticket and request your dashboard for your python based service. |
Fixes #9939
Status
ready
Description
I provide a new module which can be integrated into WMCore stack. It provides useful set of functions to flatten CherryPy stats, make static schema and provide them in format suitable for Prometheus server.
I suggest to add an additional
metrics
endpoint to complementstats
one shown in herehttps://github.com/dmwm/WMCore/blob/master/src/python/WMCore/REST/Server.py#L700
with the following
This will provide ability to scrape all cherrypy metrics from Prometheus for any DMWM/REST server. Please refer to docstring of provided module for format description/example.
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
External dependencies / deployment changes
this module uses cherrypy lib to get cherrypy metrics