Add Monitoring Plugin #3078
Comments
The idea sounds great 👍🏼. |
GitHub-issue: kubernetes-sigs#3078 Signed-off-by: Aviv Litman <[email protected]>
Hi @sradco, interesting. Do we have an example of what you would like to scaffold? |
@sradco IMO, when I first read the issue, I was expecting a scaffold that would help deploy the Prometheus stack (creating Prometheus resources). But this is more about generating metrics (which may or may not be consumed by Prometheus). So we should probably consider renaming the issue and the plugin to be more specific to metrics. And since we're talking about best practices, will there be a recommended set of metrics that all operators should implement? I like the overall idea. |
@camilamacedo86 @umangachapagain I updated the description of the issue. The plugin is meant to help operator developers with the onboarding process of adding monitoring (metrics, alerts, recording rules and more) to their operator, and to offer best practices for doing that correctly. For example, keeping the metrics code logic in /monitoring/metrics and not inside the core operator code. The core operator code should contain only a call to update the metric. |
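To illustrate the separation described in the comment above, here is a minimal single-file sketch. It is not the code from the PR: the function names are hypothetical, and a standard-library atomic counter stands in for a Prometheus counter so that the example runs without external dependencies. The comments mark which part would live under /monitoring/metrics and which part would live in the controller.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// --- would live under monitoring/metrics (hypothetical layout) ---

// reconcileTotal is a stand-in for a Prometheus counter. A real operator
// would declare and register it with the Prometheus client library.
var reconcileTotal atomic.Int64

// IncReconcileTotal is the only entry point the operator code needs to know.
func IncReconcileTotal() { reconcileTotal.Add(1) }

// ReconcileTotal returns the current value, which is handy in tests.
func ReconcileTotal() int64 { return reconcileTotal.Load() }

// --- would live in the controller package ---

// reconcile stands in for a controller's Reconcile method. The metric
// definition is not visible here; only the update call is.
func reconcile(name string) error {
	// ... core operator logic for the named resource would go here ...
	IncReconcileTotal()
	return nil
}

func main() {
	_ = reconcile("memcached-a")
	_ = reconcile("memcached-b")
	fmt.Println("reconcile calls recorded:", ReconcileTotal())
}
```

With this layout, renaming a metric or changing its type touches only the metrics package, while the controller keeps a single stable call.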
GitHub-issue: kubernetes-sigs#3078 Signed-off-by: Aviv Litman <[email protected]> Uncomment monitoring go files Signed-off-by: João Vilaça <[email protected]> Generate working monitoring scaffolding Signed-off-by: João Vilaça <[email protected]> Allow kubebuilder init with monitoring bundle Signed-off-by: João Vilaça <[email protected]> Add metrics register to main file Signed-off-by: João Vilaça <[email protected]> Update testdata Signed-off-by: João Vilaça <[email protected]> Improve comments Signed-off-by: Aviv Litman <[email protected]>
I have the same sense as @umangachapagain pointed out. As an operator author, at first glance at the name I would assume this is a whole monitoring stack solution. When going over the details, it feels more like Talking about monitoring, I personally think of metrics, logging, and tracing data, which is beyond what this issue wants to bring about. However, the overall idea looks good. Since populating customized metrics can be a common case, having a general approach that simplifies the implementation of metrics/rules does save developers' time. In my opinion, I would expect this approach to be flexibly importable into my project, which means my project would not necessarily have to be scaffolded by Kubebuilder. From there, would it be possible to change it to be a library or a separate tool like |
@sradco We had a discussion regarding the monitoring plugin during the community meeting and a few insights were brought up by @camilamacedo86 and @Kavinjsir. I'm summarizing the list of concerns below:
I would like to add some of my thoughts inline, but I also defer to @sradco and team for their point of view.
In my point of view, I consider the best practices doc generation to be useful from an operator author's persona in the following way:
But at the same time, if these best practices are going to be common and the same piece of code (like the following: https://github.com/kubernetes-sigs/kubebuilder/pull/3106/files#diff-3036867e08a7c0e5bfdcccb3b380a20f01d1665d23187eb1a13fdc8926d2c8b2) is being generated for all projects, then it should go into a library. What would be helpful to users in the plugin is: https://github.com/kubernetes-sigs/kubebuilder/blob/28fe31ee240090c2009e5c0a6611d3f0b85407f1/testdata/project-v3-declarative-v1/monitoring/tools/metricsdocs.go - wherein I list my custom metrics and have standard documentation generated for them. Whether this warrants a separate plugin or just an additional script is a matter for discussion. On the other hand, the contents of (testdata/project-v3-declarative-v1/monitoring/metrics/metrics.go) in the PR could possibly be moved to a library.
Naming it with "monitoring" may seem misleading if we decide to move ahead with the plugin, but I wouldn't mind a name with Prometheus in it.
Kubebuilder is a generic scaffolding tool. But at the same time, plugins can be for a specific use. I could scaffold a project with Kubebuilder and chain a plugin to do any form of customization I want. For example, the "deploy-image" plugin scaffolds the best practices for Go operators; it cannot be used on top of the Helm plugin unless we add additional customizations. I would not worry about this plugin not being language-agnostic, because eventually, when it is moved to an external plugin, it can simply be considered an extension that provides best practices for Go operators. Overall, I feel that we should break down the PR to understand its use case better. The common methods could go into a library, but having a custom doc generation mechanism which reads the custom metrics specified by the user and scaffolds the best practices for them does seem to be a good idea for a plugin (this may need changes to the current implementation). |
@varshaprasad96 Big thanks for writing up in detail what we discussed in the weekly meeting! 👍🏼 @sradco I would also like to add my thoughts to supplement the points above: For Q1
I'd also be happy to see documentation introducing the custom metrics, if any exist. For Q2 For Q3
I'm fine if this feature ends up being Golang-specific. In fact, I would imagine that the default metrics provided by controller-runtime may cover many use cases when monitoring API reconciliation. Actually, many projects provide mixins to give these Prometheus resources "off the shelf", for instance: So as an operator user, I might also be happy if the plugin/lib can bring me similar manifests. (Just one thought around monitoring...) |
Hi @sradco, The utils proposed seem to fit as a lib. Therefore, since it is very Prometheus-specific and not in a "K8s" context, I think that it might be accepted under (Suggestion: it seems to fit in the SDK samples) Operator-sdk has a best practices section (https://sdk.operatorframework.io/docs/best-practices/) and scaffolds samples with code that is used in the tutorials. Also, you made a sample with utils and related "good practices for metrics" as part of your previous initiatives (https://github.com/operator-framework/operator-sdk/tree/master/testdata/go/v3/monitoring/memcached-operator). So, why not add this code there?
Conclusion: IMO we should not accept it in Kubebuilder (it does not seem to add value for the project and seems to be too Prometheus-specific, which is out of scope).
Also, I understand that the others' thoughts go along with this. |
Regarding moving the util.go file to an external library: in my opinion, it is best to keep the util file within the project. This allows developers to easily modify and maintain the code according to their needs. It also makes it easier for them to understand how the code works and how it fits into the overall project. Currently, the util file only contains basic operations for creating and registering metrics. However, advanced developers may have additional requirements, such as the ability to pass additional parameters to the metrics opts or the use of metric vectors. |
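To make the point about extending an in-project helper more concrete, here is a sketch of what such a file might grow into once options and labeled metrics ("vectors") are needed. This is not the util.go from the PR and it does not use the Prometheus client API. Every type and function name below is hypothetical, and a small standard-library registry stands in for the real one.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Opts mirrors the kind of options a metrics helper might accept.
type Opts struct {
	Name string
	Help string
}

// CounterVec is a stdlib-only stand-in for a labeled counter.
type CounterVec struct {
	opts   Opts
	labels []string
	mu     sync.Mutex
	values map[string]float64
}

var (
	registryMu sync.Mutex
	registry   = map[string]*CounterVec{}
)

// RegisterCounterVec creates a labeled counter and records it in the
// package-level registry, rejecting duplicate metric names.
func RegisterCounterVec(opts Opts, labels ...string) (*CounterVec, error) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, exists := registry[opts.Name]; exists {
		return nil, fmt.Errorf("metric %q already registered", opts.Name)
	}
	c := &CounterVec{opts: opts, labels: labels, values: map[string]float64{}}
	registry[opts.Name] = c
	return c, nil
}

// Inc increments the series identified by the given label values.
func (c *CounterVec) Inc(labelValues ...string) error {
	if len(labelValues) != len(c.labels) {
		return fmt.Errorf("expected %d label values, got %d", len(c.labels), len(labelValues))
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[strings.Join(labelValues, "\x00")]++
	return nil
}

// Value returns the current count for one label combination.
func (c *CounterVec) Value(labelValues ...string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[strings.Join(labelValues, "\x00")]
}

func main() {
	c, err := RegisterCounterVec(
		Opts{Name: "example_reconcile_errors_total", Help: "Reconcile errors by kind."},
		"kind",
	)
	if err != nil {
		panic(err)
	}
	_ = c.Inc("Memcached")
	fmt.Println(c.Value("Memcached"))
}
```

Because the helper lives in the project, adding a field to Opts or a new metric kind is a local change, which is the flexibility the comment argues for. The counterargument raised elsewhere in the thread is that the same code then gets duplicated across projects.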
I'd still like to see this as a plugin. The external plugin system seems to be the right fit. Is there a centralized repo for community-contributed plugins? @camilamacedo86 Operator SDK already scaffolds some resources specific to Prometheus. Do you know if SDK does it on its own or gets it via KB? As you suggested, SDK seems like the right fit to integrate such a plugin. I envision this as an external KB plugin which is leveraged by SDK as a default. SDK already brings in ServiceMonitor. This plugin can add metrics and PrometheusRule to it and document best practices. |
Hi @umangachapagain,
Why? What advantages can a plugin that mainly scaffolds a README bring you, versus the cost of keeping it maintained?
The only Prometheus-specific thing scaffolded in the projects is the rule that enables the default metrics, because by default they are protected. No utils code in any form is scaffolded in the projects by SDK or Kubebuilder, because that usually fits in a lib or in a third-party tool that the projects can consume. My suggestion was to check whether this code can be added to the sample/example https://github.com/operator-framework/operator-sdk/tree/master/testdata/go/v3/monitoring/memcached-operator (not the default scaffolds or a plugin), since that sample already has other "good practices" regarding this topic and it can be used in the SDK docs. |
I disagree. As I mentioned before, this plugin is meant to ease onboarding to Prometheus monitoring. This PR is the initial work that includes the metrics part. It doesn't only include a README; it also includes the example files that developers will add their specific metrics to. This is a very important part that will greatly help operator developers with adding new metrics. For alerts, we will also add a second PR with the required files for adding alert rules, recording rules, Prometheus unit tests, and runbooks. We plan to complement this plugin with a best practices doc which includes metrics naming conventions and alert labels, which are currently not documented in a central place. It's like the wild west, which makes it hard for end users to make good use of the monitoring stack. I agree that this may be specific to the Prometheus stack, but the Grafana plugin is specific to Grafana and it's in Kubebuilder. I don't mind making this a Prometheus stack plugin. Prometheus is a widely used stack and it deserves the attention. The Grafana plugin can become part of it.
What we added to this operator will save a lot of development time and frustration, and will prevent pitfalls that would later cost more time to fix.
We are not providing a lib. The files are the base for the operator developers to build on.
The example dashboard will not have the same effect. We want to give developers a starting point that will lead them to an easier and better implementation of Prometheus stack monitoring. The memcached operator will be built on top of this plugin and will add the actual metrics, alerts and so on. We still want to add that. |
Hi @sradco, Thank you for your input. Could you attend the next community meeting? Let's first ensure that we are all on the same page here and that the utils/code generated by the proposed plugin should not be scaffolded in the project itself. Then, after moving this implementation to a lib, what will the plugin scaffold? (That is what we need to discuss here, right?)
Code examples in Kubebuilder are only addressed via docs/tutorials. Therefore, following good practices and conventions is a prerequisite for any code produced by the tool. In SDK, you have been working on adding content regarding this initiative. (here) Why not ship these examples there?
The grafana plugin
So, what could the monitoring plugin scaffold that is useful, valid code for what is done by default? What valuable, useful code could it generate? (That is, what can it produce that I can use right after scaffolding, without needing to add any extra code on top?)
That is something that IMO could be very nice to have and would have +1 vote here. |
I will attend the meeting.
It is the same as https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4-multigroup/apis/foo/v1/bar_types.go. The developers only need to add their custom metrics/alerts/recording rules and so on to them.
This plugin doesn't aim to add specific metrics and alerts and other specific resources. |
Hi @sradco, In the proposed PR, the generated code consists of lib/bin features, not CLI ones, and should not be inside the "Operator" itself. The problem is not the directory being named "utils" but what it does. The code inside the utils should be centralized in a lib and consumed by the projects. The binary also should not be built for the Operator itself. The bin should be provided externally and consumed by the projects. Could you please let me know:
The examples raised by you are not the same scenario. I tried to clarify in the following comments.
That is the result of the code generated when we create APIs (a feature). Providing comments in the code to let users know how to use a feature or its result seems completely acceptable. But creating a feature that provides mainly comments and examples does not.
It is a sample in SDK. Kubebuilder has no Memcached operator or plugin that generates a specific Memcached operator. The sample is generated automatically via a stack available in SDK for maintainability/development purposes and is not part of any CLI feature. Therefore, IMHO, what you are trying to add here could fit well in the sample but not in a plugin/CLI feature.
The Grafana plugin provides a feature where you can input the info about your custom metrics and let the tool (feature) generate the Grafana dashboards for the input provided. The examples are NOT the feature. The comment with an example is there to let the user know how to use the feature.
The Grafana plugin should not add metrics. Its purpose is to generate the dashboards for what is exported and exists by default. On top of that, it has a feature that allows you to say, "here are my custom metrics, please generate Grafana dashboards for them".
Sorry, but I could not follow this one. However, if you disagree with the Grafana plugin or would like to change it, then the best way to address that seems to be to open an issue so we can discuss specifically what should be changed in it. Could you please open one if that is the case? |
@camilamacedo86 I understand your point of view. To summarize, I am adding here the two important points that raise questions about whether this issue should be accepted or not. I'll add my answers to them inline: Question 1: Are the helpers scaffolded by the plugin valuable? What are the benefits of scaffolding this inside an operator project? Can these just be helpers in a library? Answers inline: Question 1: By definition, plugins are extensions to the KB CLI which address "specific" needs of the user. These needs can be addressed either by scaffolding or generating code that is helpful to the user, like the Grafana plugin does, or by enforcing a best practice (like the deploy-image plugin does). Either way, as long as a plugin "addresses the needs of the user", it is valuable. The monitoring plugin here is trying to enforce best practices by directing the user on where and how to add metrics in their project. After digging in, I understand that "util.go" is not a library, but a helper which an operator author should eventually have "inside their project". Let's take an example here: The method This is similar to the deploy-image plugin: without it everything works as expected, but the operator project does not follow a best practice. A user can have an unfiltered cache and their project would still work, but having cache selectors enables the operator to be more efficient, which is one of the things that deploy-image does. IMO this should be considered as a plugin. Having the helpers in a library would add no value; it's like creating wrappers around wrappers (the Prometheus APIs), which is of no use. The motive of this plugin is to enforce a best practice, and having these in a library would defeat that purpose, since users would not even use the library created from this. Question 2: Do we accept this as a plugin in KB?
I agree with @camilamacedo86's point of view in this aspect: the eventual effort of maintaining the plugin inside KB can be painful. Having a plugin as a part of KB has a long list of problems (which I won't repeat here), and this is why Phase 2 was introduced. Having out-of-tree plugins solves question 2, since as the KB community we would not need to worry about maintaining the code base. It would be external and usable with KB, but would have its own set of maintainers and release cycle. This issue was brought up in the community meeting. However, the idea was to start this as a phase 1.5 plugin and eventually move it to a phase 2 plugin, since as KB maintainers we haven't done a good job of creating step-by-step documentation on building phase 2 plugins. However, if @sradco and team are comfortable making this phase 2, that would definitely be the best solution. TL;DR: the first step is to answer whether we can consider this to be a plugin or not. (IMO we should, for which I have provided a detailed explanation above.) The next is whether this should be a phase 1.5 or a phase 2 plugin. If the authors are comfortable with making this external using phase 2, then definitely +1 for it! cc: @camilamacedo86 @sradco @machadovilaca @umangachapagain @Kavinjsir PS: Modifying the Grafana plugin and/or integrating all the aspects of a monitoring stack into one are separate follow-up issues that need separate discussion, and we need not bring them up here in the first step (IMHO). |
I would also vote for
I'm wondering if there is some specific |
Hi @varshaprasad96, Thank you for all your input. Some comments follow inline. Regarding the question: Question 1: Are the helpers scaffolded by the plugin valuable? What are the benefits of scaffolding this inside an operator project? Can these just be helpers in a library? Note that the utils also have a binary to build the docs for the metrics. In the same way, and for the same reasons, that we do not scaffold code to build the envtest bin inside the project, I do not agree that this is something that should be scaffolded by default. Therefore, I was also wondering why it is required to implement and build a binary. If that is a good practice, does it not already exist for projects that provide docs for their metrics? Are the metrics not implemented in Go, so that godoc would generate the docs for them? Conclusion: I think we can all agree that we reached a consensus that this proposal does not fit well with KB's purposes and views and, unfortunately, cannot be accepted to be shipped within it.
@sradco, please let us know if we can agree to close this one as deferred asap or if you want to wait until the next community meeting. |
@camilamacedo86 Just adding my follow up thoughts here.
The docs are not the only files scaffolded. The plugin brings in the code required to set up metrics as well (https://github.com/kubernetes-sigs/kubebuilder/pull/3106/files#diff-75224dd6164f9903e8acf72b976de2f46f97c451a3b784fcd1d20347a90ffcbaR29). If an operator author wants to add metrics using Prometheus, then it's up to them to use this plugin and scaffold the helpers and their documentation (it's not done by default). Envtest, IMO, is a different case, where we use the binary (as is) to spin up the test cluster. The users of the operator author's project need not know what binaries are downloaded by envtest, what "setup-envtest" is used for, or even what each binary does. But the users of these metrics need to know how they are registered as well as what each one of them does. In my mind, the documentation can be of two kinds:
I agree that docs need not be a part of the operator image (or the binary built for it), but they can definitely be a part of the project, for which this plugin is helpful. Just the helpers used to register metrics can be in We could divide the PR into three portions, where the first two, which have operator-specific metric helpers, are a part of the plugin, and the generic helpers which are common to all are a part of a library:
(please correct me if I'm wrong) Do we agree that this could be a plugin, just that it cannot be a phase 1.5 plugin shipped along with KB? I'm more concerned about opinions on why this shouldn't be a plugin than about whether it should be in KB (as phase 1.5) or an external one. If the latter is true (this can be a plugin, but external), then we can surely close this issue and encourage the authors to create issues specifically with respect to phase 2 as they start building on it. |
Hi @varshaprasad96, IMHO: Regarding the specific purpose of this plugin: after we remove testdata/project-v3-declarative-v1/monitoring/tools/metricsdocs.go, which we all agree does not seem a good fit to be scaffolded in the project (I am not sure why the binary is required. Are godocs not doing that already? Are there not projects that can do it? Could it not be addressed in another project?), the goal of the plugin is still:
Then, let's think about the value:
So, in my POV: the work required to maintain the proposed plugin is not paid off by the value that it can return. Now, about which plugins Kubebuilder could accept to be shipped with and maintained by the project: the historical context here is that this was discussed in the past, and we agreed that unfortunately we do not have the capacity required to accept all possible plugins and address all possible needs. Therefore, we need to keep in mind to accept only plugins that bring value for the common scenarios and for what is done by default. We need to target things that can be useful and helpful for 80% of use cases instead of very specific needs. Because of all the points described above, I do not think that it fits well in Kubebuilder (my vote is -1). Looking at how we can still help anyone from the community who has the appetite to maintain solutions like that, we have been working on the external plugins (plugins phase 2). Therefore, if @sradco has the capacity required and thinks it is valuable to address this need via a plugin instead of docs and examples, then I'd say that it is a great fit for the external plugins. And we encourage everybody to use Kubebuilder as a lib to create their own plugins for specific use cases and needs. The code done so far stays mainly the same; it just will not live in the Kubebuilder repo. PS: I would indeed like to suggest that we review the declarative plugin. It seems to me that this one could be in the declarative repo instead of Kubebuilder. |
Hi @varshaprasad96 @sradco @Kavinjsir @everettraven, It seems that the recommendation will be addressed via examples and docs on the operator-sdk side. If so, do we agree to close this one, as well as the PR #3106? |
@camilamacedo86 I think it is fine to close #3106 as per this comment. I think closing this issue is fine unless we feel there needs to be more discussion about making it an external plugin. |
@camilamacedo86 I think we can close this for now. Having the documentation and examples is a good starting point. It can also be expanded to include additional information about logs, tracing, events etc., and we plan to help in adding additional information as we go. From my POV, adding the plugin would allow automations that would save development time and onboarding time, allow code reusability, improve code quality, and ultimately result in a better user experience. |
What do you want to happen?
We would like to add a new monitoring plugin that will help operator developers set up Prometheus-based monitoring for their operator, provide them with best practices and tooling to shorten the time it takes to get up to speed with monitoring requirements, and help standardize the way monitoring is implemented in operators.
The proposed structure and content:
PR operator-framework/operator-sdk#5975 is also related and can be referenced to provide additional best practices. It is currently on hold since we would like to replace its examples with the examples here.