smf service state gauges #9

Open
Smithx10 opened this issue Oct 11, 2019 · 4 comments

Comments

@Smithx10

It would be helpful to expose gauges of SMF service states from the GZ and native SmartOS zones. This would make it very simple to see in Grafana which services were down over time.

@khalfella
Contributor

That is a good idea, though I wonder what the best way to implement these metrics is. One option is a single metric with a label for each service, with the service status encoded in the metric value. For example:

{
    "0": "running",
    "1": "disabled",
    "2": "maintenance",
     ....
}
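As a rough illustration of that idea, here is a minimal Python sketch that turns a map of service states into gauge lines in the Prometheus text format. The metric name `smf_service_state`, the `fmri` label, and the `-1` fallback for states not listed in the example are assumptions made for illustration; none of them come from cmon-agent.

```python
# Numeric codes follow the example mapping above (inverted to state -> code).
STATE_CODES = {"running": 0, "disabled": 1, "maintenance": 2}

# Assumption: placeholder value for any state the example mapping elides.
UNKNOWN_STATE = -1


def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(services, metric="smf_service_state"):
    """Render {fmri: state} as one gauge sample per service.

    The metric name and the `fmri` label are hypothetical.
    """
    lines = [
        f"# HELP {metric} SMF service state encoded as a number",
        f"# TYPE {metric} gauge",
    ]
    for fmri, state in sorted(services.items()):
        code = STATE_CODES.get(state, UNKNOWN_STATE)
        lines.append(f'{metric}{{fmri="{escape_label(fmri)}"}} {code}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauges({
        "svc:/network/ssh:default": "running",
        "svc:/system/cron:default": "maintenance",
    }), end="")
```

One consequence of this encoding is that the numeric value only makes sense next to the mapping table, so dashboards would need a value mapping to show state names.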

To keep the agent's response time low, we can query SMF directly instead of forking and exec'ing svcs commands. That said, I have two concerns with this approach:

  • An average compute node/headnode typically runs tens of instances. Many of these instances have many services that are disabled by default and are not expected to ever come online. I am not sure we need to export all of this information.

  • In order to get the status of SMF services running inside a customer zone, the agent needs to exec a process inside that instance. I don't think this is acceptable from a security point of view.

Another way to achieve this is to write a plugin to export all the necessary information, which might make more sense for on-prem customers.

@Smithx10
Author

@khalfella,

I'd imagine we'd only be interested in services that were set to "enabled".
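A minimal Python sketch of that filtering, assuming the agent already has service listings as two-column text (state, then FMRI), the shape `svcs -H -o state,fmri` prints. Treating "not in the `disabled` state" as a stand-in for "enabled" is an assumption, the function name is hypothetical, and the sample rows are made up for illustration.

```python
def enabled_service_states(svcs_output):
    """Map FMRI -> state, skipping services whose state is 'disabled'.

    Expects one service per line: "<state> <fmri>". Lines that do not
    have both fields are ignored.
    """
    states = {}
    for line in svcs_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        state, fmri = parts[0], parts[1].strip()
        if state == "disabled":
            continue
        states[fmri] = state
    return states


# Made-up sample input, for illustration only.
SAMPLE = """\
online         svc:/network/ssh:default
disabled       svc:/network/ntp:default
maintenance    svc:/system/cron:default
"""

if __name__ == "__main__":
    for fmri, state in sorted(enabled_service_states(SAMPLE).items()):
        print(fmri, state)
```

Dropping disabled services this way would also reduce the number of exported series, which addresses the first concern raised above.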

I don't believe we ever need to run anything inside the zone. Everything should be obtainable from the GZ. I believe the current cmon-agent always runs in the GZ, correct?

Thanks for the link to the plugin documentation.

@khalfella
Contributor

@Smithx10 - That is correct, cmon-agent runs as a service in GZ.

Currently, all NGZ stats exported by cmon-agent are collected from the global zone. In fact, most NGZ stats come directly from kstat, which is available in the GZ for all zones. However, the case for SMF is a little different. Since every zone runs its own instance of SMF, we might need to connect to each zone's SMF instance in order to collect service status.

I believe SMF running inside an NGZ is reachable via a door file, which is accessible from the GZ, and that is how commands like svcs -Z work. So yes, I think you are right, we might not need to exec anything inside the customer's zone. Still, we need to interact with SMF running inside the NGZ.

@bahamat
Contributor

bahamat commented Sep 25, 2020

@Smithx10 This has been implemented in triton-cmon-plugins.

It's currently undecided how much we should bring directly into cmon vs being in plugins.
