Influx-Capacitor service self monitoring #57

Open
marcelopetersen opened this issue Feb 26, 2016 · 5 comments

Comments

@marcelopetersen

In my monitoring system, I normally check the state of a service to ensure that it is running.
When monitoring the Influx-Capacitor service, even if it is running we have no guarantee that it is collecting data. For example, the connection to the database may be unavailable:

Log Name: Application
Source: Tharga.Toolkit.Console
Date: 26/02/2016 10:06:30
Event ID: 0
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: ComputerName
Description:
Could not establish a connection to the database.

We could have an option to configure a URL to which the service sends its current state at a fixed interval, along with any error messages (such as the database being unavailable). It would work as heartbeat monitoring.

Something like this:


<Influx-Capacitor>
  <HealthCheck Type="Nagios|OpsMgr|Custom" Enabled="true" SecondsInterval="60" SendErrorMessages="true">
    <MachineName>MyComputer</MachineName>
    <Url>http://mymonitor.com/AgentStatus</Url>
  </HealthCheck>
</Influx-Capacitor>

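A minimal sketch of the agent side of this proposal, assuming the `<HealthCheck>` element above: it reads the settings with Python's standard `xml.etree.ElementTree`, builds a JSON heartbeat and POSTs it to the configured URL. The payload fields (`machine`, `status`, `message`, `timestamp`) and all function names are hypothetical; Influx-Capacitor does not define such a format.

```python
import json
import time
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class HealthCheckConfig:
    enabled: bool
    interval_seconds: int
    send_error_messages: bool
    machine_name: str
    url: str


def parse_health_check(xml_text: str) -> HealthCheckConfig:
    """Read the (hypothetical) <HealthCheck> element from the service config."""
    root = ET.fromstring(xml_text)
    hc = root.find("HealthCheck")
    if hc is None:
        raise ValueError("no <HealthCheck> element")
    return HealthCheckConfig(
        enabled=hc.get("Enabled", "false").lower() == "true",
        interval_seconds=int(hc.get("SecondsInterval", "60")),
        send_error_messages=hc.get("SendErrorMessages", "false").lower() == "true",
        machine_name=hc.findtext("MachineName", default=""),
        url=hc.findtext("Url", default=""),
    )


def build_heartbeat(cfg: HealthCheckConfig, last_error: Optional[str], now: datetime) -> dict:
    """Build the heartbeat payload; include the error text only if configured to."""
    payload = {
        "machine": cfg.machine_name,
        "status": "ok" if last_error is None else "error",
        "timestamp": now.isoformat(),
    }
    if last_error is not None and cfg.send_error_messages:
        payload["message"] = last_error
    return payload


def send_heartbeat(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run(cfg: HealthCheckConfig, get_last_error: Callable[[], Optional[str]]) -> None:
    """Send a heartbeat every interval_seconds while enabled."""
    while cfg.enabled:
        payload = build_heartbeat(cfg, get_last_error(), datetime.now(timezone.utc))
        try:
            send_heartbeat(cfg.url, payload)
        except OSError:
            pass  # a heartbeat that never arrives is itself the signal for the monitor
        time.sleep(cfg.interval_seconds)
```

Keeping `build_heartbeat` free of I/O makes it easy to test; only `send_heartbeat` touches the network.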
@poxet
Owner

poxet commented Feb 27, 2016

You mean like a heartbeat?

I have also started adding log4net so that it will be possible to debug issues.

@marcelopetersen
Author

Yes, like a heartbeat. Log4net is useful for local debugging, but how can we tell that the service is failing to send data to the database? If the service is up but cannot access the database, we have no way to detect this error. Currently, we have to log on to the machine and search the Event Viewer.

@marcelopetersen
Author

What do you think about having the service stop when it cannot access the database, instead of adding separate monitoring? That would be easier to monitor, because we would only need to check the service state.
My concern is machines that cannot send data, and how to identify that problem (we have a lot of machines sending data).
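The fail-fast alternative could look like the sketch below, assuming the service counts consecutive failed database writes and gives up once a threshold is reached, so that an ordinary service-state check picks up the problem. `FailFastGuard`, `max_consecutive_failures` and `exit_service` are hypothetical names; a real Windows service would stop through its service host rather than by calling `sys.exit`.

```python
import sys
from typing import Callable


class FailFastGuard:
    """Invoke a give-up callback after N consecutive failed database writes."""

    def __init__(self, max_consecutive_failures: int, on_give_up: Callable[[str], None]):
        if max_consecutive_failures < 1:
            raise ValueError("max_consecutive_failures must be >= 1")
        self.max_consecutive_failures = max_consecutive_failures
        self.on_give_up = on_give_up
        self.consecutive_failures = 0

    def record_success(self) -> None:
        # Any successful write resets the failure streak.
        self.consecutive_failures = 0

    def record_failure(self, error: str) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.on_give_up(error)


def exit_service(error: str) -> None:
    """Example give-up action: report the error and exit with a non-zero code."""
    print(f"Giving up: {error}", file=sys.stderr)
    sys.exit(1)
```

For example, `FailFastGuard(5, exit_service)` tolerates short database blips but stops after five failures in a row. The trade-off is that a stopped service collects nothing once the database comes back, so it would need a restart policy (for instance, the Windows service recovery settings).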

@poxet
Owner

poxet commented Feb 29, 2016

I think a heartbeat with information about the latest issues is a good idea. That would make it possible to monitor several machines in one place.
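To include "the latest issues" in each heartbeat without letting the payload grow without bound, the agent could keep a fixed-size buffer of recent errors. This is a sketch using `collections.deque(maxlen=...)`. The class name and the choice to clear the buffer after each heartbeat are assumptions.

```python
from collections import deque
from datetime import datetime
from typing import Deque, List, Tuple


class RecentIssues:
    """Keep only the most recent N error messages for the next heartbeat."""

    def __init__(self, capacity: int = 10):
        self._items: Deque[Tuple[str, str]] = deque(maxlen=capacity)

    def add(self, message: str, when: datetime) -> None:
        # Once capacity is reached, the oldest entry is dropped automatically.
        self._items.append((when.isoformat(), message))

    def drain(self) -> List[dict]:
        """Return the buffered issues (oldest first) and clear the buffer."""
        issues = [{"time": t, "message": m} for t, m in self._items]
        self._items.clear()
        return issues
```

Draining after each heartbeat reports every issue once. If the heartbeat then fails to send, those issues are lost, so an implementation might clear the buffer only after a successful send.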

@nathanwebb

This would tie in nicely with #29. The heartbeat could be as simple as a timestamp with status, sent to a central database.

Some scenarios:

  • If the database is down: don't worry about the agents, just fix the database ;)
  • If the agent can't reach the database: Since the central database would have the configuration for that agent as well, it would be easy to calculate when the next heartbeat should be sent. If the agent misses a heartbeat (+ some leeway), then you have an incident.
  • If some data is sent, but not all: Again, since the database contains the configuration, it should be able to see what is supposed to be sent. If everything looks OK but only some of the data arrives, this could still be identified.
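The last two scenarios can be checked centrally from the stored agent configuration. The sketch below is written under assumed inputs: each agent's last heartbeat time, its configured interval, a leeway, and the set of measurement names it is configured to send compared with those actually received. `AgentRecord` and `classify` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class AgentRecord:
    machine: str
    interval: timedelta  # heartbeat interval from the agent's stored configuration
    last_heartbeat: datetime
    expected_measurements: Set[str] = field(default_factory=set)
    received_measurements: Set[str] = field(default_factory=set)


def classify(agent: AgentRecord, now: datetime, leeway: timedelta) -> List[str]:
    """Return incident descriptions for one agent (empty list if healthy)."""
    incidents = []
    # The next heartbeat is due one interval after the last one, plus leeway.
    deadline = agent.last_heartbeat + agent.interval + leeway
    if now > deadline:
        incidents.append(f"{agent.machine}: missed heartbeat (due by {deadline.isoformat()})")
    # Partial data: configured measurements that have not been received.
    missing = sorted(agent.expected_measurements - agent.received_measurements)
    if missing:
        incidents.append(f"{agent.machine}: missing data for {', '.join(missing)}")
    return incidents
```

The leeway absorbs network delay and small clock differences between agents and the central server.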
