Add support for pg_stat_replication.(write_lag|flush_lag|replay_lag) #1007
Comments
Also interested in this, it would be great to have a |
Just a note for anyone looking into this.
Good shout - fixed in original suggestion now
The exported replication lag does not handle all failure modes, and can report 0 for replicas that are out of sync and incapable of recovery. A proper replacement for that metric would require a different approach (see e.g. prometheus-community#1007), but for a lot of folks, simply exporting the age of the last replay can provide a pretty strong signal for something being amiss. I think this solution might be preferable to prometheus-community#977, though the lag metric needs to be fixed or abandoned eventually. Signed-off-by: Conrad Hoffmann <[email protected]>
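To illustrate the "age of the last replay" idea from the commit message above, here is a minimal sketch. It assumes the query runs on the replica and relies on the PostgreSQL functions pg_is_in_recovery() and pg_last_xact_replay_timestamp(). The column alias, the metric name, and the helper replay_age_sample are hypothetical and are not taken from the exporter or from the referenced commit.

```python
# Hypothetical sketch: names below are illustrative, not the exporter's own.
# The query yields NULL on a primary (not in recovery) and on a replica
# that has not replayed any transaction yet.
REPLAY_AGE_QUERY = """
SELECT
    CASE
        WHEN pg_is_in_recovery()
        THEN EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
    END AS last_replay_age_seconds
"""


def replay_age_sample(row):
    """Map one result row (a dict) to a (metric_name, value) pair.

    Returns None when the database reported NULL, so that no misleading
    0 is exported for a server that has nothing to report.
    """
    value = row.get("last_replay_age_seconds")
    if value is None:
        return None
    return ("pg_replication_last_replay_age_seconds", float(value))
```

Note that this age also grows while the primary is simply idle, so it is probably best read as a signal that something may be amiss rather than as a precise lag measurement.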
Add support for pg_stat_replication.(write_lag|flush_lag|replay_lag)
Use case. Why is this important?
For simple primary:replica setups, it would be convenient to monitor standby lag in seconds from the primary. pg_stat_replication already includes this information on postgres>=10, but the exporter does not parse it. AFAIK, workarounds would include pg_stat_replication.(write_lsn|flush_lsn|replay_lsn), but this does not quite capture issues where the replica has not replayed a business-critical change, however small in bytes.
Notes
I'm assuming the exporter does not support parsing the interval data type, which is why these metrics are marked as DISCARD here. I wonder if the following, explicit approach would be appropriate in this case:
That way the deviation from the official fields is not in conflict with the source data.