Velero fails to expose correct backup metrics after a pod restart #6936
Comments
Are you using version 1.11?
I checked release 1.12, and the PR is not included in it.
Do you mean that the velero_backup_last_status metric should read the most recently completed backup from before the Velero pod restart to determine its value for scheduled backups? The metric is reset to 0 when the Velero pod restarts because its default value is 0. This has been changed in PR #6838, as @yanggangtony mentioned.
Yes. A default value of one seems pointless to me. If a backup fails and Velero then restarts, the metric reflects the wrong state. I have the personal feeling that using a default value as the initial value results in unexpected behavior.
In issue #6809, we observed that when Velero restarts, the schedule continues with a new cron run. So the default value will be changed to 0 once a backup hits an error. You seem to suggest not initializing velero_backup_last_status with a default value at all, and instead calculating it in real time from the most recently completed backup. This probably needs opinions and discussion from the maintainers, such as @allenxu404 @sseago @ywk253100.
Yes, I'm expecting the same behavior from
Other metrics from Velero expose non-default values after a pod restart.
@mpryc FYI
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.
not stale
I feel this is still relevant
The This may have side effects that need to be checked regarding the time of such an event. When Velero restarts and the metric re-reads information about backups and their states, the time of such an event will be the time of the Velero restart and not that of the actual backup. This does not apply to all the metrics, but metrics such as Another solution would be to not represent any metrics after a restart and only show the ones that happen after the restart. This will, however, require modifying the queries against the Prometheus DB to gather information about past events.
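To illustrate the timestamp concern above, here is a minimal sketch of deriving a "last successful backup" time from completion times stored with the backups, so the value does not depend on when the process restarted. The function name and its input are hypothetical and are not part of Velero's code.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def last_success_epoch(completion_times: Iterable[datetime]) -> Optional[float]:
    """Return the newest completion time as epoch seconds, or None if there is none.

    Hypothetical helper: the input is assumed to hold the completion
    timestamps recorded on successful backups, not the restart time.
    """
    times = list(completion_times)
    if not times:
        # No successful backup is known, so there is nothing to expose.
        return None
    return max(times).timestamp()


# Example: a backup finished at 12:00 and the pod restarted at 12:30.
restart = datetime(2023, 10, 1, 12, 30, tzinfo=timezone.utc)
stored = [
    datetime(2023, 9, 30, 12, 0, tzinfo=timezone.utc),
    datetime(2023, 10, 1, 12, 0, tzinfo=timezone.utc),
]
value = last_success_epoch(stored)
```

With this approach the exposed value stays at the 12:00 completion time even though it is computed after the 12:30 restart.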
For weekly backups, it also takes a while until the status is correctly reported.
We are hitting this issue as well: the pod gets restarted because we replace nodes on a regular cadence. This causes backup metrics to be misreported. We have held off on setting up alerts because of this.
@vinayan3 we are using
This isn't stale.
unstalev2
What steps did you take and what happened:
The metric velero_backup_last_status exposes the status of the latest backups. Once a backup has been taken, the metric gets updated.
However, a pod restart in between any two scheduled backups resets the metric exposed by velero_backup_last_status. The metric only gets updated for backups after they are created.
What did you expect to happen:
Ideally, the metric should read the list of backups and set the velero_backup_last_status metric. So if a backup happens at 12:00 and the velero pod is restarted or killed at 12:30, the metric should not be set to 0 (which indicates no backup has been taken).
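The expected behavior could be sketched roughly as follows: on startup, walk the existing backups and derive the last status per schedule from the most recently finished one. This is only an illustration. The Backup record, the function name, and the set of phases treated as finished are assumptions, not Velero's actual implementation, and the sketch assumes 1 means success and 0 means failure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional

# Phases treated as "finished" in this sketch (an assumption).
FINISHED_PHASES = {"Completed", "PartiallyFailed", "Failed"}


@dataclass
class Backup:
    """Hypothetical, simplified view of a stored backup."""
    schedule: str
    phase: str
    completion_time: Optional[datetime]


def last_status_by_schedule(backups: Iterable[Backup]) -> Dict[str, int]:
    """Map each schedule to 1 (success) or 0 (failure) for its newest finished backup."""
    latest: Dict[str, Backup] = {}
    for b in backups:
        # Skip backups that are still running or have no completion time.
        if b.phase not in FINISHED_PHASES or b.completion_time is None:
            continue
        current = latest.get(b.schedule)
        if current is None or b.completion_time > current.completion_time:
            latest[b.schedule] = b
    return {name: 1 if b.phase == "Completed" else 0 for name, b in latest.items()}


# Example: values a restarted process could seed its gauge with.
existing = [
    Backup("daily", "Completed", datetime(2023, 10, 1, 12, 0)),
    Backup("daily", "Failed", datetime(2023, 10, 2, 12, 0)),
    Backup("weekly", "Completed", datetime(2023, 10, 1, 0, 0)),
    Backup("weekly", "InProgress", None),
]
seed = last_status_by_schedule(existing)
```

Schedules with no finished backup are left out of the result, so a seeding step built on this would not report a status for them at all instead of reporting a default.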
The following information will help us better understand what's going on:
Environment:
velero version: v1.11.0
velero client config get features:
kubectl version: v1.26.8
/etc/os-release: Ubuntu 20.04.5 LTS