Backup partially failed with csi plugin 0.6.0-rc2 on OVH cluster #6852

Arcahub opened this issue Sep 21, 2023 · 15 comments

Arcahub commented Sep 21, 2023


name: Bug report
about: Using the Velero 1.12.0 Data Movement feature on an OVH managed cluster makes backups partially fail with the matching CSI plugin version v0.6.0-rc.2, while it was working with v0.5.1.


What steps did you take and what happened:
I wanted to test the Data Movement feature.
I installed the Velero CLI v1.12.0-rc.2 and ran:

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0-rc.2,velero/velero-plugin-for-csi:v0.6.0-rc.2 \
  --no-default-backup-location \
  --features=EnableCSI \
  --no-secret \
  --use-node-agent

# Create kubernetes secret with s3 credentials

# Create velero storage location
velero backup-location create --bucket "${OVH_CLOUD_PROJECT_SERVICE}-my-cluster-backup" --provider aws --config region=gra,s3ForcePathStyle="true",s3Url=https://s3.gra.io.cloud.ovh.net "my-cluster-backup" --credential "my-cluster-backup=cloud"

# Create velero snapshot location
velero snapshot-location create --provider aws --config region=gra,s3ForcePathStyle="true",s3Url=https://s3.gra.io.cloud.ovh.net "my-cluster-backup" --credential "my-cluster-backup=cloud"

# VolumeSnapshotClass for ovh

# Create the backup
velero backup create "my-cluster-backup-${uuid}" --snapshot-move-data --storage-location "my-cluster-backup" --volume-snapshot-locations "my-cluster-backup" --csi-snapshot-timeout 10m
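
For reference, here is a minimal sketch of the two elided steps above (the secret name and key are inferred from the --credential flag; the VolumeSnapshotClass name is hypothetical, and the driver name matches the Cinder CSI driver used by OVH):

# S3 credentials secret referenced by --credential "my-cluster-backup=cloud"
kubectl -n velero create secret generic my-cluster-backup --from-file=cloud=./credentials-velero

# VolumeSnapshotClass labeled so the Velero CSI plugin picks it up
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-cinder-snapclass
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: cinder.csi.openstack.org
deletionPolicy: Retain
EOF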

The backup ended in a PartiallyFailed state with the error Fail to wait VolumeSnapshot snapshot handle created for the majority of PVCs. Some PVCs were still backed up while others were not, so I am guessing it's related to some timeout error.
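
For anyone reproducing this, the VolumeSnapshots Velero creates can be watched directly while the backup runs (a hedged sketch; names are placeholders, and it assumes the plugin's velero.io/backup-name label is applied to the VolumeSnapshots):

kubectl get volumesnapshot -A -l velero.io/backup-name=<backup-name> -w
# readyToUse and the bound content name appear in the status once the snapshot handle exists
kubectl -n <namespace> get volumesnapshot <vs-name> -o jsonpath='{.status.readyToUse} {.status.boundVolumeSnapshotContentName}{"\n"}'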

What did you expect to happen:

I expected the backup to work with the RC version of the CSI plugin, since nothing else changed on the cluster except this version.

The following information will help us better understand what's going on:

The bundle extracted with velero debug --backup:

bundle-2023-09-21-11-15-47.tar.gz

Anything else you would like to add:

I tried running a backup with the exact same install commands mentioned before, but changing the CSI plugin version to v0.5.1:

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0-rc.2,velero/velero-plugin-for-csi:v0.5.1 \
  --no-default-backup-location \
  --features=EnableCSI \
  --no-secret \
  --use-node-agent

And it worked without any error. Here is the debug bundle of the working backup with the CSI plugin in version v0.5.1.
bundle-2023-09-21-12-20-13.tar.gz

Of course, even though it worked, it is missing the DataUpload part needed for Data Movement, so it is not what I am looking for.

Environment:

  • Velero version (use velero version):v1.12.0-rc.2 7112c62
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.7", GitCommit:"84e1fc493a47446df2e155e70fca768d2653a398", GitTreeState:"clean", BuildDate:"2023-07-19T12:16:45Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: OVH v1.26.7
  • Cloud provider or hardware configuration: OVH
  • OS (e.g. from /etc/os-release): - RuntimeOS: linux - RuntimeArch: amd64

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"

Arcahub commented Sep 21, 2023

I took some time to debug while looking at the source code, so here are my investigations in case they can help in any way:

But if it is iterating twice on this loop, it would mean that the first time it was able to successfully get the VolumeSnapshot and reached the Waiting log line... At this point I don't have any more ideas, so I hope a maintainer can help me 👼.


Lyndon-Li commented Sep 26, 2023

From the log below, the Velero CSI plugin indeed polled the VS twice. The first time it got the VS successfully, but it failed the second time:

time="2023-09-20T23:34:26Z" level=info msg="Waiting for CSI driver to reconcile volumesnapshot gitea/velero-gitea-shared-storage-55wxb. Retrying in 5s" Backup=bashroom-cluster-backup5 Operation ID=du-7ce94fa2-58d3-447c-85c8-edb2af97b58a.75d4847f-3ae0-43e426984 Source PVC=gitea/gitea-shared-storage VolumeSnapshot=gitea/velero-gitea-shared-storage-55wxb backup=velero/bashroom-cluster-backup5 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/util/util.go:244" pluginName=velero-plugin-for-csi
time="2023-09-20T23:34:31Z" level=error msg="Fail to wait VolumeSnapshot snapshot handle created: failed to get volumesnapshot gitea/velero-gitea-shared-storage-55wxb: volumesnapshots.snapshot.storage.k8s.io \"velero-gitea-shared-storage-55wxb\" not found" Backup=bashroom-cluster-backup5 Operation ID=du-7ce94fa2-58d3-447c-85c8-edb2af97b58a.75d4847f-3ae0-43e426984 Source PVC=gitea/gitea-shared-storage VolumeSnapshot=gitea/velero-gitea-shared-storage-55wxb backup=velero/bashroom-cluster-backup5 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/backup/pvc_action.go:191" pluginName=velero-plugin-for-csi

Perhaps the VS was deleted after the first poll, but I don't know why. I searched the log; Velero didn't do it, since the DataUpload request had not been created yet, so no data mover modules would have touched the VS.

@Arcahub
Could you check the CSI driver and external snapshot provisioner logs to see if there is any clue about the deletion?

Additionally, could you also try CSI snapshot backup (without data movement) with Velero 1.12 + CSI plugin 0.6.0? You can run this by removing the --snapshot-move-data flag:
velero backup create "my-cluster-backup-${uuid}" --storage-location "my-cluster-backup" --volume-snapshot-locations "my-cluster-backup" --csi-snapshot-timeout 10m

CSI snapshot backup has a somewhat different workflow from CSI snapshot data movement backup; let's see whether or not this is a generic problem related to CSI snapshots.
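
To compare the two runs, a quick check after the CSI-only backup completes could look like this (a sketch, assuming the same backup naming as above):

velero backup describe "my-cluster-backup-${uuid}" --details
velero backup logs "my-cluster-backup-${uuid}" | grep -i volumesnapshot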


Arcahub commented Sep 28, 2023

Hello @Lyndon-Li, thank you for taking a look at my issue.
I can't answer you this week, but on Monday I will take a look at the logs, try a CSI snapshot backup without data movement, and provide you with my feedback.

I didn't mention it in my previous post, but the OVH CSI driver is Cinder, in case that helps somehow.


Arcahub commented Oct 2, 2023

Hello @Lyndon-Li, I just tested running the backup without data movement and it failed. The installation was the same, and the command was also the same minus --snapshot-move-data, so the breaking change seems to be in the CSI snapshot handling. Here is the debug bundle:

bundle-2023-10-02-12-33-45.tar.gz

As I said previously, the CSI driver is Cinder on OVHcloud, but I wasn't able to find any logs.

@blackpiglet

@Arcahub
Could you also try to do the same CSI backup with velero/velero:v1.11.1 and velero/velero-plugin-for-csi:v0.5.1?

There was a modification in how the VolumeSnapshot resources created during backup are handled.
The VolumeSnapshot resources created during backup should be cleaned up, because that prevents the underlying snapshots from being deleted when the VolumeSnapshots are deleted or the VolumeSnapshots' namespace is deleted.

The change introduced in v1.12.0 is that the VolumeSnapshot cleanup logic moved into the CSI plugin. The benefit is that the time-consuming handling of multiple VolumeSnapshots is now done concurrently.

It's possible that the combination of v1.12.0 Velero and the v0.5.1 CSI plugin ends up with neither side doing the VolumeSnapshot resource cleanup.
This is the CSI plugin and the Velero server's compatibility matrix.
https://github.com/vmware-tanzu/velero-plugin-for-csi/tree/main#compatibility
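
One way to see whether the cleanup ran is to look for leftover snapshot objects after a backup finishes (a sketch; it assumes the plugin's velero.io/backup-name label is present on the objects it created):

kubectl get volumesnapshot -A -l velero.io/backup-name=<backup-name>
kubectl get volumesnapshotcontent -l velero.io/backup-name=<backup-name>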


Arcahub commented Oct 16, 2023

@blackpiglet

I already tested with velero/velero:v1.11.1 and velero/velero-plugin-for-csi:v0.5.1 back when I was setting up Velero in my cluster. I re-tested and the backup was successful. I didn't test the restore, but I saw the CSI snapshots in the OVH web interface. Here is the bundle in case it can help.
bundle-2023-10-16-15-23-15.tar.gz

I am currently using file-system backup, but data movement is an essential feature in my case, which is why I am experimenting with CSI data movement; I would much prefer this strategy.

I also tested with the official 1.12.0 release of Velero and velero-plugin-for-csi:v0.6.1, just in case the release fixed something related, but sadly it is still failing. Again, here is the bundle in case it can help:
bundle-2023-10-16-15-52-41.tar.gz


blackpiglet commented Oct 17, 2023

@Arcahub
Thanks for the detailed information.
I couldn't find any information other than VolumeSnapshot not found in the partially failed backup.

But I found some things in the succeeded backup.
First, the version doesn't seem right there.

Client:
	Version: v1.11.1
	Git commit: bdbe7eb242b0f64d5b04a7fea86d1edbb3a3587c
Server:
	Version: v1.12.0-rc.2
# WARNING: the client version does not match the server version. Please update client

The client version is right, but the server version is still v1.12.0-rc.2.

The images used are:

  • velero/velero-plugin-for-csi:v0.5.1
  • velero/velero-plugin-for-aws:v1.8.0-rc.2
  • velero/velero:v1.12.0-rc.2

Second, although the backup finished as Completed, no PV data was backed up.

Velero-Native Snapshots: <none included>

Could you please use the v1.11.x version of the Velero CLI to reinstall the Velero environment? Please uninstall the existing environment with the velero uninstall command first.

To debug further, could you also check the CSI snapshotter pods' logs to find whether there is some information about why the VolumeSnapshots were deleted?
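
A quick way to double-check which images are actually deployed, independently of the CLI (a sketch; the velero deployment runs the server image and loads the plugins as init containers, and the namespace is assumed to be velero):

kubectl -n velero get deploy velero -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
kubectl -n velero get deploy velero -o jsonpath='{range .spec.template.spec.initContainers[*]}{.image}{"\n"}{end}'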


Arcahub commented Oct 17, 2023

@blackpiglet

I am sorry for my mistake; I was using aliases to switch between versions, but they were not expanded in my bash scripts. Here is the bundle of the test with velero/velero:v1.11.1 and velero/velero-plugin-for-csi:v0.5.1.
bundle-2023-10-17-11-14-09.tar.gz

The Velero-Native Snapshots field you mentioned is still empty, but I can assure you that the snapshots are appearing in the OVH interface, like this:
[image: screenshot of the snapshots listed in the OVH interface]

I am 100% sure that those snapshots are created and managed by Velero, since there is no other snapshot mechanism currently enabled on this cluster, and when I delete the backup the snapshots are also deleted.

Sadly, as I said before, I am not able to provide the Cinder CSI pods' logs since I just can't access them.
When I run kubectl get pods -A -o name on my cluster with root kubeconfig here is the output:

Pods list

pod/argo-server-79d445949-6nwsf
pod/workflow-controller-55bd57fb6d-pngn8
pod/argocd-application-controller-0
pod/argocd-applicationset-controller-7c9cb6785d-hjd4g
pod/argocd-dex-server-69dbdcbf7d-zzdjj
pod/argocd-notifications-controller-f9d4457df-tttlz
pod/argocd-redis-ha-haproxy-7d7c895d48-7rqrg
pod/argocd-redis-ha-haproxy-7d7c895d48-9lvlj
pod/argocd-redis-ha-haproxy-7d7c895d48-frmxn
pod/argocd-redis-ha-server-0
pod/argocd-redis-ha-server-1
pod/argocd-redis-ha-server-2
pod/argocd-repo-server-774ffb985d-25778
pod/argocd-repo-server-774ffb985d-fk4nf
pod/argocd-server-65c96f7d86-dfj6s
pod/argocd-server-65c96f7d86-kpw2z
pod/website-7df55575f8-zcdt7
pod/camel-k-operator-7d66896b75-s5b8c
pod/cert-manager-6ffb79dfdb-sqp7d
pod/cert-manager-cainjector-5fcd49c96-fkffb
pod/cert-manager-webhook-796ff7697b-8f6fl
pod/cert-manager-webhook-ovh-65648fd49-xzrfb
pod/emqx-operator-controller-manager-697f499bb7-kmgzj
pod/gitea-79f968f68c-zgrkt
pod/gitea-postgresql-ha-pgpool-5b967d985f-ht48w
pod/gitea-postgresql-ha-postgresql-0
pod/gitea-postgresql-ha-postgresql-1
pod/gitea-postgresql-ha-postgresql-2
pod/gitea-redis-cluster-0
pod/gitea-redis-cluster-1
pod/gitea-redis-cluster-2
pod/gitea-redis-cluster-3
pod/gitea-redis-cluster-4
pod/gitea-redis-cluster-5
pod/gdrive-files-processing-wip-to-process-69d4bf86b-crv4w
pod/kafka-cluster-entity-operator-59654f75cb-qbjf2
pod/kafka-cluster-kafka-0
pod/kafka-cluster-kafka-1
pod/kafka-cluster-zookeeper-0
pod/strimzi-cluster-operator-695878cfc8-mj7d2
pod/calico-kube-controllers-65b74d475d-jqzl9
pod/canal-c8k4d
pod/canal-cbr9f
pod/canal-xz9t5
pod/coredns-545567dbbc-qvmtq
pod/coredns-545567dbbc-r5nz2
pod/kube-dns-autoscaler-7d57686cf5-vn6sc
pod/kube-proxy-4gbnk
pod/kube-proxy-5ljwh
pod/kube-proxy-hcsjr
pod/metrics-server-59bc47dc74-dw6wd
pod/secrets-store-csi-driver-4tclt
pod/secrets-store-csi-driver-d8wcb
pod/secrets-store-csi-driver-tlr9h
pod/wormhole-7c2k5
pod/wormhole-f988b
pod/wormhole-nrlh4
pod/alertmanager-kube-prometheus-kube-prome-alertmanager-0
pod/kube-prometheus-grafana-747559ff98-mxlkl
pod/kube-prometheus-kube-prome-operator-698dccc59-68qnj
pod/kube-prometheus-kube-state-metrics-cc66d7d4c-894sp
pod/kube-prometheus-prometheus-node-exporter-4x5bp
pod/kube-prometheus-prometheus-node-exporter-cbzhj
pod/kube-prometheus-prometheus-node-exporter-t9thr
pod/prometheus-kube-prometheus-kube-prome-prometheus-0
pod/nginx-ingress-controller-847c4bbdd-6mtj8
pod/keycloak-operator-6b9cf65f87-7x6r2
pod/sso-0
pod/sso-1
pod/sso-2
pod/sso-db-postgresql-ha-pgpool-5444f46c7d-tcxhs
pod/sso-db-postgresql-ha-postgresql-0
pod/sso-db-postgresql-ha-postgresql-1
pod/sso-db-postgresql-ha-postgresql-2
pod/vault-0
pod/vault-1
pod/vault-2
pod/vault-agent-injector-57db6b66cf-gvmzq
pod/vault-csi-provider-4zvdv
pod/vault-csi-provider-cts95
pod/vault-csi-provider-wmxcb
pod/node-agent-7vn95
pod/node-agent-c9brp
pod/node-agent-gccqv
pod/velero-64bdb44f88-8rdr8

OVH might not be managing the CSI driver through pods, or might just be hiding them from users, but I am not able to provide any logs since I don't have access to them. I totally agree that they would help to debug this issue, and at least I can try to contact the support to ask for the logs.
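
A few generic checks that might still reveal snapshot-related components even when the driver pods are hidden (just a sketch, nothing OVH-specific):

kubectl get csidrivers
kubectl get crd | grep -i snapshot
kubectl get volumesnapshotclass
kubectl get pods -A | grep -i -E 'csi|snapshot'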

Just in case, I reran with the official latest release 1.12.0, since I had made the same mistake by not changing the version. It ended with the same PartiallyFailed as before:
bundle-2023-10-17-11-55-03.tar.gz


blackpiglet commented Oct 18, 2023

Thanks for the feedback.
I found there is a pattern among the PVCs whose snapshot data was moved.
The three PVCs created in namespace sso succeeded, and their StorageClass is csi-cinder-high-speed.
The failed PVC's StorageClass is csi-cinder-classic.

Could you check the other failed PVCs' StorageClass setting?
And what is the difference between the storage backends of those two StorageClasses?

Backup Item Operations:
  Operation for persistentvolumeclaims gitea/redis-data-gitea-redis-cluster-5:
    Backup Item Action Plugin:  velero.io/csi-pvc-backupper
    Operation ID:               du-d1abf0dc-9873-42bc-9659-399d470fdd94.95bfeed4-7089-435508b50
    Items to Update:
                           datauploads.velero.io velero/bashroom-cluster-backup9-949tv
    Phase:                 Failed
    Operation Error:       error to expose snapshot: error to get volume snapshot content: error getting volume snapshot content from API: volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-bab583b4-02cb-4b44-a6ec-5d14fb2f9300" not found
    Progress description:  Failed
    Created:               2023-10-17 11:53:13 +0200 CEST
    Started:               2023-10-17 11:53:13 +0200 CEST
    Updated:               2023-10-17 11:53:13 +0200 CEST
  Operation for persistentvolumeclaims sso/data-sso-db-postgresql-ha-postgresql-0:
    Backup Item Action Plugin:  velero.io/csi-pvc-backupper
    Operation ID:               du-d1abf0dc-9873-42bc-9659-399d470fdd94.74bb2aa0-9b19-4afff55dc
    Items to Update:
                           datauploads.velero.io velero/bashroom-cluster-backup9-lcfzs
    Phase:                 Completed
    Progress:              228711229 of 228711229 complete (Bytes)
    Progress description:  Completed
    Created:               2023-10-17 11:53:35 +0200 CEST
    Started:               2023-10-17 11:53:35 +0200 CEST
    Updated:               2023-10-17 11:54:16 +0200 CEST
  Operation for persistentvolumeclaims sso/data-sso-db-postgresql-ha-postgresql-1:
    Backup Item Action Plugin:  velero.io/csi-pvc-backupper
    Operation ID:               du-d1abf0dc-9873-42bc-9659-399d470fdd94.5cae76ff-431d-4ee4bc856
    Items to Update:
                           datauploads.velero.io velero/bashroom-cluster-backup9-fvgpr
    Phase:                 Completed
    Progress:              77841062 of 77841062 complete (Bytes)
    Progress description:  Completed
    Created:               2023-10-17 11:53:40 +0200 CEST
    Started:               2023-10-17 11:53:40 +0200 CEST
    Updated:               2023-10-17 11:54:17 +0200 CEST
  Operation for persistentvolumeclaims sso/data-sso-db-postgresql-ha-postgresql-2:
    Backup Item Action Plugin:  velero.io/csi-pvc-backupper
    Operation ID:               du-d1abf0dc-9873-42bc-9659-399d470fdd94.2ea47ffd-8e3c-4cb9a7068
    Items to Update:
                           datauploads.velero.io velero/bashroom-cluster-backup9-q76n9
    Phase:                 Completed
    Progress:              144949922 of 144949922 complete (Bytes)
    Progress description:  Completed
    Created:               2023-10-17 11:53:45 +0200 CEST
    Started:               2023-10-17 11:53:45 +0200 CEST
    Updated:               2023-10-17 11:54:25 +0200 CEST
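
For reference, the StorageClass of every PVC and the definitions of the two classes can be dumped like this (a sketch):

kubectl get pvc -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,STORAGECLASS:.spec.storageClassName'
kubectl get sc csi-cinder-classic csi-cinder-high-speed -o yaml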


Arcahub commented Oct 25, 2023

@blackpiglet
Sorry for the late reply,

csi-cinder-high-speed is the default StorageClass on OVH clusters; the only difference is that it is backed by SSD storage instead of HDD, for faster IO operations. We mostly use csi-cinder-classic, and in some cases we have csi-cinder-high-speed, either because of a loose StorageClass configuration or a deliberate choice.

Here is the list of pvc in the cluster:

PVC list

NAME                                     STORAGECLASS
data-gitea-postgresql-ha-postgresql-0    csi-cinder-classic
data-gitea-postgresql-ha-postgresql-1    csi-cinder-classic
data-gitea-postgresql-ha-postgresql-2    csi-cinder-classic
gitea-shared-storage                     csi-cinder-high-speed
redis-data-gitea-redis-cluster-0         csi-cinder-classic
redis-data-gitea-redis-cluster-1         csi-cinder-classic
redis-data-gitea-redis-cluster-2         csi-cinder-classic
redis-data-gitea-redis-cluster-3         csi-cinder-classic
redis-data-gitea-redis-cluster-4         csi-cinder-classic
redis-data-gitea-redis-cluster-5         csi-cinder-classic
data-0-kafka-cluster-kafka-0             csi-cinder-high-speed
data-0-kafka-cluster-kafka-1             csi-cinder-high-speed
data-kafka-cluster-zookeeper-0           csi-cinder-high-speed
data-sso-db-postgresql-ha-postgresql-0   csi-cinder-high-speed
data-sso-db-postgresql-ha-postgresql-1   csi-cinder-high-speed
data-sso-db-postgresql-ha-postgresql-2   csi-cinder-high-speed
audit-vault-0                            csi-cinder-classic
audit-vault-1                            csi-cinder-classic
audit-vault-2                            csi-cinder-classic
data-vault-0                             csi-cinder-classic
data-vault-1                             csi-cinder-classic
data-vault-2                             csi-cinder-classic

My interpretation is that the error we are facing is somehow a latency error, or at least a time-related error, and high-speed PVCs are more likely to be ready or reachable at the moment Velero makes the API call; but still, we can see that not all the high-speed PVCs were successful.

I checked the other bundles I uploaded earlier in this issue and was able to find other PVCs that succeeded, but they were not always using csi-cinder-high-speed.

@blackpiglet

Thanks.
I agree that using a high-speed disk doesn't mean the snapshot creation will succeed.
I think we need more information from the CSI driver and snapshot controller to learn why the VolumeSnapshots are deleted.


Arcahub commented Oct 25, 2023

Yeah, I do agree on that. I have created a ticket with OVH support to ask for access to the CSI driver logs and for some help on this issue from their side. I am waiting for an answer from them and will keep you updated.

I also have an on-premise OpenStack installation on my side, so I will try to set up a Kubernetes cluster with my own Cinder CSI driver to test whether this issue is specific to OVH or affects the Cinder CSI driver overall.

Lyndon-Li added the "Needs info (Waiting for information)" label on Nov 16, 2023
@Lyndon-Li

@Arcahub
See issue #7068. Though the current problem is different from that one, we can troubleshoot it in the same way: collect the snapshot controller pods' logs (there are several containers in the snapshot controller pods; logs are needed for each container) before and after the problem happens, and from the logs we will be able to know for sure who deleted the VS.

I think you may not need to contact the CSI driver vendor, because the snapshot controller is a Kubernetes upstream module and the pods should be in the kube-system namespace.
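
A sketch of how that collection could look (the pod name is a placeholder):

kubectl -n kube-system get pods | grep -i snapshot
# capture every container's log, before and after a failing backup
kubectl -n kube-system logs <snapshot-controller-pod> --all-containers --timestamps > snapshot-controller.log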

@MrOffline77

I'm running on OVH too, with the same behavior as far as I understand it so far.
Anyway, I do have access to the CSI driver logs, or at least I think this is the correct spot from which you requested the logs.

  • K8S 1.27.12
  • Velero 1.13.1
  • velero-plugin-for-aws:v1.9.0
  • velero-plugin-for-csi:v0.7.1

On each K8s node, a container like this runs within an extra containerd namespace:
registry.kubernatine.ovh/public/cinder-csi-plugin-amd64:192
Below you can find the logs of one container as an example. The others look the same when the backup starts.

The Log below starts together with the Velero Backup.

I0515 13:00:21.637381       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:21.662108       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:21.664284       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:21.666238       9 utils.go:81] GRPC call: /csi.v1.Node/NodeStageVolume
I0515 13:00:21.666258       9 nodeserver.go:352] NodeStageVolume: called with args {"publish_context":{"DevicePath":"/dev/sdd"},"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"storage.kubernetes.io/csiProvisionerIdentity":"1715775916462-7155-cinder.csi.openstack.org"},"volume_id":"e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad"}
I0515 13:00:22.333662       9 mount.go:171] Found disk attached as "scsi-0QEMU_QEMU_HARDDISK_e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad"; full devicepath: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad
I0515 13:00:22.333740       9 mount_linux.go:446] Attempting to determine if disk "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad])
I0515 13:00:22.342622       9 mount_linux.go:449] Output: "DEVNAME=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad\nTYPE=ext4\n"
I0515 13:00:22.342648       9 mount_linux.go:340] Checking for issues with fsck on disk: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad
I0515 13:00:22.535385       9 mount_linux.go:436] Attempting to mount disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad in ext4 format at /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount
I0515 13:00:22.535449       9 mount_linux.go:175] Mounting cmd (mount) with arguments (-t ext4 -o defaults /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount)
I0515 13:00:22.557128       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:22.574137       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:22.575351       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:22.576431       9 utils.go:81] GRPC call: /csi.v1.Node/NodePublishVolume
I0515 13:00:22.576458       9 nodeserver.go:51] NodePublishVolume: called with args {"publish_context":{"DevicePath":"/dev/sdd"},"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount","target_path":"/var/lib/kubelet/pods/1e210d86-15ce-4eee-9132-e237bb237ac0/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-10a49715-f6f7-4580-95f8-7b9b53b2849a/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"nightly-20240515125933-m8dzp","csi.storage.k8s.io/pod.namespace":"velero","csi.storage.k8s.io/pod.uid":"1e210d86-15ce-4eee-9132-e237bb237ac0","csi.storage.k8s.io/serviceAccount.name":"velero","storage.kubernetes.io/csiProvisionerIdentity":"1715775916462-7155-cinder.csi.openstack.org"},"volume_id":"e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad"}
I0515 13:00:22.698135       9 mount_linux.go:175] Mounting cmd (mount) with arguments (-t ext4 -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount /var/lib/kubelet/pods/1e210d86-15ce-4eee-9132-e237bb237ac0/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-10a49715-f6f7-4580-95f8-7b9b53b2849a/mount)
I0515 13:00:22.702164       9 mount_linux.go:175] Mounting cmd (mount) with arguments (-t ext4 -o bind,remount,rw /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount /var/lib/kubelet/pods/1e210d86-15ce-4eee-9132-e237bb237ac0/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-10a49715-f6f7-4580-95f8-7b9b53b2849a/mount)
I0515 13:00:25.378530       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:25.381054       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0515 13:00:25.381068       9 nodeserver.go:478] NodeGetVolumeStats: called with args {"volume_id":"8e9c13ad-01ae-41bd-b37f-244177f2d894","volume_path":"/var/lib/kubelet/pods/bf847b6a-415c-4c9b-b272-e3f60261041d/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-064bf59e-48fc-4ab9-929e-f269e3013183/mount"}
I0515 13:00:33.363713       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:33.369426       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0515 13:00:33.369441       9 nodeserver.go:478] NodeGetVolumeStats: called with args {"volume_id":"8dcfb96c-9d91-4b7e-bf15-2cbff72fd399","volume_path":"/var/lib/kubelet/pods/fe1b03a7-a4fa-4ff8-a9bb-8cd809ddc46e/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-82fd4660-1d71-4aac-b026-8880a5abc3ff/mount"}
I0515 13:00:46.774014       9 utils.go:81] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0515 13:00:46.774055       9 nodeserver.go:269] NodeUnPublishVolume: called with args {"target_path":"/var/lib/kubelet/pods/1e210d86-15ce-4eee-9132-e237bb237ac0/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-10a49715-f6f7-4580-95f8-7b9b53b2849a/mount","volume_id":"e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad"}
I0515 13:00:47.068446       9 mount_helper_common.go:99] "/var/lib/kubelet/pods/1e210d86-15ce-4eee-9132-e237bb237ac0/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-10a49715-f6f7-4580-95f8-7b9b53b2849a/mount" is a mountpoint, unmounting
I0515 13:00:47.068479       9 mount_linux.go:266] Unmounting /var/lib/kubelet/pods/1e210d86-15ce-4eee-9132-e237bb237ac0/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-10a49715-f6f7-4580-95f8-7b9b53b2849a/mount
W0515 13:00:47.073768       9 mount_helper_common.go:129] Warning: "/var/lib/kubelet/pods/1e210d86-15ce-4eee-9132-e237bb237ac0/volumes/kubernetes.io~csi/ovh-managed-kubernetes-8o7qqc-pvc-10a49715-f6f7-4580-95f8-7b9b53b2849a/mount" is not a mountpoint, deleting
I0515 13:00:47.177426       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities
I0515 13:00:47.179256       9 utils.go:81] GRPC call: /csi.v1.Node/NodeUnstageVolume
I0515 13:00:47.179290       9 nodeserver.go:418] NodeUnstageVolume: called with args {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount","volume_id":"e3ca84b8-e6d5-46de-b1d9-9253df4ab2ad"}
I0515 13:00:47.255762       9 mount_helper_common.go:99] "/var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount" is a mountpoint, unmounting
I0515 13:00:47.255798       9 mount_linux.go:266] Unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount
W0515 13:00:47.324616       9 mount_helper_common.go:129] Warning: "/var/lib/kubelet/plugins/kubernetes.io/csi/cinder.csi.openstack.org/1506deb58c4ea03b0a8c329ac69367dc7252a798cca88c9048bbb936f2c3a55c/globalmount" is not a mountpoint, deleting
I0515 13:02:07.609636       9 utils.go:81] GRPC call: /csi.v1.Node/NodeGetCapabilities

Let me know if you need any further logs from me to assist.

@Lyndon-Li
Copy link
Contributor

@MrOffline77 Actually, we need the external-snapshotter logs as mentioned in #7068; there are multiple containers, including sidecar containers, and we need the logs from all of them.
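
The external-snapshotter normally runs as a sidecar of the CSI controller plugin rather than the node plugin shown above. A sketch for locating it (the deployment and container names are hypothetical and depend on how OVH packages cinder-csi):

kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{": "}{range .spec.containers[*]}{.name}{" "}{end}{"\n"}{end}' | grep -i -E 'snapshot|cinder'
kubectl -n kube-system logs deploy/csi-cinder-controllerplugin -c csi-snapshotter --timestamps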
