EKS: backups fail when running on Bottlerocket nodes #8360

Closed
pedro-m-andrade-alb opened this issue Oct 29, 2024 · 6 comments

@pedro-m-andrade-alb
pedro-m-andrade-alb commented Oct 29, 2024

What steps did you take and what happened:

Cluster backups fail when the nodes use a Bottlerocket AMI, but succeed when they use an Amazon Linux 2 (AL2) AMI.

velero create backup backup-test --snapshot-move-data --exclude-namespaces actions-runner,actions-runner-controller,cert-manager,default,kube-node-lease,kube-public,kube-system

What did you expect to happen:
Backups finish without errors.

The following information will help us better understand what's going on:
We have some clusters running with Bottlerocket nodes and others with Amazon Linux 2 nodes. All backup tests in the Bottlerocket clusters failed, using the same command and configuration as the AL2 clusters, where the backups finished without errors.

Furthermore, when we switched some AL2 clusters to Bottlerocket nodes, their backups started to fail.

Dataupload error:
error to initialize data path: error to boost backup repository connection default-ffwk-kopia: error to connect backup repo: error to connect repo with storage: error to connect to repository: repository not initialized in the provided storage

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, refer to velero debug --help
bundle-2024-10-29-10-23-43.tar.gz

Anything else you would like to add:

Environment:

Vote on this issue!

This is an invitation to the Velero community to vote on issues. You can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@Lyndon-Li
Contributor

The backups failed because:
error to initialize data path: error to boost backup repository connection default-ffwk-kopia: error to connect backup repo: error to connect repo with storage: error to connect to repository: repository not initialized in the provided storage

It looks like something went wrong with the object store and the repo data was not there.
You can delete the backuprepository CRs and retry the backup.
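
A minimal sketch of that cleanup, assuming Velero is installed in the velero namespace (the thread does not state the namespace). The run helper and the DRY_RUN variable are hypothetical and only exist so the commands are printed for review before anything is deleted; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Assumption: Velero runs in the "velero" namespace; override VELERO_NS if not.
VELERO_NS="${VELERO_NS:-velero}"
# Hypothetical safety switch: commands are only printed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List the BackupRepository CRs first to see what would be removed.
run kubectl -n "$VELERO_NS" get backuprepositories.velero.io
# Delete them, then retry the backup.
run kubectl -n "$VELERO_NS" delete backuprepositories.velero.io --all
```

Deleting the CRs does not by itself touch the data in the object store; it only removes Velero's record of the repository connection.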

@pedro-m-andrade-alb
Author

We retried the backup with Bottlerocket nodes after deleting the backuprepository, and it failed with the error:
data path backup failed: Failed to run kopia backup: Failed to upload the kopia snapshot for si default@default:snapshot-data-upload-download/kopia/ffwk/grafana-pvc: permission denied

This test was done again in the same cluster but with AL2 nodes instead, and the backup finished successfully.

@kaovilai
Member

kaovilai commented Oct 30, 2024

Probably related to #8249.

Please see fixes there and the new options you may need to apply.

@kaovilai
Member

Bottlerocket has an always-enabled, enforced, restrictive SELinux policy for the mutable filesystem that helps prevent containers from executing dangerous operations, even when running as root.

src

@sseago
Collaborator

sseago commented Oct 30, 2024

@kaovilai this is on 1.14, not 1.15, so I don't think the backupPVC issue is relevant. I think in this case, the --privileged-node-agent install flag may be all that's needed.

@pedro-m-andrade-alb
Author

Adding the --privileged-node-agent flag fixed the problem, and the backups now work with Bottlerocket nodes. Thank you all for the help. The issue can be closed.
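
For anyone landing here later, a rough sketch of what that looks like on the command line. The --privileged-node-agent flag is the one named in this thread; --use-node-agent is the velero install flag that deploys the node agent. Everything passed through "$@" stands for the flags your existing install already uses (provider, bucket, plugins and so on), which are not given in this thread. The run helper, the privileged_install function and DRY_RUN are hypothetical names; by default the command is only printed.

```shell
#!/bin/sh
# Hypothetical safety switch: the command is only printed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Install Velero with the node agent running as a privileged container.
# Extra arguments are forwarded unchanged; they represent your existing
# install flags and are NOT taken from this thread.
privileged_install() {
  run velero install --use-node-agent --privileged-node-agent "$@"
}

privileged_install "$@"
```

Running the node agent privileged widens what that container can do on the host, so it is worth limiting to clusters where the SELinux policy actually blocks the data path, as it did here on Bottlerocket.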
