[Bug] SOCI Removes Snapshots on SIGINT #832

Closed
sondavidb opened this issue Sep 12, 2023 · 1 comment
Labels
bug Something isn't working

sondavidb commented Sep 12, 2023

Description

When stopping the daemon via a SIGINT, SOCI makes an effort to clean up after itself. This includes both unmounting the requisite directories and removing them entirely.

This leads to a fuse mount error upon restarting the snapshotter:

/usr/bin/fusermount: bad mount point /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/2/fs: No such file or directory

The snapshotter then starts successfully only when allow_invalid_mounts_on_restart=true is set in the config. After this, pulling any other image results in an error similar to the one below:

failed to prepare extraction snapshot "extract-xxxxx": failed to stat parent: stat /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs: no such file or directory: unknown

Because I ended the task gracefully and still have the original image and index, I would expect the snapshotter to start successfully. Instead, it fails to start unless I set allow_invalid_mounts_on_restart=true in the config.toml file, and I then have to manually remove the previously loaded snapshot before I can pull any other images. This is unintuitive, so if it is expected behavior, I would like an explanation of why it behaves this way.

Upon inspection, the following behaviors have been observed on cleanup:

  • The uncompressed snapshots on disk are removed
  • The snapshot metadata appears to be unmodified

These two behaviors clash. On startup, the snapshotter expects the previously loaded snapshots to be in the snapshots directory, per its metadata, yet after a graceful cleanup they are no longer there. A non-graceful kill, by contrast, retains the snapshots, so the snapshotter starts up without problems, and I can pull a separate image and run either snapshot.
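To make the mismatch concrete, here is a minimal sketch (not SOCI's actual code) of the kind of check that fails on restart. It assumes the snapshot IDs recorded in the metadata are already available as a list; how the metadata is stored and read is not shown. The function name find_missing_snapshot_dirs and the demo helper are hypothetical. The snapshots/<id>/fs layout is taken from the error messages above.

```python
import os
import tempfile


def find_missing_snapshot_dirs(root, snapshot_ids):
    """Return the IDs whose <root>/snapshots/<id>/fs directory is absent.

    `root` stands in for /var/lib/soci-snapshotter-grpc/snapshotter and
    `snapshot_ids` for whatever IDs the metadata still lists (assumption).
    """
    missing = []
    for snapshot_id in snapshot_ids:
        fs_dir = os.path.join(root, "snapshots", str(snapshot_id), "fs")
        if not os.path.isdir(fs_dir):
            missing.append(snapshot_id)
    return missing


def demo():
    """Simulate metadata that lists two snapshots when only one is on disk."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "snapshots", "1", "fs"))
        return find_missing_snapshot_dirs(root, ["1", "2"])


if __name__ == "__main__":
    print("missing snapshot dirs:", demo())
```

After a graceful SIGINT, every ID still listed in the metadata would show up as missing in a check like this, which matches the "no such file or directory" errors reported above.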

If removing the snapshot is intended behavior, that's fine, but we should then clean the metadata to reflect this on a graceful cleanup. I think we should mimic containerd's default behavior if we do not already have a clear direction in mind for this.
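The two consistent options described here can be sketched as follows. This is a hypothetical model, not the snapshotter's implementation: the metadata is represented as an in-memory dict keyed by snapshot ID, unmounting is a caller-supplied callback that does nothing by default, and the name graceful_cleanup is invented for illustration.

```python
import os
import shutil


def graceful_cleanup(root, metadata, remove_snapshots, unmount=lambda path: None):
    """Clean up on graceful shutdown while keeping disk and metadata in sync.

    remove_snapshots=False: unmount only; directories and metadata both stay.
    remove_snapshots=True:  unmount, delete each snapshot directory, and drop
                            its metadata record so nothing dangling remains.
    """
    for snapshot_id in list(metadata):
        snapshot_dir = os.path.join(root, "snapshots", snapshot_id)
        unmount(os.path.join(snapshot_dir, "fs"))  # placeholder for a real unmount
        if remove_snapshots:
            shutil.rmtree(snapshot_dir, ignore_errors=True)
            del metadata[snapshot_id]
    return metadata
```

Either branch leaves the metadata and the snapshots directory in agreement. The behavior reported in this issue corresponds to removing the directories while leaving the metadata untouched.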

Steps to reproduce the bug

  1. Pull any image
  2. Kill the daemon gracefully (sudo killall -2 soci-snapshotter-grpc)
  3. Ensure allow_invalid_mounts_on_restart=true in /etc/soci-snapshotter-grpc/config.toml. (Without doing this, step 4 will fail to start, which is also unexpected behavior)
  4. Start snapshotter
  5. Attempt to pull another image that is not the image pulled in step 1

Describe the results you expected

Snapshotter starts with no errors and does not need the allow_invalid_mounts_on_restart flag to be true to start. I also expect to be able to pull additional images without any errors. Additionally, I expect /var/lib/soci-snapshotter-grpc/snapshotter to remain populated even when the daemon is not running.

Host information

  1. OS: Amazon Linux 2
  2. Snapshotter Version: v0.4.0
  3. Containerd Version: v1.7.3

Any additional context or information about the bug

No response

@sondavidb sondavidb added the bug Something isn't working label Sep 12, 2023
@github-project-automation github-project-automation bot moved this to ❓ Ungroomed in soci-snapshotter Sep 13, 2023
@Kern-- Kern-- moved this from ❓ Ungroomed to 📋 Backlog in soci-snapshotter Sep 13, 2023
@sondavidb

Closed by #881

@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in soci-snapshotter Nov 3, 2023