[Bug] SOCI Removes Snapshots on SIGINT #832

sondavidb · 2023-09-12T19:47:28Z

Description

When stopping the daemon via a SIGINT, SOCI makes an effort to clean up after itself. This includes both unmounting the requisite directories and removing them entirely.

This leads to a FUSE mount error when the snapshotter is restarted:

/usr/bin/fusermount: bad mount point /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/2/fs: No such file or directory

The snapshotter then starts successfully only when allow_invalid_mounts_on_restart=true is set in the config. After this, pulling any other image results in an error similar to the one below:

failed to prepare extraction snapshot "extract-xxxxx": failed to stat parent: stat /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs: no such file or directory: unknown

As I ended the task gracefully, I would expect the snapshotter to start successfully, since I still have the original image and index. Instead, it fails to start. I must set allow_invalid_mounts_on_restart=true in the config.toml file to get it to start again, and then manually remove the previously loaded snapshot before I can pull any other images. This is unintuitive, so if it is expected behavior, I would like an explanation of why it behaves this way.

Upon inspection, the following behaviors have been observed on cleanup:

The uncompressed snapshots on disk are removed
The snapshot metadata appears to be unmodified

These two behaviors conflict. On startup, the snapshotter expects the previously loaded snapshots to be in the snapshots directory per its metadata, yet after a graceful cleanup they are no longer there. A non-graceful kill, by contrast, retains the snapshots, so the snapshotter starts up just fine: I can pull a separate image and run either snapshot without problems.

If removing the snapshot is intended behavior, that's fine, but we should then clean the metadata to reflect this on a graceful cleanup. I think we should mimic containerd's default behavior if we do not already have a clear direction in mind for this.

Steps to reproduce the bug

1. Pull any image.
2. Kill the daemon gracefully (sudo killall -2 soci-snapshotter-grpc).
3. Ensure allow_invalid_mounts_on_restart=true is set in /etc/soci-snapshotter-grpc/config.toml. (Without this, the snapshotter fails to start in step 4, which is also unexpected behavior.)
4. Start the snapshotter.
5. Attempt to pull an image other than the one pulled in step 1.

Describe the results you expected

Snapshotter starts with no errors and does not need the allow_invalid_mounts_on_restart flag to be true to start. I also expect to be able to pull additional images without any errors. Additionally, I expect /var/lib/soci-snapshotter-grpc/snapshotter to remain populated even when the daemon is not running.

Host information

OS: Amazon Linux 2
Snapshotter Version: v0.4.0
Containerd Version: v1.7.3

Any additional context or information about the bug

No response

sondavidb · 2023-11-03T21:33:47Z

Closed by #881

sondavidb added the bug label Sep 12, 2023

Kern-- added this to soci-snapshotter Sep 13, 2023

github-project-automation bot moved this to ❓ Ungroomed in soci-snapshotter Sep 13, 2023

Kern-- moved this from ❓ Ungroomed to 📋 Backlog in soci-snapshotter Sep 13, 2023

sondavidb mentioned this issue Sep 15, 2023

Add more comprehensive debugging docs #835

Merged

sondavidb mentioned this issue Oct 20, 2023

Keep directories when SIGINT sent to daemon #881

Merged

sondavidb closed this as completed Nov 3, 2023

github-project-automation bot moved this from 📋 Backlog to ✅ Done in soci-snapshotter Nov 3, 2023
