Broken snapshotter state on SIGINT #1440
This is for avoiding leaving an empty snapshot directory that looks valid even though its empty contents are not the correct snapshot contents. You can use non-SIGINT signals to avoid this behaviour.
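For illustration, a minimal sketch of signal handling along these lines, assuming (this wiring is an assumption, not taken from the actual containerd-stargz-grpc code) that SIGINT selects the cleanup path while SIGTERM shuts down leaving state on disk:

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)

	// Block until the daemon receives a termination signal.
	sig := <-sigCh

	// Hypothetical policy: only SIGINT triggers the cleanup path that
	// wipes the snapshot directories; SIGTERM leaves snapshots and
	// metadata on disk.
	cleanup := sig == syscall.SIGINT
	log.Printf("received %s, cleanup=%v", sig, cleanup)
	// ... shut down, passing cleanup to the snapshotter's Close path ...
}
```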
Thanks for the suggestion. So the user still needs some manual operation to keep consistency between the containerd and snapshotter metadata DBs? It would be great if we had docs on how to fix it manually.
Thanks for the reply!
I understand; I'm mostly trying to work out whether this is intentional behavior, or something we would want to change.
Correct. Assuming we want to keep the current behavior of removing the snapshots on cleanup, if we update the metadata when the daemon is killed with SIGINT, the daemon can start without changing `allow_invalid_mounts_on_restart`.

For an explanation: this behavior is caused by the cleanup functions wiping the snapshots in the stargz content store, with that change never being reflected in the snapshotter's metadata DB.

Based on your feedback, I'd be happy to open a PR with some changes. Currently I am looking at three options:
Please let me know which one would be most preferred. Apologies for the long comment. Thanks again!
Hi, just wanted to bump this issue again. From what I've gathered, we are suggesting that people use SIGTERM to kill the daemon to avoid this problematic behavior. This keeps the uncompressed snapshot files on disk, as well as the mounts. I think it would be best to ship a fix where SIGINT unmounts the FUSE file systems but still keeps the snapshot files on disk (see the sketch below). This would imitate the existing behavior, with the added benefit of not having to remove the FUSE mounts manually. I've created a PR at #1526 to implement this, but feel free to close it or suggest modifications to this behavior. Thanks!
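A minimal sketch of that proposed shutdown behavior: lazily unmount each FUSE mountpoint but leave the extracted snapshot contents on disk. The function name and example path are assumptions for illustration, not the actual implementation in #1526.

```go
package main

import (
	"log"
	"syscall"
)

// unmountKeepFiles detaches each FUSE mountpoint without touching the
// snapshot directories, so a restarted daemon finds consistent state.
// Illustrative sketch only; Linux-specific.
func unmountKeepFiles(mountpoints []string) {
	for _, mp := range mountpoints {
		// MNT_DETACH performs a lazy unmount, the usual way to detach a
		// FUSE filesystem that may still have open handles.
		if err := syscall.Unmount(mp, syscall.MNT_DETACH); err != nil {
			log.Printf("failed to unmount %s: %v", mp, err)
		}
		// Deliberately no os.RemoveAll here: the snapshot files stay on disk.
	}
}

func main() {
	// Hypothetical mountpoint path, for illustration only.
	unmountKeepFiles([]string{"/var/lib/containerd-stargz-grpc/snapshotter/snapshots/1/fs"})
}
```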
When sending SIGINT to the daemon, it cannot be restarted into a working state without some extra effort. This is because SIGINT triggers the cleanup flag, which wipes the content store; however, the metadata is never synced to reflect this. This leaves me with two questions.
First, looking through previous issues, I found that #703 (comment) made note of the snapshotter removing its snapshot directories on cleanup. Could I ask for an explanation of this behavior? I would imagine that one would not want to re-pull images after restarting the daemon.
Second, could this broken state be fixed by updating the metadata before line 662 in `Close()` in snapshot.go? I tried to do it myself but couldn't figure out a way to get the snapshot keys. However, since cleanup wipes the content store clean regardless, as a proof of concept I called `os.Remove()` right before line 662, like so:
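The original snippet was lost here; the following is a hypothetical reconstruction of a proof of concept like the one described, assuming the snapshotter keeps its metadata in a `metadata.db` file under its root directory (the path, file name, and function name are all assumptions, not taken from the actual source):

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// removeStaleMetadataDB deletes the snapshotter's metadata DB so that it
// cannot reference snapshots that the cleanup path is about to wipe.
// Hypothetical reconstruction; "metadata.db" under the snapshotter root
// is an assumption.
func removeStaleMetadataDB(root string) error {
	if err := os.Remove(filepath.Join(root, "metadata.db")); err != nil && !os.IsNotExist(err) {
		return err
	}
	return nil
}

func main() {
	if err := removeStaleMetadataDB("/var/lib/containerd-stargz-grpc/snapshotter"); err != nil {
		log.Fatal(err)
	}
}
```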
This lets the daemon start in a mostly working state, but you must still manually remove and re-pull the image, as it is still in the containerd content store. This could probably be fixed by natively interacting with the DB via the metadata package. Hence, this is a pretty inelegant solution IMO.
EDIT: I found that `Walk` gets all the keys, so calling that and then calling `Remove` on each of them removes them from the metadata (see the sketch at the end of this comment). However, the image is still left in the containerd content store, so we still have the same issue of being unable to re-pull it unless you remove it manually and pull again.

I can open a PR with these changes if people want them, as this removes the need to set `allow_invalid_mounts_on_restart=true` just to start the daemon and remove the snapshot. It's still imperfect for the reason above, but it would be a nice QOL change IMO.
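As a sketch of that Walk-then-Remove approach, using containerd's `snapshots.Snapshotter` interface (the function name is made up for illustration, and error handling is simplified):

```go
package cleanup

import (
	"context"

	"github.com/containerd/containerd/snapshots"
)

// purgeSnapshotMetadata collects every snapshot key via Walk, then removes
// each entry so the metadata no longer references the wiped snapshots.
// Illustrative sketch: real code would also need to remove children before
// parents, since a snapshot that has children cannot be removed.
func purgeSnapshotMetadata(ctx context.Context, sn snapshots.Snapshotter) error {
	var keys []string
	// Collect keys first rather than removing inside the callback, to
	// avoid mutating the store while it is being walked.
	if err := sn.Walk(ctx, func(_ context.Context, info snapshots.Info) error {
		keys = append(keys, info.Name)
		return nil
	}); err != nil {
		return err
	}
	for _, key := range keys {
		if err := sn.Remove(ctx, key); err != nil {
			return err
		}
	}
	return nil
}
```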