Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep directories when SIGINT sent to daemon #1526

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sondavidb
Copy link

@sondavidb sondavidb commented Jan 13, 2024

Fixes #1440

This change was done due to the general recommendation to use SIGTERM instead of SIGINT. This change will make SIGINT effectively behave the same as SIGTERM, but will properly unmount the FUSE mounts instead of leaving them hanging. This means that we can now kill the daemon with SIGINT and start it up again without any issues or workarounds.. However, this may not be desired behavior, so feel free to close the PR if this is undesirable.

This is identical to #881 in awslabs/soci-snapshotter.

I tested this with the following workflow:

sudo out/containerd-stargz-grpc &> a.log &
sudo nerdctl pull --snapshotter stargz ghcr.io/stargz-containers/drupal:9.3.9-esgz

I then checked a.log and confirmed that there were logs showing the image was being pulled with stargz. I also checked mount | grep fuse and saw the FUSE mounts present.

sudo killall -SIGINT containerd-stargz-grpc

I checked the logs again to see it was killed, and checked that mount | grep fuse had no FUSE mounts present.

sudo out/containerd-stargz-grpc &> a.log &

I then checked a.log and confirmed that all snapshots were remounted, and that mount | grep fuse was repopulated.

pendo324 added a commit to runfinch/finch that referenced this pull request Jan 29, 2024
…#767)

Issue #, if available: re-verified #412
- Through extensive e2e test debugging, I noticed that soci and stargz
snapshotters weren't persisting data as expected. After debugging, I
found some context in these two PRs:
  - awslabs/soci-snapshotter#881
  - containerd/stargz-snapshotter#1526
Unfortunately, neither of them are deployed yet, so I've implemented a
hacky workaround for now. After this change, an image/container can be
pull/run, the VM can be restarted, and then the container can be
re-started again.

*Description of changes:*
- Redo how BuildKit/Stargz/SOCI are related to containerd using
[systemd's `PartOf`

](https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#PartOf=)
- this ensures that all of these services are restarted when containerd
is restarted, which the lack of has caused errors in the past
- Create some missing directories that might throw errors in cloud-init
- Ensure that `SIGTERM` is used to kill the snapshotter services for now

*Testing done:*
- manual testing



- [x] I've reviewed the guidance in CONTRIBUTING.md


#### License Acceptance

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Signed-off-by: Justin Alvarez <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken snapshotter state on SIGINT
1 participant