AppImage orphan survives main process exit #92

Closed
mehmetakbulut opened this issue Nov 27, 2024 · 14 comments · Fixed by #93

@mehmetakbulut

I am seeing an orphaned process left behind under some conditions after the main process exits. The orphan seems to stay around in a sleep state until reboot. I originally noticed this when an AppImage was run by a process running as a systemd service, similar to #99, so I have put together the following example without systemd.

Running as root to demonstrate that it should not be a permissions issue.

Using the following as the entrypoint from the AppImage wrapper:

root@host:~# cat test.py
#!/usr/bin/env python3

print("Hello world!")

Running with strace to trace processes, we can see that the main process forks pid 30812 (which I believe is this fork call), which then forks again to create pid 30813. The main process eventually execs our "Hello world!" script, which exits properly. However, a detached process, pid 30813, is left behind.

root@host:~# strace -f -e waitpid,fork,execve bazel-bin/test.AppImage
execve("bazel-bin/test.AppImage", ["bazel-bin/test.AppImage"], 0x7ffe197ee268 /* 21 vars */) = 0
fork()                                  = 30812
strace: Process 30812 attached
[pid 30812] fork()                      = 30813
strace: Process 30813 attached
[pid 30812] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30812, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30815 attached
[pid 30811] execve("/tmp/.mount_test.AALCfdi/AppRun", ["bazel-bin/test.AppImage"], 0x7ff6783cc440 /* 25 vars */) = 0
strace: Process 30816 attached
[pid 30816] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/AppRun"], 0x558cc611ea28 /* 26 vars */) = 0
[pid 30816] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30816, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30817 attached
[pid 30817] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x558cc611ea98 /* 26 vars */) = 0
[pid 30817] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30817, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 30811] execve("./test", ["./test"], 0x558cc5034bc8 /* 28 vars */) = 0
[pid 30811] execve("/usr/local/sbin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/local/bin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/sbin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30811] execve("/usr/bin/python3", ["python3", "./test"], 0x7ffd39964728 /* 28 vars */) = 0
strace: Process 30818 attached
[pid 30818] execve("/usr/local/sbin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/local/bin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/sbin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 30818] execve("/usr/bin/uname", ["uname", "-p"], 0x7ffee55c89b0 /* 28 vars */) = 0
[pid 30818] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30818, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 30811] execve("/tmp/.mount_test.AALCfdi/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/python_wrapper.sh", ["/tmp/.mount_test.AALCfdi/test.ru"..., "/tmp/.mount_test.AALCfdi/test.ru"...], 0x1afd4c0 /* 30 vars */) = 0
strace: Process 30819 attached
[pid 30819] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x556d655cd510 /* 30 vars */) = 0
[pid 30819] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30819, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 30820 attached
[pid 30820] execve("/tmp/.mount_test.AALCfdi/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/bin/python3", ["/tmp/.mount_test.AALCfdi/test.ru"..., "-s", "/tmp/.mount_test.AALCfdi/test.ru"...], 0x556d655cd510 /* 30 vars */) = 0
Hello world!
[pid 30820] +++ exited with 0 +++
[pid 30811] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30820, si_uid=0, si_status=0, si_utime=2, si_stime=1} ---
[pid 30811] +++ exited with 0 +++
[pid 30815] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] +++ exited with 0 +++


^Cstrace: Process 30813 detached
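
The fork sequence in the trace (a child that forks a grandchild and exits straight away) looks like the usual daemonization pattern. The following is a minimal sketch of that pattern only, not the runtime's actual code; `double_fork_demo` and the two pipes are hypothetical names introduced for illustration. It shows that the grandchild is reparented once the intermediate child is gone, which is why it can outlive the process that started it.

```python
import os


def double_fork_demo():
    """Fork a child that forks a grandchild and exits at once.

    Returns (child_pid, grandchild_pid, grandchild_parent_after_exit).
    Illustrative only; this is not the AppImage runtime's code.
    """
    report_r, report_w = os.pipe()  # grandchild -> original process
    go_r, go_w = os.pipe()          # original process -> grandchild

    child = os.fork()
    if child == 0:
        # Intermediate child: create the grandchild, then exit immediately.
        if os.fork() == 0:
            os.close(report_r)
            os.close(go_w)
            # Block until the original process has reaped the intermediate child.
            os.read(go_r, 1)
            os.write(report_w, f"{os.getpid()} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)

    os.close(report_w)
    os.close(go_r)
    os.waitpid(child, 0)   # the intermediate child has fully exited here
    os.write(go_w, b"x")   # let the grandchild report its current parent
    os.close(go_w)
    data = os.read(report_r, 64).decode()
    os.close(report_r)
    grandchild, new_parent = (int(part) for part in data.split())
    return child, grandchild, new_parent


if __name__ == "__main__":
    child, grandchild, new_parent = double_fork_demo()
    print(f"child={child} grandchild={grandchild} reparented_to={new_parent}")
```

The reported parent is typically pid 1 or a subreaper such as a systemd user instance, depending on the environment.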

Pid 30813 appears to be in a sleep state while reading the FUSE device.

root@host:~# ps aux | grep 30813
root       30813  0.2  0.0   2632  1108 ?        Ss   11:25   0:00 bazel-bin/test.AppImage

root@host:~# cat /proc/30813/stack
[<0>] fuse_dev_do_read.isra.0+0x7f0/0x840
[<0>] fuse_dev_read+0x5f/0x90
[<0>] new_sync_read+0x110/0x1a0
[<0>] vfs_read+0xfe/0x190
[<0>] ksys_read+0x67/0xe0
[<0>] __x64_sys_read+0x1a/0x20
[<0>] do_syscall_64+0x38/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

We can further confirm this. It is currently trying to read fd 5 which is /dev/fuse.

root@host:~# strace -p 30813
strace: Process 30813 attached
read(5, ^Cstrace: Process 30813 detached
 <detached ...>

root@host:~# ls -lah /proc/30813/fd/5
lrwx------ 1 root root 64 Nov 27 11:25 /proc/30813/fd/5 -> /dev/fuse
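
The same check can be scripted. This is a rough sketch that resolves the symlinks under `/proc/<pid>/fd` and reports which descriptors point at a given target; the helper name `fds_pointing_at` and its `proc_root` parameter are hypothetical and exist only to keep the example testable.

```python
import os


def fds_pointing_at(pid, target="/dev/fuse", proc_root="/proc"):
    """Return the sorted fd numbers of `pid` whose symlink resolves to `target`."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    matches = []
    for name in os.listdir(fd_dir):
        try:
            if os.readlink(os.path.join(fd_dir, name)) == target:
                matches.append(int(name))
        except OSError:
            # The fd may have been closed between listdir() and readlink().
            continue
    return sorted(matches)
```

For the orphan above, `fds_pointing_at(30813)` would be expected to include fd 5, matching the `ls -lah` output. Reading another process's fd directory generally requires the same user or root.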

However if we run in an interactive environment (like most people would be doing by launching an application on their desktop or shell) using bash -i, then everything works properly with no detached processes left behind.

root@host:~# strace -f -e waitpid,fork,execve bash -i -c 'bazel-bin/test.AppImage'
execve("/usr/bin/bash", ["bash", "-i", "-c", "bazel-bin/test.AppImage"], 0x7ffe09a3c2d0 /* 21 vars */) = 0
strace: Process 31928 attached
[pid 31928] execve("/usr/bin/groups", ["groups"], 0x55e9349308c0 /* 21 vars */) = 0
[pid 31928] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31928, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31929 attached
[pid 31929] execve("/usr/bin/lesspipe", ["lesspipe"], 0x55e934931170 /* 21 vars */) = 0
strace: Process 31930 attached
[pid 31930] execve("/usr/bin/basename", ["basename", "/usr/bin/lesspipe"], 0x56084be5a8b8 /* 21 vars */) = 0
[pid 31930] +++ exited with 0 +++
[pid 31929] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31930, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31931 attached
strace: Process 31932 attached
[pid 31932] execve("/usr/bin/dirname", ["dirname", "/usr/bin/lesspipe"], 0x56084be65b28 /* 21 vars */) = 0
[pid 31932] +++ exited with 0 +++
[pid 31931] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31932, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31931] +++ exited with 0 +++
[pid 31929] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31931, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31929] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31929, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31933 attached
[pid 31933] execve("/usr/bin/dircolors", ["dircolors", "-b"], 0x55e9349337c0 /* 21 vars */) = 0
[pid 31933] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31933, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
execve("bazel-bin/test.AppImage", ["bazel-bin/test.AppImage"], 0x55e9349337c0 /* 21 vars */) = 0
fork()                                  = 31934
strace: Process 31934 attached
[pid 31934] fork(strace: Process 31936 attached
)                      = 31936
[pid 31934] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31934, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31937 attached
[pid 31927] execve("/tmp/.mount_test.AcAadlB/AppRun", ["bazel-bin/test.AppImage"], 0x7ffb20309440 /* 25 vars */) = 0
strace: Process 31938 attached
[pid 31938] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/AppRun"], 0x56524fd17a28 /* 26 vars */) = 0
[pid 31938] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31938, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31939 attached
[pid 31939] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x56524fd17a98 /* 26 vars */) = 0
[pid 31939] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31939, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31927] execve("./test", ["./test"], 0x56524f1e7bc8 /* 28 vars */) = 0
[pid 31927] execve("/usr/local/sbin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/local/bin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/sbin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31927] execve("/usr/bin/python3", ["python3", "./test"], 0x7ffd6d5bf0a8 /* 28 vars */) = 0
strace: Process 31940 attached
[pid 31940] execve("/usr/local/sbin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/local/bin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/sbin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = -1 ENOENT (No such file or directory)
[pid 31940] execve("/usr/bin/uname", ["uname", "-p"], 0x7ffdc13b4570 /* 28 vars */) = 0
[pid 31940] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31940, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 31927] execve("/tmp/.mount_test.AcAadlB/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/python_wrapper.sh", ["/tmp/.mount_test.AcAadlB/test.ru"..., "/tmp/.mount_test.AcAadlB/test.ru"...], 0x1839a90 /* 30 vars */) = 0
strace: Process 31941 attached
[pid 31941] execve("/usr/bin/dirname", ["dirname", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x561ca7958590 /* 30 vars */) = 0
[pid 31941] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31941, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
strace: Process 31942 attached
[pid 31942] execve("/tmp/.mount_test.AcAadlB/test.runfiles/python_3_9_x86_64-unknown-linux-gnu/bin/python3", ["/tmp/.mount_test.AcAadlB/test.ru"..., "-s", "/tmp/.mount_test.AcAadlB/test.ru"...], 0x561ca7958590 /* 30 vars */) = 0
Hello world!
[pid 31942] +++ exited with 0 +++
[pid 31927] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31942, si_uid=0, si_status=0, si_utime=3, si_stime=1} ---
[pid 31927] +++ exited with 0 +++
[pid 31937] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31936] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31937] +++ exited with 0 +++
+++ exited with 0 +++

In this case, the detached process would have been pid 31936 (the second forked process after the AppImage is exec'd); however, it sends itself a SIGTERM after the main process exits.

I have walked through appimage runtime, squashfuse and libfuse sources without much progress. I think I am missing some insight. Maybe this is intended behavior or some user error on my end.

@probonopd probonopd transferred this issue from AppImage/AppImageKit Nov 28, 2024
@probonopd
Member

probonopd commented Nov 28, 2024

Thanks @mehmetakbulut. Most likely nobody has so far thought about running an AppImage in a non-interactive environment. But your use case (running an AppImage using systemd) is a valid one, and maybe there are others.

@TheAssassin I believe this needs some debugger wizardry.

@lalten

lalten commented Nov 28, 2024

We have this problem a lot. It mostly becomes a problem when an AppImage is restarted over and over by a failing systemd unit: at some point you run into the mount_max limit of FUSE (default is 1000).
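
To watch leaked mounts accumulate towards that limit, one option is to count them in `/proc/mounts`. The sketch below assumes that AppImage mounts report a filesystem type starting with `fuse` and use the `.mount_` mount point prefix seen in the traces above; both are assumptions to verify on the target system, and `count_appimage_mounts` is a hypothetical helper.

```python
def count_appimage_mounts(mounts_text, prefix=".mount_"):
    """Count FUSE entries whose mount point basename starts with `prefix`.

    `mounts_text` is the content of /proc/mounts; the fields used are
    device, mount point and filesystem type, in that order.
    """
    count = 0
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        basename = mountpoint.rsplit("/", 1)[-1]
        # Assumption: AppImage mounts show up with a fuse* type.
        if fstype.startswith("fuse") and basename.startswith(prefix):
            count += 1
    return count


if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        print(count_appimage_mounts(fh.read()))
```

A steadily growing count across restarts of the unit would suggest that mounts are leaking.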

One fix is to enable auto-unmount in squashfuse, like in https://github.com/lalten/appimage-runtime-rs/blob/03911dd353085bfd15e57e5a0cd1211f633ed656/appimage-mount/src/mount.rs#L24-L25
appimage-runtime-rs isn't quite ready for primetime though.

Squashfuse's -o timeout=N means "idle N seconds for automatic unmount"
Fuse's auto_unmount "enables automatic release of the mountpoint if filesystem terminates for any reason"

@lalten

lalten commented Nov 28, 2024

The reason this happens is basically #90

(which appimage-runtime-rs also does differently, in https://github.com/lalten/appimage-runtime-rs/blob/03911dd353085bfd15e57e5a0cd1211f633ed656/src/bin/runtime.rs#L117)

probonopd added a commit that referenced this issue Nov 28, 2024
@probonopd probonopd mentioned this issue Nov 28, 2024
@TheAssassin
Member

@lalten we're looking into rewriting the runtime in Rust anyway, but neither of us is fluent enough (yet, working on that). In the meantime, do you think you could contribute a similar fix to this codebase?

@mehmetakbulut
Author

Thank you guys for the quick response!

I can patch the runtime with timeout flags on my end as a workaround. I think that should allow the mount and the forked process to remain as long as the main process exists? (since it is loaded from the mount?)

@lalten Do you have insight into why the behavior is different with bash -i?

# without bash -i
[pid 30815] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=30813, si_uid=0} ---
[pid 30815] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30813, si_uid=0} ---    # detached process sends sigterm to some other child process
[pid 30815] +++ exited with 0 +++
^Cstrace: Process 30813 detached

vs

# with bash -i
[pid 31937] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=31936, si_uid=0} ---
[pid 31936] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=31936, si_uid=0} ---   # detached process sends sigterm to itself
[pid 31937] +++ exited with 0 +++
+++ exited with 0 +++

I thought maybe it has to do with the foreground flag but I don't quite follow how it would do this in the source.
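
To make the comparison less error-prone, the signal lines can be parsed mechanically. In strace output, the `[pid N]` prefix is the process receiving the signal and `si_pid` is the sender. The regex and the `parse_signal_line` helper below are a hypothetical sketch that only handles pid-prefixed lines like the ones quoted above.

```python
import re

# Matches lines such as:
# [pid 30815] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30813, si_uid=0} ---
SIGNAL_LINE = re.compile(
    r"^\[pid\s+(?P<receiver>\d+)\] --- (?P<signal>SIG[A-Z0-9]+) \{.*?\bsi_pid=(?P<sender>\d+)"
)


def parse_signal_line(line):
    """Return (signal, sender_pid, receiver_pid), or None if the line does not match."""
    match = SIGNAL_LINE.match(line.strip())
    if match is None:
        return None
    return match["signal"], int(match["sender"]), int(match["receiver"])


if __name__ == "__main__":
    lines = [
        "[pid 30815] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30813, si_uid=0} ---",
        "[pid 31936] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=31936, si_uid=0} ---",
    ]
    for line in lines:
        signal, sender, receiver = parse_signal_line(line)
        kind = "self" if sender == receiver else "other process"
        print(f"{signal}: {sender} -> {receiver} ({kind})")
```

Applied to the two traces, this shows 30813 sending SIGTERM to 30815 in the non-interactive run, and 31936 sending SIGTERM to itself under bash -i.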

@probonopd
Member

Looks like my PR is solving this now. Thanks everyone!

@probonopd
Member

@mehmetakbulut can you confirm that it works now using

Thanks for testing!

@mehmetakbulut
Author

Thanks @probonopd, works well on my end!

TheAssassin pushed a commit that referenced this issue Dec 3, 2024
@mehmetakbulut
Author

I think once AppImage/AppImageKit#93 is merged, we can close this issue. I suspect it should resolve #99 as well.

Unrelated to this issue but just for my understanding: is there a suggested way to consume type2-runtime in bazel?

We have been using @lalten's fork since it had a more recent tagged/release build but even that is ~2 years behind. It has some additional commits so I am wondering if those are important to keep in a bazel workflow with rules_appimage? The CI builds from AppImage/AppImageKit#93 worked fine to build and run an appimage via bazel. We haven't been using continuous since its hash will keep changing.

@lalten

lalten commented Dec 3, 2024

The additional commits in that repo don't really matter, it was me trying to automatically generate a release for every single commit 🙈
What does matter for consuming the runtime in Bazel is that the release artifact has a stable hash, which this upstream repo explicitly does not want to do. That's the only reason why rules_appimage has to use the fork.
I could update the fork, but in the last two years there were no meaningful changes that would change the behavior of rules_appimage-generated artifacts.
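
For anyone mirroring the runtime themselves, the stable-hash requirement comes down to checking a downloaded artifact against a pinned SHA-256 digest. This is a generic sketch using only the standard library, not how rules_appimage or Bazel implements it; `sha256_of` and `matches_pin` are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, pinned_sha256):
    """True if the file's digest equals the pinned digest (case-insensitive)."""
    return sha256_of(path) == pinned_sha256.lower()
```

A release whose binary is rebuilt under the same tag fails this check on the next download, which is why a pinned reference needs an immutable artifact.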

@lalten

lalten commented Dec 3, 2024

ah, I take it back, there are indeed some useful commits related to fuse3 handling: main...lalten:type2-runtime:main
If that finally works out of the box in upstream type2-runtime I can update the fork

@lalten

lalten commented Dec 3, 2024

I thought it was this repo, but regarding the unstable release hashes I'm referring to probonopd/static-tools#41

@mehmetakbulut
Author

Thanks lalten! Yeah, there are no manual or automatic tagged/releases in this repo that allow a stable reference from outside. If you are able to update your repo after AppImage/AppImageKit#93 is merged, I think that would be very helpful. 😃 Regardless, I can build/mirror on my end so it is not a major issue.

@probonopd
Member

Unrelated to this issue but just for my understanding: is there a suggested way to consume type2-runtime in bazel?

I had to research what Bazel is, but ideally it should stay away from running mksquashfs and concatenating the result to the type2-runtime itself, and instead let appimagetool do that job. This way, AppImageUpdate and signing work, and no nonstandard compression can inadvertently be used. If that is not feasible for some reason, then at least https://github.com/AppImage/type2-runtime/releases/tag/continuous should be used and a squashfs image using zstandard compression should be appended to it.
