fix ksh login crash on disk full (rhbz#1212992)
Original patch:
https://src.fedoraproject.org/rpms/ksh/blob/642af4d6/f/ksh-20140801-diskfull.patch

Prior discussion:
https://www.mail-archive.com/[email protected]/msg01037.html
https://www.mail-archive.com/[email protected]/msg01038.html
https://www.mail-archive.com/[email protected]/msg01042.html
https://bugzilla.redhat.com/1212992

On Fri, 08 May 2015 14:37:45 -0700, Paulo Andrade wrote:
> I have a user with a ksh crashing problem, and that has
> some "Write error: No space left on device" messages
> in /var/log/messages.
>
> After some debugging, and creating a chroot on a file
> disk image, and a test user, and slowly filling the
> "on file" filesystem, e.g.
>
>   dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
>   dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2
>
> until leaving just around 12K, I managed to reproduce the
> problem, and be able to debug it with valgrind and vgdb;
> debugging on these conditions is tricky, as one cannot tell
> valgrind to spawn gdb, because then gdb itself would fail
> to start.
>
> So, after following the code enough, I learned that at places
> it handles SH_JMPEXIT, there was almost nonexistent
> handling of SH_JMPERREXIT.
>
> ksh would eventually cause a crash due to the struct
> subshell allocated on stack, in sh/subshell.c:sh_subshell
> kept set to the global subshell_data, after it siglongjmp
> back the stack due to not fully handling the out of disk
> space errors. It would print a few messages, every time
> a pipe was created, e.g.:
>
> /etc/profile: line 28: write to 3 failed [No space left on device]
>
> until eventually crashing due to corrupted memory; e.g. the
> references to stack data from sh_subshell in the global
> subshell_data. One strange thing to me in coredump analysis
> was that subshell_data prev field was pointing to itself when
> it eventually crashed, which later was understood and expected...
>
> The attached patch handles SH_JMPERREXIT in the code
> paths where SH_JMPEXIT is handled, and the failed login, on
> full disk, ends in a pause() call:
>
> ---terminal 1---
> $ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full
> --vgdb-error=0 /bin/ksh -l
> ==17730== (action at startup) vgdb me ...
> ==17730==
> ==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
> ==17730==   /path/to/gdb /bin/ksh
> ==17730== and then give GDB the following command
> ==17730==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
> ==17730== --pid is optional if only one valgrind process is running
> ==17730==
> ==17730== Syscall param mount(type) points to unaddressable byte(s)
> ==17730==    at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
> ==17730==    by 0x493E58: fs3d_mount (fs3d.c:115)
> ==17730==    by 0x493C8B: fs3d (fs3d.c:57)
> ==17730==    by 0x423E41: sh_init (init.c:1302)
> ==17730==    by 0x405CD3: sh_main (main.c:141)
> ==17730==    by 0x405B84: main (pmain.c:45)
> ==17730==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==17730==
> ==17730== (action on error) vgdb me ...
> ==17730== Continuing ...
> /etc/profile: line 28: write to 3 failed [No space left on device]
> ---8<---
>
> ---terminal 2---
> (gdb) c
> Continuing.
> ^C
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
> #1 0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
> #2 0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
> #3 0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0,
> file=0x524804 <e_sysprofile> "/etc/profile")
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
> #4 0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0)
> at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
> #5 0x0000000000405b85 in main (argc=2, argv=0xfff000498) at
> /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
> (gdb)
> ---8<---
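
The failure mode described in the quoted message can be illustrated with a small standalone sketch. This is not the ksh code and not the patch itself: `struct frame`, `current_frame`, `active_frames`, `run_subshell` and the `JMP_*` values are hypothetical stand-ins for `struct subshell`, `subshell_data`, `sh_subshell` and the `SH_JMP*` codes. It only shows why a global that points at a stack-allocated struct must be unlinked on every jump code that unwinds past that struct, not just on the plain exit code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical jump codes; the numeric values are made up for this sketch. */
enum { JMP_NONE = 0, JMP_EXIT = 1, JMP_ERREXIT = 2 };

/* Stand-in for a per-subshell bookkeeping struct that lives on the stack. */
struct frame {
    struct frame *prev;          /* enclosing frame, or NULL at top level */
};

struct frame *current_frame;     /* plays the role of the global list head */
int active_frames;               /* number of frames still linked in */
static sigjmp_buf checkpoint;

/* Simulates a command that fails (e.g. a write error) and jumps out. */
static void failing_body(int code)
{
    siglongjmp(checkpoint, code);
}

/*
 * Links a stack-allocated frame into the global list, runs a body that
 * jumps out with `code`, and unlinks the frame again.
 * With handle_errexit == 0 only JMP_EXIT unlinks the frame, which mimics
 * the incomplete handling described in the message above.
 * Returns the jump code that was received.
 */
int run_subshell(int code, int handle_errexit)
{
    struct frame local;

    local.prev = current_frame;
    current_frame = &local;      /* global now points into this stack frame */
    active_frames++;

    switch (sigsetjmp(checkpoint, 0)) {
    case 0:
        failing_body(code);      /* does not return */
        break;
    case JMP_EXIT:
        current_frame = local.prev;
        active_frames--;
        return JMP_EXIT;
    case JMP_ERREXIT:
        if (handle_errexit) {    /* the step that was missing */
            current_frame = local.prev;
            active_frames--;
        }
        return JMP_ERREXIT;
    }
    return JMP_NONE;
}
```

In this sketch, calling `run_subshell(JMP_ERREXIT, 0)` returns with `active_frames` still at 1 and `current_frame` pointing at a stack slot that no longer exists, which is the kind of stale reference that later corrupts memory. Calling it with `handle_errexit` set to 1 restores the list head before returning.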