CI failures with no output/stack trace #19903
Comments
Last round of investigation happened in #18998
Two things I found out, which don’t give a lot of insight but might narrow things down a bit:
Does
@bnoordhuis Unfortunately, no. :/ There are only a handful of lines reported after boot on the debian-x86 machine, a few from invalid incoming external UDP packets, and 7 segfault notices for
@bnoordhuis Okay, this turns out to be a pretty deep rabbit hole. The reported exit code according to #19855 is Lines 582 to 583 in 5e68172
Also, I found out there’s a magical place where all the core dumps go on debian machines with This is the stack trace I’m getting from the most recent core dump that I could produce:
(main thread stack)
(Also, in The It seems this was introduced by libuv/libuv@647fbc0 – we picked this up in #18260 on Jan 24 on

I don’t see anything wrong with that commit. However, more debugging and digging through the glibc source code and its git history brings up https://sourceware.org/bugzilla/show_bug.cgi?id=12674, which seems to be exactly the bug that we’re experiencing. (Here’s a pure-C reproduction that is similar to the libuv code and is fixed by bminor/glibc@042e152.)

The fix is included in glibc >= 2.21, but I’m not sure what that means for us going forward.
Nice work! For the record, the cores collected by systemd are only there because I had
Hack around https://sourceware.org/bugzilla/show_bug.cgi?id=12674 by providing a custom semaphore implementation for glibc < 2.21 in terms of other concurrency primitives.

The glibc implementation on these versions is inherently unsafe. So, while libuv and Node.js support those versions, it seems to make sense for libuv, in its role as a platform abstraction library, to provide a working version.

Fixes: nodejs/node#19903
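For readers unfamiliar with the approach, here is a minimal sketch of a counting semaphore built from a mutex and a condition variable, which is one plausible reading of "in terms of other concurrency primitives". All `sketch_sem_*` names are hypothetical and this is not libuv's actual code; the error handling (abort on lock failure) is likewise an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical type: a counting semaphore that avoids sem_post()/sem_wait(). */
typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  unsigned int value;
} sketch_sem_t;

/* Returns 0 on success, or a negated pthread error code. */
int sketch_sem_init(sketch_sem_t* sem, unsigned int value) {
  int err;
  assert(sem != NULL);
  err = pthread_mutex_init(&sem->mutex, NULL);
  if (err != 0) return -err;
  err = pthread_cond_init(&sem->cond, NULL);
  if (err != 0) {
    pthread_mutex_destroy(&sem->mutex);
    return -err;
  }
  sem->value = value;
  return 0;
}

void sketch_sem_destroy(sketch_sem_t* sem) {
  pthread_cond_destroy(&sem->cond);
  pthread_mutex_destroy(&sem->mutex);
}

/* Increment the counter and wake one waiter, all under the mutex. */
void sketch_sem_post(sketch_sem_t* sem) {
  if (pthread_mutex_lock(&sem->mutex) != 0) abort();
  sem->value++;
  pthread_cond_signal(&sem->cond);
  if (pthread_mutex_unlock(&sem->mutex) != 0) abort();
}

/* Block until the counter is non-zero, then decrement it. */
void sketch_sem_wait(sketch_sem_t* sem) {
  if (pthread_mutex_lock(&sem->mutex) != 0) abort();
  while (sem->value == 0) {
    /* Loop to tolerate spurious wakeups. */
    if (pthread_cond_wait(&sem->cond, &sem->mutex) != 0) abort();
  }
  sem->value--;
  if (pthread_mutex_unlock(&sem->mutex) != 0) abort();
}

/* Non-blocking variant: returns 0 if decremented, -1 if the counter was zero. */
int sketch_sem_trywait(sketch_sem_t* sem) {
  int rc = -1;
  if (pthread_mutex_lock(&sem->mutex) != 0) abort();
  if (sem->value > 0) {
    sem->value--;
    rc = 0;
  }
  if (pthread_mutex_unlock(&sem->mutex) != 0) abort();
  return rc;
}
```

Because every read and write of the counter happens while the mutex is held, a poster cannot touch semaphore state that a waiter has already torn down, which is the kind of hazard the glibc bug report describes. The trade-off is an extra lock on every operation.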
We could work around that by writing a custom script that saves the binary somewhere and uses it in
Otherwise, potentially no output is shown for aborts.

Refs: nodejs#19903
@addaleax - I was playing around with your C reproduction, but it always succeeded - is there any precondition for it to fail? My system's glibc is 2.11, and it has an ample number of CPUs.
@gireeshpunathil The glibc implementation is hardware-dependent, so I guess that might be an issue? Also, weirdly enough, this problem only seemed to reproduce when the machine was not under load…
@addaleax - thanks - that is a great clue. Less load means the 2 threads stand a great chance of running on 2 real CPUs in a truly parallel manner, increasing the chance of cluttered access. On the other hand, if the 2 sem_* threads share a CPU, the scheduler will be more
Otherwise, potentially no output is shown for aborts.

PR-URL: nodejs#19990
Refs: nodejs#19903
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
Re-opening until the libuv update lands on
@bnoordhuis @santigimeno @addaleax do you think this is a big enough issue that it's worth getting a libuv release out today?
@cjihrig I think we can guess the impact of this from looking at the frequency on the CI machine: On Node.js v8.x and v9.x, when used on an OS that ships those old glibc versions, every 3000th Node.js process or so crashes once it tries to do something interesting and the machine isn’t under load. It’s not the end of the world, but I think it would be good if we could get a fix out in Node.js this month or so, in particular because this has also ended up in an LTS branch.
PR-URL: #20129
Fixes: #20112
Fixes: #19903
Reviewed-By: Myles Borins <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Backport-PR-URL: #24103
PR-URL: #20129
Fixes: #20112
Fixes: #19903
Reviewed-By: Myles Borins <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
We've been seeing these for quite some time. They look like this:
No contents for stdout or stderr but the test failed. Huh? Wha?
Might be a problem in tools/test.py.
@BridgeAR and @addaleax have looked to varying degrees at this issue but haven't discovered anything as far as I know.
@refack and I have a PR open to add the exit code to the output when these things happen, in case that helps. #19855
@nodejs/testing @nodejs/build
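To illustrate why reporting the exit status helps with failures that print nothing, here is a small sketch in C of how a parent process can tell a normal exit from death by signal. It is not how tools/test.py does it (that runner is written in Python); `child_outcome`, `run_child`, and `format_outcome` are hypothetical names, and the child here is a stand-in that either raises a signal or exits with a given code.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical summary of how a child process ended. */
typedef struct {
  int exited;    /* non-zero if the child called exit()/_exit() */
  int code;      /* exit code, valid when exited is set */
  int signaled;  /* non-zero if the child was killed by a signal */
  int signo;     /* signal number, valid when signaled is set */
} child_outcome;

/* Fork a stand-in child that raises signo (if non-zero) or exits with code. */
child_outcome run_child(int signo, int code) {
  child_outcome out = {0, 0, 0, 0};
  int status = 0;
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    if (signo != 0) raise(signo);
    _exit(code);  /* _exit avoids flushing the parent's stdio buffers twice */
  }
  if (waitpid(pid, &status, 0) != pid) return out;
  if (WIFEXITED(status)) {
    out.exited = 1;
    out.code = WEXITSTATUS(status);
  } else if (WIFSIGNALED(status)) {
    out.signaled = 1;
    out.signo = WTERMSIG(status);
  }
  return out;
}

/* Write a one-line description into buf; returns snprintf's result. */
int format_outcome(const child_outcome* o, char* buf, size_t len) {
  if (o->signaled)
    return snprintf(buf, len, "killed by signal %d", o->signo);
  if (o->exited)
    return snprintf(buf, len, "exited with code %d", o->code);
  return snprintf(buf, len, "unknown termination");
}
```

A process killed by a signal such as SIGSEGV or SIGABRT may write nothing to stdout or stderr, so the wait status can be the only evidence a test runner has to report.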