Rare segmentation fault - node v10.13 - CentOS (ip 0000000000efb532, node[400000+1e8c000]) #24955
Comments
That's part of the GC. If the PID isn't that of the process, it means it's happening on a thread that isn't the main thread (because the main thread has PID == TID). Unfortunately GC crashes are hard to debug because nine times out of ten the real bug is elsewhere; e.g., memory corruption that doesn't manifest until the GC runs. But let's try anyway. Does
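To illustrate the PID/TID point, here is a minimal Python sketch (not Node.js code; the helper name `thread_ids` is made up for this example). It assumes Python 3.8+ for `threading.get_native_id()` and that it is called from the main thread. On Linux the main thread's native ID equals the process ID, while a worker thread gets a different ID, which is what the kernel would print for a fault on that thread.

```python
import os
import threading


def thread_ids():
    """Return (pid, main_thread_tid, worker_thread_tid) for this process.

    Hypothetical helper for illustration; call it from the main thread.
    """
    main_tid = threading.get_native_id()
    seen = {}

    def record():
        # Runs on a separate OS thread, so it reports that thread's own ID.
        seen["tid"] = threading.get_native_id()

    worker = threading.Thread(target=record)
    worker.start()
    worker.join()
    return os.getpid(), main_tid, seen["tid"]


if __name__ == "__main__":
    pid, main_tid, worker_tid = thread_ids()
    print(f"pid={pid} main tid={main_tid} worker tid={worker_tid}")
```

A worker thread ID that sits a little above the PID is consistent with a thread created shortly after process start, which would match the "close to it" observation in the report.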
Thanks for the quick response! I'll try running the system with the '--predictable' flag for a while. Regarding the native modules:
Thanks,
Right, that's quite a few native modules, any one of which might be the culprit. ref and ffi are the most likely, but it could be any one of them. [1] Try excluding them and see if the crashes go away.
[1] I'm reasonably sure it's not heapdump (I'm its author) because it doesn't do anything unless activated, but still.
I'm going to close this out for lack of follow-up. Let me know if you still want to pursue this and I'll reopen.
Thanks, it's still relevant, but we'll try other Node.js versions (10.14 / 10.15) and reopen if we manage to reproduce it more frequently. Currently it's really hard to narrow it down.
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
--------------- Bug description -------------
After upgrading from node v10.7.0 to node v10.13.0, we started seeing the node process crash with segmentation faults.
The crashes happen once every one to few days, so they are really hard to reproduce.
The dmesg logs consistently show the same instruction pointer (ip 0000000000efb532):
[Mon Dec 3 21:10:54 2018] node[25407]: segfault at 3ff0c07598f0 ip 0000000000efb532 sp 00007f1a1a830c40 error 4 in node[400000+1e8c000]
[Tue Dec 4 13:28:47 2018] node[11340]: segfault at 3aea57a0a9c0 ip 0000000000efb532 sp 00007f0ff238dc40 error 4 in node[400000+1e8c000]
[Wed Dec 5 16:13:54 2018] node[13359]: segfault at 329fc81dd2f0 ip 0000000000efb532 sp 00007fc21effcc40 error 4 in node[400000+1e8c000]
[Fri Dec 7 19:36:45 2018] node[29239]: segfault at 13604b37a558 ip 0000000000efb532 sp 00007f57ebffec40 error 4 in node[400000+1e8c000]
[Sat Dec 8 18:53:54 2018] node[30821]: segfault at 204d978e2e50 ip 0000000000efb532 sp 00007f212de0cc40 error 4 in node[400000+1e8c000]
[Sun Dec 9 18:05:08 2018] node[10990]: segfault at 9888ab3a790 ip 0000000000efb532 sp 00007f26261c2c40 error 4 in node[400000+1e8c000]
[Mon Dec 10 20:21:07 2018] node[14981]: segfault at 3cbdf8b02340 ip 0000000000efb532 sp 00007f38a894fc40 error 4 in node[400000+1e8c000]
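For anyone triaging similar logs, the following is a rough Python sketch that parses these segfault lines and decodes the x86 page-fault error code. The function name `parse_segfault` and the field names are made up for this example. It assumes the x86 line format shown above and that the kernel prints the error code in hex (bit 0: protection violation rather than a not-present page, bit 1: write access, bit 2: user mode).

```python
import re

# Matches the x86 kernel segfault line format seen in the dmesg output above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<tid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<image>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Parse one dmesg segfault line into a dict, or return None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "tid": int(m["tid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        "offset_in_mapping": ip - base,       # ip relative to the mapping base
        "protection_fault": bool(err & 0x1),  # False: page was not present
        "write": bool(err & 0x2),             # False: the access was a read
        "user_mode": bool(err & 0x4),         # True: fault raised in user mode
    }


LINES = [
    "[Mon Dec 3 21:10:54 2018] node[25407]: segfault at 3ff0c07598f0 "
    "ip 0000000000efb532 sp 00007f1a1a830c40 error 4 in node[400000+1e8c000]",
    "[Tue Dec 4 13:28:47 2018] node[11340]: segfault at 3aea57a0a9c0 "
    "ip 0000000000efb532 sp 00007f0ff238dc40 error 4 in node[400000+1e8c000]",
]

if __name__ == "__main__":
    for line in LINES:
        info = parse_segfault(line)
        print(hex(info["ip"]), hex(info["fault_addr"]), info["write"])
```

Read this way, "error 4" in every line is a user-mode read of an unmapped address, and the faulting address differs each time while the instruction pointer stays the same.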
The failing PID is not the exact PID of the node process, but usually close to it.
Looking at the symbols around ip 0000000000efb532, the closest preceding symbol is the Process() function of the RememberedSetUpdatingItem class:
0000000000efaf50 W v8::internal::RememberedSetUpdatingItem<v8::internal::MajorNonAtomicMarkingState>::Process()
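The lookup itself can be sketched in Python as a search for the closest symbol that starts at or before the faulting address. Only the `efaf50` entry comes from the report; the two neighbouring entries are hypothetical placeholders so the search has something to skip over, and `nearest_symbol` is a made-up helper name. Without symbol sizes this only identifies the nearest preceding symbol, which is a reasonable guess rather than proof that the address lies inside that function.

```python
import bisect


def nearest_symbol(symbols, addr):
    """Return (name, offset) of the closest symbol starting at or before addr.

    `symbols` is a list of (start_address, name) tuples sorted by address.
    Returns None if addr is below the first symbol.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


SYMBOLS = [
    (0xEF0000, "hypothetical_previous_symbol"),  # placeholder, not from the report
    (0xEFAF50, "v8::internal::RememberedSetUpdatingItem"
               "<v8::internal::MajorNonAtomicMarkingState>::Process()"),
    (0xF00000, "hypothetical_next_symbol"),      # placeholder, not from the report
]

if __name__ == "__main__":
    name, offset = nearest_symbol(SYMBOLS, 0xEFB532)
    print(f"{name} + {offset:#x}")
```

With the addresses from the report, the instruction pointer sits 0x5e2 bytes past the start of Process().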
I haven't been able to capture a core dump yet.