src: diagnostic NodeReport initial implementation #7242
Implementation of a diagnostic report for Node, intended for development, test and production use, to capture and preserve information for problem determination. The code sits behind a new option "--nodereport-events=<list of event types>" which enables selective triggering of the report on unhandled exceptions, fatal errors and signals. The report is not enabled by default.
Content of the NodeReport in the initial implementation consists of a header section containing the event type, timestamp, PID and Node version, a section containing the JavaScript stack trace, a section containing V8 heap information and an OS platform information section. Existing V8 APIs are used to obtain the stack trace and V8 heap information. There are changes in node.cc to handle the command-line option and report triggering, and new files containing the report writer and testcase.
Candidates for additional content in the NodeReport include: native stack traces; detailed JS stack traces with call parameters, stack locals/references and code; long stack traces; libuv information; OS and hypervisor information and levels; CPU usage; GC history; module-specific information (inserted in the report via callback).
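To make the header section concrete, here is a minimal sketch of a function that emits the four fields the description lists (event type, timestamp, PID, Node version). The function name `WriteReportHeader`, the field labels and the layout are all hypothetical and are not taken from the PR's report writer.

```cpp
#include <ctime>
#include <sstream>
#include <string>
#include <unistd.h>  // getpid(), POSIX only

// Hypothetical helper: builds the header section of a report.
// Field labels and layout are illustrative assumptions, not the PR's format.
std::string WriteReportHeader(const std::string& event,
                              const std::string& node_version) {
  char timestamp[32] = "unknown";
  std::time_t now = std::time(nullptr);
  std::tm tm_buf{};
  if (localtime_r(&now, &tm_buf) != nullptr) {
    std::strftime(timestamp, sizeof(timestamp), "%Y-%m-%d %H:%M:%S", &tm_buf);
  }

  std::ostringstream out;
  out << "==== NodeReport ====\n"
      << "Event: " << event << "\n"
      << "Timestamp: " << timestamp << "\n"
      << "Process ID: " << getpid() << "\n"
      << "Node.js version: " << node_version << "\n";
  return out.str();
}
```

The stack trace and heap sections would follow the header and rely on V8 APIs, which are left out so that the sketch stays self-contained.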
Could you run
@addaleax darn, of course, will do. thanks.
/cc @nodejs/post-mortem
@@ -139,6 +139,7 @@
'src/node_main.cc',
'src/node_os.cc',
'src/node_revert.cc',
'src/node_report.cc',
can you also add the header file down below? It makes it easier when using Xcode or other IDEs
@evanlucas ah right, will do, thanks
// Check atomic for NodeReport already pending
if (__sync_val_compare_and_swap(&nodereport_signal, 0, signo) == 0) {
  fprintf(stderr,"Signal %s received, triggering NodeReport\n", signo_string(signo));
  Isolate::GetCurrent()->RequestInterrupt(SignalDumpInterruptCallback, &nodereport_signal);
RequestInterrupt() is not async-signal-safe, it grabs a mutex.
OK, I will need to re-think this signal handler. Maybe hand over to a dedicated thread that's waiting on a semaphore, and call the RequestInterrupt() and uv_async_send() APIs from there.
You can probably piggy-back on the watchdog thread that RegisterDebugSignalHandler() in src/node.cc starts.
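The hand-over discussed in this thread can be sketched as follows. The signal handler does only async-signal-safe work (an atomic compare-and-swap and `sem_post()`), and a separate thread waits on the semaphore and does the work that may take locks. This is a sketch under stated assumptions, not the PR's code: every name is hypothetical, the atomic is assumed to be lock-free, unnamed POSIX semaphores are assumed to be available (as on Linux), and the real `RequestInterrupt()` and `uv_async_send()` calls are replaced by a counter.

```cpp
#include <atomic>
#include <semaphore.h>
#include <signal.h>
#include <thread>

// All names here are hypothetical; none are taken from the PR.
static sem_t report_semaphore;
static std::atomic<int> pending_signal{0};
static std::atomic<int> reports_triggered{0};

// Signal handler: restricted to async-signal-safe work. sem_post() is on the
// POSIX async-signal-safe list; the atomic is assumed to be lock-free.
void OnReportSignal(int signo) {
  int expected = 0;
  // Ignore the signal if a report is already pending.
  if (pending_signal.compare_exchange_strong(expected, signo)) {
    sem_post(&report_semaphore);
  }
}

// Watchdog thread: ordinary thread context, so taking a mutex is fine here.
void WatchdogLoop(int max_reports) {
  for (int i = 0; i < max_reports; ++i) {
    while (sem_wait(&report_semaphore) != 0) {
      // Retry if the wait was interrupted (EINTR).
    }
    // Real code would call isolate->RequestInterrupt(...) and
    // uv_async_send(...) at this point; the sketch only counts the event.
    reports_triggered.fetch_add(1);
    pending_signal.store(0);
  }
}

// Installs the handler, starts the watchdog, sends one signal to itself and
// returns the number of reports the watchdog triggered.
int RunHandoffDemo() {
  sem_init(&report_semaphore, 0, 0);
  struct sigaction sa{};
  sa.sa_handler = OnReportSignal;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGUSR2, &sa, nullptr);

  std::thread watchdog(WatchdogLoop, 1);
  raise(SIGUSR2);  // delivered to the calling thread
  watchdog.join();
  sem_destroy(&report_semaphore);
  return reports_triggered.load();
}
```

Storing the signal number in the atomic lets the watchdog see which signal woke it, which matters if one thread serves more than one signal.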
@rnchamberlain In nodejs/post-mortem#24, @davepacheco had asked a few questions that seem to still be relevant. They are quoted below, along with your answers:
However, it seems
This PR still uses SIGUSR2, so the question still stands.
They'll just override the NodeReport signal handler. I don't see a problem with that. Making the signal number configurable is nevertheless a good idea.
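The override behaviour follows from how POSIX signal dispositions work: a process has one handler per signal, and a later `sigaction()` call replaces the earlier one. A minimal sketch, assuming the runtime registers its handler first and the application registers its own afterwards (the handler and helper names are hypothetical):

```cpp
#include <signal.h>

static volatile sig_atomic_t report_handler_calls = 0;
static volatile sig_atomic_t app_handler_calls = 0;

// Stand-in for the handler the runtime would install for report triggering.
void ReportHandler(int) { report_handler_calls = report_handler_calls + 1; }

// Stand-in for a handler installed later by the application.
void AppHandler(int) { app_handler_calls = app_handler_calls + 1; }

// Installs `handler` for `signo`, replacing whatever was registered before.
bool Install(int signo, void (*handler)(int)) {
  struct sigaction sa{};
  sa.sa_handler = handler;
  sigemptyset(&sa.sa_mask);
  return sigaction(signo, &sa, nullptr) == 0;
}

// Registers the runtime's handler first, then the application's, and
// delivers one signal. Returns true if only the application's handler ran.
bool AppHandlerWins() {
  Install(SIGUSR2, ReportHandler);  // what the runtime would do at startup
  Install(SIGUSR2, AppHandler);     // what the application does later
  raise(SIGUSR2);
  return app_handler_calls == 1 && report_handler_calls == 0;
}
```

In this sketch the report handler never runs once the application has registered its own, so a program that already handles SIGUSR2 keeps its behaviour but gets no report from that signal.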
@rnchamberlain It seems that these reports could be generated with the same info by a native add-on, is that correct? If so, what's the motivation for adding this support to Node's core instead of implementing it as an npm module?
@misterdjules I'll let Richard give the full answer, but I believe at least the part that triggers the generation needs to be in the core and, given that, providing minimum base information that will be useful in most cases makes sense. You might then want it to be possible to extend what's in the report through modules.
How does this change the behavior for existing Node programs that use SIGUSR2?
As Ben says, existing Node programs that use SIGUSR2 would override the --nodereport handler, so there is no behaviour change. The PR does address this issue by allowing a choice of the signal used to trigger the NodeReport: SIGUSR2 or SIGQUIT. Other signals could be added (note Node currently adds handlers for SIGUSR1 for debug, and SIGINT/TERM for terminal settings). I would expect Node programs that use signals to have some documentation and/or a runtime indication, so a user wanting the report could select a different signal.
...these reports could be generated with the same info by a native add-on
As Michael says, the triggering needs to be in core, and I anticipate that some content would only be available if implemented in core. We have discussed (locally) providing a call-out so that modules could add information to the report. I understand the 'minimal kernel of core functionality' goal, but I think there is enthusiasm for more diagnostic support in Node core, e.g. #7059.
strstr is still used to parse the value(s) of the nodereport_events command line
Yes, it does need a more robust parser, e.g. to provide messages to the user if there are typos/mistakes in the supplied option, and to allow for future enhancements (event names might overlap). It just was not a priority for the PR compared with getting the report triggering and basic content available. Will be fixed.
@rnchamberlain when you say "Will be fixed." do you mean you will fix it as part of this PR or in a follow-on? Just wondering if this PR is ready for final review or whether there are more changes to come?
Is the assumption that nobody is using SIGUSR2 with the default behavior of terminating the program normally? I could imagine using this as part of a multi-process program, but I don't know of any that do, and I wouldn't figure it would be a great idea anyway. Still, is it worth adding this behavioral change to the release notes?
What specifically needs to be in core? Except for the fatal error events I fail to see why core would need to be changed to support this, but I may very well be missing something.
Fixes for PR review comments:
1. Line length, whitespace, snprintf etc. fixes for lint
2. Add .h file to node.gyp
3. Re-write signal handler to hand over to separate thread. The fix shares the debug watchdog thread code. Actually sharing a single watchdog thread would be problematic because on semaphore wake-up the watchdog would need to know which signal was sent, and to allow for a debug signal arriving while a nodereport was in progress.
4. Add a parser for the --nodereport-events option, remove the use of strstr().
5. Naming and checking improvements suggested by Ben
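To illustrate why a tokenizing parser is more robust than substring matching with strstr(), here is a minimal sketch that splits the option value and matches each token exactly, collecting unrecognised tokens so they can be reported to the user. The event names, the comma separator and all identifiers are assumptions made for illustration; the PR's actual option syntax may differ.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical event names and separator; the PR's actual values may differ.
static const std::set<std::string> kKnownEvents = {
    "exception", "fatalerror", "sigusr2", "sigquit"};

struct ParsedEvents {
  std::set<std::string> enabled;     // recognised event names
  std::vector<std::string> unknown;  // tokens to report back to the user
};

// Splits a comma-separated list and matches each token exactly, so a typo
// such as "sigusr" is rejected rather than silently accepted or ignored.
ParsedEvents ParseReportEvents(const std::string& arg) {
  ParsedEvents result;
  std::istringstream stream(arg);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (token.empty()) continue;  // tolerate "a,,b" and trailing commas
    if (kKnownEvents.count(token) != 0) {
      result.enabled.insert(token);
    } else {
      result.unknown.push_back(token);
    }
  }
  return result;
}
```

Exact matching also avoids the overlap problem mentioned in the review: with substring search, a name that is a prefix of another name would match both.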
See fixes for lint, signal handler, option parsing issues etc. in the Jun 16 commit. The whitespace changes for lint make it quite a large delta, apologies. @davepacheco @misterdjules
@rnchamberlain It seems even the fatal error events could be implemented by setting a custom
I still fail to see how putting this in core now would help both maintainers and users of these tools. For instance, in theory any change to the format of the output would be a semver-major change, and would be available at the earliest in the next major version of Node.js at the time the changes are merged. Implementing this as a module would allow maintainers to release new versions quickly and independently from Node.js' release schedule. Users would just need to update their dependency on that module to use newer releases. It seems like an easier workflow both for maintainers of node's core and users/maintainers of these features.
My bias towards not integrating this in core, at least for now, is also reinforced by the fact that it seems the design and implementation of these features are at an early stage and that we should expect a lot of changes both in the implementation and in what users consume.
Right ho, going the npm route, thanks all.
Checklist
make -j4 test (UNIX) or vcbuild test nosign (Windows) passes
Affected core subsystem(s)
src
More information here:
nodejs/post-mortem#24
https://github.com/rnchamberlain/node/wiki/NodeReport
@nodejs/post-mortem
@bnoordhuis @mhdawson