Triggering post-mortem diagnostics in Node core #44
cc @nodejs/diagnostics
I think Node core should include at least a minimal set of diagnostics out of the box. At the very least, you want to be able to generate and capture the dumps/information that you can then use with external tools. I'd start by making sure that out of the box you can easily generate the following on request, on uncaught exception or on crash:
Is a heapdump derivable from a core dump? In theory, a heapdump is a walk of the in-memory V8 data structures, and those structures were serialized to disk with the core dump.
You can, to a first approximation, by interpreting the … However, references to heap objects also exist on the stack, in registers, in handle scopes and in the global handles list (what stores …).
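As a toy illustration of why the root set matters (not how V8 or any tool actually stores these structures): a heapdump-style walk has to start from every place that can hold a reference, and anything reachable only from a root category the walk skips is silently lost. The `ToyHeap` type, its integer addresses and the root-category names below are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyHeap:
    """Hypothetical heap image: object address -> addresses it references."""
    refs: dict[int, list[int]]
    # Root sets, keyed by where the reference lives (stack, registers, ...).
    roots: dict[str, list[int]] = field(default_factory=dict)


def reachable_from_roots(heap: ToyHeap) -> set[int]:
    """Breadth-first walk from every root set; returns the addresses of live objects."""
    seen: set[int] = set()
    queue: deque[int] = deque()
    for addrs in heap.roots.values():
        for addr in addrs:
            if addr in heap.refs and addr not in seen:
                seen.add(addr)
                queue.append(addr)
    while queue:
        obj = queue.popleft()
        for child in heap.refs[obj]:
            if child in heap.refs and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


heap = ToyHeap(
    refs={0x100: [0x200], 0x200: [], 0x300: [0x400], 0x400: [], 0x500: [], 0x600: [0x200]},
    roots={
        "stack": [0x100],
        "global_handles": [0x300],
        "registers": [0x500],  # only a register refers to 0x500
        "handle_scopes": [],
    },
)
print(sorted(hex(a) for a in reachable_from_roots(heap)))  # 0x600 is unreachable
```

Dropping the "registers" entry makes 0x500 vanish from the result, which is exactly the failure mode of a walk that only follows the heap's own lists.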
That's why mdb_v8 scans the entire process address space and uses various heuristics to identify active objects, rather than starting a walk from a particular list of roots. The heuristics could, I suspect, be improved by including reachability from the roots, but it's not required right now to find all objects.
You scan for anything that looks like a pointer into one of the JS heap ranges? I've tinkered with that approach, but it wasn't reliable enough for my liking. What else do you do?
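A minimal sketch of the "looks like a pointer into a JS heap range" test, in Python for illustration. It assumes 64-bit little-endian words with uncompressed pointers and V8's tagging convention of that era (heap-object pointers have the low bit set, Smis have it clear); the memory image and heap ranges are made-up inputs standing in for what a real tool would read from the core file.

```python
import struct

HEAP_OBJECT_TAG = 1  # assumed: low bit set on tagged heap-object pointers, clear on Smis
WORD = 8             # assumed: 64-bit words, no pointer compression


def candidate_pointers(memory: bytes, heap_ranges: list[tuple[int, int]]) -> set[int]:
    """Untagged addresses of aligned words that look like pointers into a heap range.

    memory: raw little-endian bytes (a stand-in for one core-file segment).
    heap_ranges: half-open [start, end) ranges assumed to belong to the JS heap.
    """
    candidates: set[int] = set()
    for offset in range(0, len(memory) - WORD + 1, WORD):
        (word,) = struct.unpack_from("<Q", memory, offset)
        if (word & 1) != HEAP_OBJECT_TAG:
            continue  # Smi or untagged data, not a heap-object pointer
        addr = word - HEAP_OBJECT_TAG  # strip the tag to get the object's address
        if any(start <= addr < end for start, end in heap_ranges):
            candidates.add(addr)
    return candidates


# Made-up image: a tagged pointer, a Smi-like even word, a pointer outside the heap,
# and another tagged pointer inside the heap.
memory = struct.pack("<4Q", 0x10001, 0x10010, 0x30001, 0x1FF01)
print(sorted(hex(a) for a in candidate_pointers(memory, [(0x10000, 0x20000)])))
```

Any integer or string bytes that happen to have the low bit set and land inside a heap range pass this test, which is why a bare scan is noisy and its candidates need further validation.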
Among the heuristics that @jclulow alluded to: when mdb_v8 finds a candidate object, it attempts to enumerate its properties. (It needs to do this anyway to classify objects to allow users to search for objects by property name or constructor.) In order to do that, several data structures in different C++ objects need to be self-consistent, and their types need to match well-known values. (There's a detailed comment including diagrams of some of the data structures in the mdb_v8 source.) If any of these don't match up, mdb_v8 assumes this isn't a valid object and doesn't report it. For each property of the object, mdb_v8 also fetches the value (which itself sometimes requires traversing another data structure whose type has to match what's expected) and checks that its type corresponds to a valid type that it knows about. We have not found many objects that are consistent in all these ways but that are actually false positives, nor have we found too many that are pruned by these heuristics that are actually valid. (There are some exceptions, and there's more we could do here. For example, I'd like to build a dcmd that iterates the objects we pruned and makes sure that they're not reachable from the graph of objects that we found. That would help us smoke out any cases where we're pruning something incorrectly or not pruning something we should be.)
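The consistency checks described above can be sketched over a deliberately simplified, hypothetical object model (`RawObject`, `META_MAP` and the instance-type names are illustrative, not V8's real layouts and not mdb_v8's code): a candidate is accepted only if its map is itself described by the meta map with a known instance type, and every pointer-valued property leads to an object with a valid map. The last function is a one-hop simplification of the proposed cross-check, flagging pruned candidates that an accepted object still points to.

```python
from dataclasses import dataclass, field
from typing import Optional

META_MAP = 0x1000  # hypothetical address of the map that describes Maps
KNOWN_TYPES = {"JS_OBJECT", "STRING", "HEAP_NUMBER"}  # hypothetical instance types


@dataclass
class RawObject:
    map_addr: int                        # first word: pointer to this object's Map
    instance_type: Optional[str] = None  # set only when the object is itself a Map
    props: list[tuple[str, int]] = field(default_factory=list)  # ("smi", n) or ("ptr", addr)


def valid_map(image: dict[int, RawObject], addr: int) -> bool:
    """A Map must exist, be described by the meta map, and carry a known type."""
    m = image.get(addr)
    return m is not None and m.map_addr == META_MAP and m.instance_type in KNOWN_TYPES


def plausible_object(image: dict[int, RawObject], addr: int) -> bool:
    """Accept a candidate only if its map and every property value check out."""
    obj = image.get(addr)
    if obj is None or not valid_map(image, obj.map_addr):
        return False
    for kind, value in obj.props:
        if kind == "smi":
            continue  # small integers are stored inline; nothing to follow
        target = image.get(value)
        # A property pointer must reach an object whose own map is valid.
        if kind != "ptr" or target is None or not valid_map(image, target.map_addr):
            return False
    return True


def suspicious_prunes(image: dict[int, RawObject], candidates: set[int]) -> set[int]:
    """Pruned candidates that an accepted object points to directly (possible false negatives)."""
    accepted = {a for a in candidates if plausible_object(image, a)}
    pruned = candidates - accepted
    referenced = {v for a in accepted for kind, v in image[a].props if kind == "ptr"}
    return pruned & referenced


image = {
    0x2000: RawObject(META_MAP, "JS_OBJECT"),  # map describing plain JS objects
    0x2100: RawObject(META_MAP, "STRING"),     # map describing strings
    0x3000: RawObject(0x2000, props=[("smi", 42), ("ptr", 0x3100), ("ptr", 0x3500)]),
    0x3100: RawObject(0x2100),                 # a string
    0x3200: RawObject(0x9999),                 # map pointer leads nowhere: pruned
    0x3500: RawObject(0x2000, props=[("ptr", 0xDEAD)]),  # dangling property: pruned
}
candidates = {0x3000, 0x3100, 0x3200, 0x3500}
print(sorted(hex(a) for a in suspicious_prunes(image, candidates)))
```

Here 0x3200 is pruned and nothing refers to it, so it is silently dropped; 0x3500 is pruned but a valid object points at it, which is the kind of case such a cross-check would surface for a closer look.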
llnode uses a similar brute-force algorithm; see FindJSObjectsVisitor::Visit() in https://github.com/nodejs/llnode/blob/master/src/llscan.cc. @hhellyer had a close look at producing a JSON heap dump from a core dump using llnode. It would provide a useful route to the DevTools GUI with its retained size analysis. However, we don't have enough metadata to find the heap roots, handles, etc. that Ben describes above, and that we'd need for the heap dump format. I think you can get enough from line-mode mdb_v8 or lldb/llnode to solve memory leaks; it's just a bit more work.
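Retained size shows why the roots matter for the DevTools route. A minimal sketch using the plain definition (the retained size of X is the total self size of everything that stops being reachable from the roots once X is removed); DevTools derives the same numbers from a dominator tree, which scales far better than recomputing reachability per object. The graph and sizes are invented.

```python
from collections import deque
from typing import Optional


def reachable(edges: dict[str, list[str]], roots: list[str],
              removed: Optional[str] = None) -> set[str]:
    """Nodes reachable from the roots, pretending `removed` (if given) is gone."""
    seen = {r for r in roots if r != removed}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child != removed and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def retained_size(edges: dict[str, list[str]], self_size: dict[str, int],
                  roots: list[str], target: str) -> int:
    """Self size freed if `target` were collected: everything only reachable through it."""
    live = reachable(edges, roots)
    if target not in live:
        return 0  # already garbage; it retains nothing
    survivors = reachable(edges, roots, removed=target)
    return sum(self_size[n] for n in live - survivors)


# Invented graph: root R holds A and B; both point at a shared C; only A points at D.
edges = {"R": ["A", "B"], "A": ["C", "D"], "B": ["C"]}
self_size = {"R": 0, "A": 10, "B": 20, "C": 30, "D": 40}
for name in "ABCD":
    print(name, retained_size(edges, self_size, ["R"], name))
```

C is shared by A and B, so neither of them retains it. With a wrong or missing root set these numbers stop meaning anything, which is the gap described above.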
I have been experimenting with some of these, and other related ideas, for improving the developer experience when dealing with remotely deployed applications (e.g., running in the cloud). In particular, I was experimenting with a system that provided:
The prototype work is in my fork here (enabled using a --record flag) with the major changes being:
After playing around with how this works, I am really excited about the great developer experience that can be created with this kind of approach (particularly if it includes heap dumps or execution trace information).
Adding better hooks in Node core would be a good starting point, and could perhaps get around the 'keep core small' rule. Improved hooks are needed in C/C++ code, as used by node-report, as well as in JavaScript code (as per @mrkmarron).
I still believe that we can balance a small core with what makes sense to be in core. I believe diagnostics, particularly where native code is required, meet the bar for what should be in core.
I would think determining what meets the bar for what should be in core requires us to take a closer look at the "what" in question, and then answer why it needs to be added to node's core. I also think that "requiring native code" is not a sufficient reason to mandate inclusion in core. From @rnchamberlain's original post, I have the following questions:
What happens if the node process runs out of memory (not just out of V8 JS heap space)? Would a mechanism that generates a heapdump still be able to find the resources needed to generate it? I would think that in some cases at least that could be problematic, and that an approach based on generating OS-level core dumps would not have these problems.
Can you expand on that? I don't understand what this describes. Can't node programs already catch signals and exceptions raised/thrown from native code?
Can you expand on that? Could you describe representative use cases and what tools/mechanisms you have in mind to deal with them? The following argument for adding the above-mentioned diagnostic facilities to node's core:
is not clear to me, since we could use this argument for pretty much anything that is useful for any group of node users. I would think the question to ask ourselves is closer to: "Do users of node need these diagnostic facilities available out of the box, and why?" The second argument for adding the above-mentioned diagnostic facilities to node's core:
is also not clear to me. How much better would the above-mentioned diagnostic tools be if they were included in core instead of being available as an npm module? In general, for the project to properly evaluate the pros and cons of adding the above-mentioned diagnostic tools and why we'd want to do that, I think writing a detailed enhancement proposal in nodejs/node-eps would be a good way forward.
The idea of a proposal in nodejs/node-eps might be the way to go. This doesn't answer all of your questions, but as a recent example, an individual reached out to @MylesBorins, @ofrobots and me on Twitter asking how to debug a memory leak. When we suggested generating heapdumps with the heapdump module, they initially had trouble getting the module compiled and working in their environment. It's a simple case where having a pre-built mechanism to generate heapdumps available as part of the runtime would have been valuable to them.
I believe this is https://twitter.com/ofrobots/status/847260522982526976. Adding it so that people in this discussion can get a better idea of what that discussion was.
It's not clear to me how having a mechanism to generate heapdumps built into node's core would have improved the user experience in that specific case. But, most importantly, it's not clear to me that these issues couldn't be addressed at least as well without that mechanism in core. In any case, I would expect an "enhancement proposal" in nodejs/node-eps to document this use case and others in greater detail than a Twitter thread.
It's only a matter of time before the inspector grows programmatic hooks and then user-programmable heapdumps are a fait accompli and this whole discussion will be moot. :-)
That is more or less what happened in Java. After a few iterations of the debugger support, an architected API (JVMTI) appeared, https://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html, which a variety of tools could exploit.
The availability of that API doesn't imply that it should be used, e.g., to generate heap dumps on out-of-memory errors.
The JVMTI document describes a broad set of APIs, some of which at least are not related to post-mortem debugging. This repository (and thus I assumed this discussion) is about post-mortem debugging. Thus it's not clear to me how the two relate to each other. Can we scope this issue so that it's clear what needs to be discussed in terms of post-mortem debugging?
So I think the scope of this issue, and what we are discussing as a first pass, is now:
Possible events are, for example: JS heap OOM or threshold, uncaught exception, slow response time, tracepoint, etc. If there is enthusiasm, we could progress to writing up the details in https://github.com/nodejs/node-eps
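One way to picture "trigger diagnostics on an event" is as a small policy table: each entry pairs an event with an optional threshold, a cap on how often it fires, and an action. Everything here (`Trigger`, `DiagnosticDispatcher`, the event names and thresholds) is a hypothetical, language-neutral illustration of the design space, not a Node.js API; in a runtime the actions would invoke an actual dump mechanism (for example the node-report module mentioned earlier) rather than return strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Trigger:
    """Hypothetical trigger: when `event` fires (and `threshold` is met), run `action`."""
    event: str                           # e.g. "heap_used_mb", "uncaught_exception"
    action: Callable[[str, float], str]  # returns a name for the artifact it produced
    threshold: Optional[float] = None    # None means "fire on every occurrence"
    max_fires: int = 1                   # cap repeated dumps so an event storm stays cheap


class DiagnosticDispatcher:
    """Hypothetical dispatcher mapping runtime events to diagnostic actions."""

    def __init__(self, triggers: list[Trigger]) -> None:
        self.triggers = triggers
        self.fire_counts = [0] * len(triggers)

    def emit(self, event: str, value: float = 0.0) -> list[str]:
        produced: list[str] = []
        for i, trig in enumerate(self.triggers):
            if trig.event != event:
                continue
            if trig.threshold is not None and value < trig.threshold:
                continue  # below threshold, e.g. heap still within budget
            if self.fire_counts[i] >= trig.max_fires:
                continue  # already produced as many artifacts as allowed
            self.fire_counts[i] += 1
            produced.append(trig.action(event, value))
        return produced


def make_example_dispatcher() -> DiagnosticDispatcher:
    """Example policy for the events listed above (thresholds are illustrative only)."""
    return DiagnosticDispatcher([
        Trigger("heap_used_mb", lambda e, v: f"heapdump:{v:g}", threshold=1500),
        Trigger("uncaught_exception", lambda e, v: "report"),
        Trigger("response_ms", lambda e, v: f"trace:{v:g}", threshold=2000, max_fires=2),
    ])
```

The per-trigger cap is the main design choice: a process that is already in trouble should not be pushed further over the edge by writing one heap dump per event.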
Closing due to inactivity. If this work still needs to be tracked, please open a new issue over in https://github.com/nodejs/diagnostics.
This came up in the Diagnostics WG meeting 23 Feb 2017: nodejs/diagnostics#85
There are some diagnostic features in Node.js:
The kinds of things that other runtimes provide are:
Some arguments for adding diagnostics into Node core are:
Some arguments against are:
Note: post-mortem tooling is a separate issue, and the arguments for bundling it in core are likely to be weaker. However, there is discussion on upstreaming core dump tooling to V8: nodejs/llnode#64