This repository has been archived by the owner on Aug 7, 2023. It is now read-only.

Linux support for mdb #17

Closed
Raynos opened this issue Feb 6, 2016 · 38 comments

Comments

@Raynos

Raynos commented Feb 6, 2016

There was a conversation about post-mortem debugging on nodejs/node where an engineer did not know how mdb worked because he had never run a SmartOS VM. This started me thinking about Linux support.

One of the Uber engineers ported a subset of mdb to Linux.

He observed that mdb_v8 is just a C program with function calls into mdb, so he just had to implement those mdb functions for Linux outside of mdb.

This creates a standalone Linux binary that can debug cores.

The code is in a really hacky place and I don't think Uber or the engineer has time to polish it up to a good state.

However we would love to share this proof of concept (currently private).
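
As a rough illustration of the shim idea described above (this is not the Uber code, which is private), most of what a debugger module needs from its host comes down to "read N bytes at virtual address X". The Python sketch below shows how that one primitive could be served straight from a core file image, standing in for what mdb's `mdb_vread` provides. The names `vread` and `read_pointer` and the `(start, end, file_offset)` segment tuples are hypothetical, and reads that span two segments are deliberately not handled.

```python
import struct


def vread(core, segments, addr, size):
    """Return `size` bytes located at virtual address `addr`.

    core     -- the raw bytes of the core file
    segments -- iterable of (vaddr_start, vaddr_end, file_offset) tuples,
                one per memory range saved in the core (hypothetical layout)
    """
    for start, end, offset in segments:
        # Only serve reads that fit entirely inside one saved range.
        if start <= addr and addr + size <= end:
            pos = offset + (addr - start)
            return core[pos:pos + size]
    raise LookupError("address 0x%x is not backed by the core file" % addr)


def read_pointer(core, segments, addr):
    """Read one little-endian 64-bit word (assumes an x86_64 target)."""
    return struct.unpack("<Q", vread(core, segments, addr, 8))[0]
```

A real port would also need symbol lookup and output helpers; this only covers the memory-read path.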

@davepacheco

I think that prototype implementation would be great to have available publicly!

We've discussed this a lot, and I think by far the better longer-term approach for getting mdb_v8 on Linux is to turn mdb_v8 into a C library that could be easily loaded into different programs, including mdb. I've started doing that work, but it's basically a background project (see TritonDataCenter/mdb_v8#35 for an example). Porting MDB itself to Linux would be great too, but I don't think anybody has volunteered to do it.

I believe the file format we've been discussing under #13 addresses some people's more immediate needs for this because they intend to run the real MDB to extract this information and then process it with other tools (that can be more cross-platform).

All that aside, a working implementation would be very useful!

@lucamaraschi
Contributor

@Raynos if you guys could share your PoC, that would be awesome. Any ETA?
I agree with @davepacheco that the immediate needs can be addressed by a generic file format, but definitely that PoC could help a lot ;-)

@Raynos
Author

Raynos commented Feb 8, 2016

ETA is 2 weeks out. I'm currently on holiday. I'll reach out to the engineer next week. It might need some cleanup, but I'll try and get it out.

@Fishrock123

@indutny was working on an lldb thing with similar capabilities not that long ago.

@indutny
Member

indutny commented Feb 9, 2016

Yeah, this work is here: https://github.com/indutny/llnode . It appears to be working just fine on Linux.

@mpareja

mpareja commented Feb 9, 2016

👍 for not needing to dust off OmniOS instances to look at dumps.

@rnchamberlain
Contributor

mdb_v8 as a C library would be great; hopefully we could make use of it when reading core dumps with other OS debuggers. WinDbg supports extensions via a C API and GDB via a Python API, so maybe we'd just need a thin layer to get to and from the mdb_v8 library. For the Uber stand-alone tool I guess they wrote their own Linux/ELF core reader, which is more work than using a debugger (but still quite doable).

@hhellyer
Contributor

This issue on nodejs/node, nodejs/node#5696, talks about making debug symbol files generally available on all platforms. That would be a big help inside any debugger (gdb, lldb or windbg) and could make it far easier to navigate the dumps and ease a lot of the problems around writing debug extensions.
From a post-mortem group point of view is there any way we can encourage this? :-)

@lucamaraschi
Contributor

@Raynos Do you have any update on the tools that you guys are developing at Uber?

@Raynos
Author

Raynos commented Mar 22, 2016

@lucamaraschi I've got a copy of the Linux mdb; I'm going to try and build it locally and write a README with reproducible steps.

@rnchamberlain
Contributor

We are looking at implementing the MDB V8 DMOD commands as GDB extensions (in Python). Do folks think this is worth doing, given the general availability of LLDB on Linux?

@yunong
Member

yunong commented Apr 4, 2016

@rnchamberlain Yes, this would be great! There's a huge amount of appetite for the mdb_v8 commands to be made available as LLDB extensions. @tjfontaine already started on a port here: https://github.com/tjfontaine/lldb-v8

@indutny
Member

indutny commented Apr 4, 2016

@yunong the main difference between LLDB and mdb plugin is that you can't iterate allocated memory in LLDB. LLDB just does not keep mapped regions after parsing core file. Wish they did! (Or perhaps I should contribute it myself...)

@rnchamberlain
Contributor

Ah yes, we noticed that to list JS objects, the MDB DMOD walks all the memory sections. We think 'maintenance info sections' returns the malloc'ed memory in GDB. It would be very useful to be able to do that in LLDB. An alternative could be to locate the JS heap bounds, perhaps via some PD data stored at Node start-up.

My question was about the relative merits of investing in a GDB plugin vs the LLDB plugin. So there is an issue of what the plugin APIs support. There is a GDB vs LLDB command comparison here: http://lldb.llvm.org/lldb-gdb.html.
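
To make the 'maintenance info sections' idea above concrete, here is a minimal Python sketch of how a GDB extension might turn that command's text output into a list of address ranges to scan. It assumes output lines of the form ` [N]  0xSTART->0xEND at 0xOFFSET: name FLAGS...`; that layout is an assumption and may vary between GDB versions. The function name `parse_sections` is hypothetical, and keeping only sections flagged `LOAD` is a guess at which entries hold process memory.

```python
import re

# Assumed line layout: " [3]  0x00400000->0x00401000 at 0x00001000: load1 ALLOC LOAD ..."
SECTION_RE = re.compile(
    r"^\s*\[\d+\]\s+0x([0-9a-fA-F]+)->0x([0-9a-fA-F]+)"
    r"\s+at\s+0x[0-9a-fA-F]+:\s+(\S+)\s*(.*)$"
)


def parse_sections(text):
    """Return [(start, end, name)] for non-empty sections flagged LOAD."""
    ranges = []
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if not match:
            continue  # skip headers and anything not in the assumed layout
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        name = match.group(3)
        flags = match.group(4).split()
        if "LOAD" in flags and end > start:
            ranges.append((start, end, name))
    return ranges


# Inside GDB's Python interpreter the text could be captured with:
#   text = gdb.execute("maintenance info sections", to_string=True)
```

Parsing command output is fragile, so a plugin API that returns the ranges directly would be preferable where one exists.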

@indutny
Member

indutny commented Apr 5, 2016

@rnchamberlain well, technically we already have at least two LLDB plugins: https://github.com/indutny/llnode and https://github.com/tjfontaine/lldb-v8 . That 'maintenance info sections' command, though, could be useful for implementing an analog of findjsobjects.

@rnchamberlain
Contributor

@indutny thanks, corrected, and yours being the more recent and active of the two. Also you used the C++ API rather than the Python one, which could be better for sharing code with the mdb one?

@indutny
Member

indutny commented Apr 5, 2016

@rnchamberlain perhaps... :)

@lucamaraschi
Contributor

@indutny @rnchamberlain I am currently using llnode as a quick daily tool, though I fall back to mdb in my VM when I need deeper analysis of the dumps.
Is there any way we can coordinate the effort and create one tool instead of spreading it across multiple attempts?

I would suggest that we create a repo and all collaborate on it, starting with a list of issues representing the features we want to expose/roll out, so that we can perhaps vote democratically on priorities.

@yunong shall we schedule a quick catch-up soon with this topic on the agenda?

@indutny
Member

indutny commented Apr 5, 2016

@lucamaraschi I totally agree. If we want, I wouldn't mind moving llnode to the nodejs org.

@lucamaraschi
Contributor

@indutny I guess that would be a great start (and a new beginning!).
Let's discuss it and decide during the next WG meeting.

@indutny
Member

indutny commented Apr 5, 2016

> Let's discuss it and decide during the next WG meeting.

Is it already scheduled?

@yunong
Member

yunong commented Apr 5, 2016

@indutny #23

@rnchamberlain
Contributor

@indutny re iterating allocated memory in LLDB you are right, there is no equivalent in the LLDB API for what we get with 'maintenance info sections' in GDB. We are looking at a work-around using readelf to get the memory ranges, but it really needs fixing in LLDB. Internally LLDB knows from the core file what memory is allocated, it just needs to expose that via one of the plugin APIs.

@indutny
Member

indutny commented Apr 12, 2016

I believe LLDB discards this info, but I agree that it should be fixed.


@rnchamberlain
Contributor

@indutny ProcessElfCore::DoReadMemory() seems to be using a saved array of memory chunks

@indutny
Member

indutny commented Apr 13, 2016

@rnchamberlain you are right, I thought it discards it everywhere, because it does this for Mach-O cores.

@indutny
Member

indutny commented Apr 13, 2016

Gosh, nevermind. I don't know what I was looking at, but it indeed does read it in Mach-O cores too.

@hhellyer
Contributor

I’ve prototyped a dumpobjects command based on llnode which scans the whole memory for anything that looks like a V8 heap object (much like the findjsobjects command in mdb_v8).
At the moment, due to an lldb limitation, it has to read a list of valid memory ranges from a file generated from the core file's program headers using readelf --segments, so that it knows which areas of the address space are valid to scan. We would like to improve that.
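
For readers unfamiliar with what `readelf --segments` is extracting here, the sketch below reads the same information directly: it walks the ELF program header table and collects the `PT_LOAD` entries, which describe the memory ranges saved in a core file. This is a minimal illustration and not the script used for the prototype. It handles only 64-bit little-endian ELF, ignores the extended numbering used for very large segment counts, and the function name `load_segments` and its return shape are hypothetical.

```python
import struct

PT_LOAD = 1  # program header type for a loadable (memory-backed) segment

# ELF64 file header and program header layouts, little-endian.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def load_segments(data):
    """Return [(vaddr_start, vaddr_end, file_offset)] for PT_LOAD segments."""
    ident = data[:16]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")

    header = ELF64_EHDR.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = header[5], header[9], header[10]

    segments = []
    for index in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, p_filesz, _memsz, _align) = ELF64_PHDR.unpack_from(
            data, e_phoff + index * e_phentsize)
        # Segments with no bytes in the file cannot be scanned, so skip them.
        if p_type == PT_LOAD and p_filesz > 0:
            segments.append((p_vaddr, p_vaddr + p_filesz, p_offset))
    return segments
```

On a real core this would be called with the file's contents, for example `load_segments(open("core", "rb").read())`.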

With this it is possible to get a simple set of totals of objects by map and then, using a second command (dumpinstances), list all the objects that use that map:

(lldb) plugin load llnode.so
(lldb) dumpobjects
Type 0x21DC8CD1B911 - instance count: 1, total size 56 - example name: 0x00000e7792542039:<Object: Object>
Type 0x21DC8CD1BE91 - instance count: 1, total size 56 - example name: 0x00000e77925464c9:<Object: Object>
Type 0x21DC8CD1BDE1 - instance count: 1, total size 56 - example name: 0x00000e7792546449:<Object: Object>
.....
Type 0x326A05204209 - instance count: 1764, total size 0 - example name: 0x000029ce0ce04131:<FixedArray, len=0>
Type 0x326A05204159 - instance count: 1803, total size 158664 - example name: 0x0000326a05204101:<Map own_descriptors=0 descriptors=0x000029ce0ce04121>
Type 0x326A052042B9 - instance count: 3077, total size 73848 - example name: 0x00000e779254e351:<unknown>
Type 0x326A05204261 - instance count: 4398, total size 0 - example name: 0x000029ce0ce04291:<String: "">
Memory scan found 22777 objects.

(lldb) dumpinstances 0x326A05204261
.....
0x000032f564c37771:<String: "splitPathRe">
0x000032f564c37799:<String: "^(\/?|)([\s\S]*?...">
0x000032f564c377f1:<String: "posix">
0x000032f564c37811:<String: "posixSplitPath">
0x000032f564c37839:<String: "out">
0x000032f564c37859:<String: "/">

I need to work on the sizes and group but initially I wanted to confirm I could scan the memory and locate a sensible number of objects. It gives a similar object total to a heap dump generated at the same time.
I think I can generate the same memory ranges information on OSX using the output of otool -l but haven’t tested it yet.
(I'll push this to a fork of llnode on GitHub when I've had a chance to tidy up the code a little.)
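
The kind of scan described above can be pictured with a much simplified Python sketch. It relies on one V8 convention: pointers to heap objects are tagged with low bits `01`, which is why the addresses in the output above are odd. The sketch treats every aligned 64-bit word that is a tagged pointer into a known memory range as a candidate map pointer and tallies candidates per map. This is an assumption-laden illustration, not the prototype's algorithm; a real scanner needs further validation (for example, checking that the candidate really is a V8 Map), and the function names are hypothetical.

```python
import struct
from collections import Counter

HEAP_OBJECT_TAG = 1       # low bits 01 mark a V8 heap-object pointer
HEAP_OBJECT_TAG_MASK = 3


def looks_like_heap_pointer(word, ranges):
    """True if `word` is tagged as a heap object and points into `ranges`."""
    if word & HEAP_OBJECT_TAG_MASK != HEAP_OBJECT_TAG:
        return False
    addr = word - HEAP_OBJECT_TAG  # strip the tag to get the real address
    return any(start <= addr < end for start, end in ranges)


def count_by_map(memory, ranges):
    """Tally candidate map pointers found in one region's bytes.

    memory -- bytes of a single readable region
    ranges -- [(start, end)] of all valid address ranges in the core
    """
    counts = Counter()
    for offset in range(0, len(memory) - 7, 8):
        (word,) = struct.unpack_from("<Q", memory, offset)
        if looks_like_heap_pointer(word, ranges):
            counts[word] += 1
    return counts
```

Without the extra validation this over-counts heavily, since any pointer field inside an object also matches.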

@indutny
Member

indutny commented Apr 18, 2016

Woow! This looks awesome!

I would love to take a look at the code! 😉

Thanks for sharing.


@hhellyer
Contributor

Pushed prototype here: https://github.com/hhellyer/llnode/tree/dumpobjects_prototype - please don't review the code too closely :-)
Scripts for generating memory ranges are here: https://gist.github.com/hhellyer/6d79009e142d6eef696525afe1ecea43
(I may push a fix later if I figure out why llnode is failing to run anything on my OSX box.)

@indutny
Member

indutny commented Apr 19, 2016

Looks great!

Let me know if I can help you debug llnode on your OSX. I'm an OSX user
myself, and it appears to be working on my side.

Could it be just a lldb headers version mismatch?


@hhellyer
Contributor

Thanks, if I can't figure it out I'll ask in an issue on https://github.com/indutny/llnode to avoid cluttering this issue up. (I think it might be a mismatch of some kind, my lldb reports version 350 so it's probably either that or a C++ linking issue.)

@indutny
Member

indutny commented Apr 21, 2016

@hhellyer after investigation, it looks like it is compatible with 380, not 350... despite the version :(

@hhellyer
Contributor

Yep, having just gone to the lengths of building the 380 release from source I can see that the newly built lldb gives a 350 version number:
$ bin/lldb -v
lldb-350.99.0
which probably explains my confusion.
The plugin now seems to work with the standard lldb installed on my mac but I need to fix my commands.

@lucamaraschi
Contributor

@Raynos any update on Uber's implementation?

@rnchamberlain
Contributor

LLDB 3.9 is now available, which has the @hhellyer enhancement to find all the memory ranges in a core dump without the need for his readelf script workaround. The 3.9 packages are here: http://apt.llvm.org/

The installation of LLDB 3.9 on Ubuntu was a bit tricky; I found this earlier info useful:
http://askubuntu.com/questions/735201/installing-clang-3-8-on-ubuntu-14-04-3

and I ended up with this (for ubuntu 14)

wget -O - http://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
apt-add-repository "deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-3.9 main"
apt-add-repository "deb-src http://apt.llvm.org/trusty/ llvm-toolchain-trusty-3.9 main"
apt-get update
apt-get install clang-3.9 lldb-3.9
apt-get install liblldb-3.9-dev

Then rebuild the @indutny project https://github.com/indutny/llnode against LLDB 3.9 and off we go:

root@ThinkCentre-M57p:/home/rchamberlain# lldb -c core
(lldb) target create --core "core"
Core file '/home/rchamberlain/core' (x86_64) was loaded.
(lldb) plugin load llnode/out/Release/obj.target/llnode.so
(lldb) v8 help
 Node.js helpers

The following subcommands are supported:

  bt              -- Show a backtrace with node.js JavaScript functions and their args. An optional argument is accepted; if that argument is a
                     number, it specifies the number of frames to display. Otherwise all frames will be dumped.

                     Syntax: v8 bt [number]
  findjsinstances -- List all objects which share the specified map.
                     Accepts the same options as `v8 inspect`
  findjsobjects   -- List all object types and instance counts grouped by map and sorted by instance count.
                     Requires `LLNODE_RANGESFILE` environment variable to be set to a file containing memory ranges for the core file being
                     debugged.
                     There are scripts for generating this file on Linux and Mac in the scripts directory of the llnode repository.
  inspect         -- Print detailed description and contents of the JavaScript value.

                     Possible flags (all optional):

                      * -F, --full-string    - print whole string without adding ellipsis
                      * -m, --print-map      - print object's map address
                      * --string-length num  - print maximum of `num` characters in string

                     Syntax: v8 inspect [flags] expr
  nodeinfo        -- Print information about Node.js
  print           -- Print short description of the JavaScript value.

                     Syntax: v8 print expr
  source          -- Source code information

For more help on any particular subcommand, type 'help <command> <subcommand>'.

(lldb) v8 findjsobjects
 Instances  Total Size Name
 ---------- ---------- ----
      1         24 FastBuffer
      1         24 JSON
      1         24 Math
      1         24 RangeError
      1         32 Arguments
      1         32 ContextifyScript
      1         32 WriteWrap
      1         48 TickObject
      1         96 Console
      1        104 Agent
      1        112 Server
      1        120 ServerResponse
      1        120 exports.FreeList
      1        144 Timeout
      1        224 Socket
      1        240 IncomingMessage
      2         48 process
      2         64 HTTPParser
      2         64 Signal
      2        208 EventEmitter
      2        208 WriteStream
      2        256 TimersList
      2        272 Module
      3         96 TCP
      3         96 TTY
      3         96 Timer
      3        864 WritableState
      4        448 BufferList
      4        448 CorkedRequest
      4       1152 ReadableState
      7        720 (ArrayBufferView)
      9        360 EventHandlers
     16        608 (anonymous)
     35        904 (Object)
     46       2944 NativeModule
    107       3424 (Array)
    607      32912 Object
 386396   18547008 MyRecord

(lldb) v8 findjsinstances ServerResponse
0x00001ab58c07df51:<Object: ServerResponse>

(lldb) v8 inspect 0x00001ab58c07df51
0x00001ab58c07df51:<Object: ServerResponse properties {
  .domain=0x00003a4b48e04101:<null>,
  ._events=0x00003d69f403cf49:<Object: EventHandlers>,
  ._eventsCount=<Smi: 1>,
  ._maxListeners=0x00003a4b48e04189:<undefined>,
  .output=0x00003d69f403cf69:<Array: length=0>,
  .outputEncodings=0x00003d69f403cf89:<Array: length=0>,
  .outputCallbacks=0x00003d69f403cfa9:<Array: length=0>,
  .outputSize=<Smi: 0>,
  .writable=0x00003a4b48e04231:<true>,
  ._last=0x00003a4b48e04299:<false>,
  .chunkedEncoding=0x00003a4b48e04299:<false>,
  .shouldKeepAlive=0x00003a4b48e04231:<true>,
  .useChunkedEncodingByDefault=0x00003a4b48e04231:<true>,
  .sendDate=0x00003a4b48e04231:<true>,
  ._removedHeader=0x00003d69f4046ff9:<Object: Object>,
  ._contentLength=0x00003a4b48e04101:<null>,
  ._hasBody=0x00003a4b48e04231:<true>,
  ._trailer=0x00003a4b48e042e1:<String: "">,
  .finished=0x00003a4b48e04299:<false>,
  ._headerSent=0x00003a4b48e04299:<false>,
  .socket=0x00003d69f403d109:<Object: Socket>,
  .connection=0x00003d69f403d109:<Object: Socket>,
  ._header=0x00003a4b48e04101:<null>,
  ._headers=0x00003a4b48e04101:<null>,
  ._headerNames=0x00003d69f4047031:<Object: Object>,
  ._onPendingData=0x00003d69f4047069:<function: updateOutgoingData at  _http_server.js:267:30>}>
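
For anyone scripting against LLDB rather than using llnode, the memory-range support mentioned at the top of this comment can be reached from Python. The sketch below assumes the enhancement in question is the `SBProcess.GetMemoryRegions()` API from the LLDB scripting bridge; the thread does not name the API, so treat that as an assumption. The helper `readable_ranges` is hypothetical, and it takes the region-info constructor as a parameter so that it does not need to import `lldb` itself.

```python
def readable_ranges(process, make_region_info):
    """Return [(start, end)] for every readable region LLDB reports.

    process          -- an lldb.SBProcess (for example, one loaded from a core)
    make_region_info -- callable returning an empty region-info object;
                        pass lldb.SBMemoryRegionInfo when running inside LLDB
    """
    regions = process.GetMemoryRegions()
    ranges = []
    for index in range(regions.GetSize()):
        info = make_region_info()
        # GetMemoryRegionAtIndex fills `info` and reports success as a bool.
        if not regions.GetMemoryRegionAtIndex(index, info):
            continue
        if info.IsReadable():
            ranges.append((info.GetRegionBase(), info.GetRegionEnd()))
    return ranges
```

Inside LLDB's `script` interpreter this could be called as `readable_ranges(lldb.process, lldb.SBMemoryRegionInfo)`.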

@rnchamberlain
Contributor

@indutny has now moved llnode to nodejs org here: https://github.com/nodejs/llnode

A table comparing commands available in the MDB and LLDB plugins is here:
https://developer.ibm.com/node/2016/09/27/advances-in-core-dump-debugging-for-node-js/

@richardlau
Member

Closing due to inactivity. https://github.com/nodejs/llnode is a thing. If this needs to remain open please raise a new issue in https://github.com/nodejs/diagnostics.
