Memory Fragmentation (WIP) #386
Comments
Loving these issues. Keep them coming if you can. Will be digging back in on the quic code early next week.
jemalloc with narenas=1 shows none of the excessive growth (confirmed over 48hrs). That supports this being fragmentation rather than a leak. I'm not really sure what the next steps here are.
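For anyone wanting to reproduce the comparison, the invocation is roughly the following (the library path is distro-specific, and server.js stands in for whatever long-running process is under test):

```sh
# Preload jemalloc and pin it to a single arena via its MALLOC_CONF tunable.
# Adjust the .so path for your distro/architecture.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
MALLOC_CONF="narenas:1" \
node server.js
```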
We'll need to reproduce the issue. We're not doing much special around allocations here, so it's going to take a bit to figure out and nail down. Just having as much information as possible on your test case and the data you've collected would be a great start.
@jasnell do you have a generic server and client example that exercises the streams api (say, opening a stream and reading and writing)? Perhaps leaving something like that running for a few days would be a good test. This may just be a case of glibc being derpy and fragmenting with smaller allocations. Normally with TCP-based streams this wouldn't be seen, but I'm imagining there are a lot of small allocations for dealing with small signalling packets in QUIC.
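Something along these lines is what I have in mind; this is only a rough sketch against the experimental quic API in this branch, so method names, option shapes, and whether the calls are promise-based are from memory and may not match the current tree:

```js
// Soak-test sketch: open a small bidirectional stream every second and echo it back.
// './keys' is a hypothetical helper returning PEM key/cert strings.
const { createQuicSocket } = require('net');
const { key, cert } = require('./keys');

async function main() {
  const server = createQuicSocket({ endpoint: { port: 1234 } });
  server.listen({ key, cert, alpn: 'soak' });
  server.on('session', (session) => {
    session.on('stream', (stream) => stream.pipe(stream)); // echo everything back
  });

  const client = createQuicSocket();
  const session = await client.connect({ address: '127.0.0.1', port: 1234, alpn: 'soak' });

  setInterval(async () => {
    const stream = await session.openStream({ halfOpen: false });
    stream.write(Buffer.alloc(64, 'x')); // many small writes -> many small allocations
    stream.on('data', () => stream.end());
  }, 1000);
}

main();
```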
Yes, there are many small allocations and reallocs that occur frequently. That definitely could be the cause. We could look into making that more efficient, and maybe even use a slab allocator for much of it. Hmm. Ok, that gives me an idea where to start. Thank you
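For illustration, the slab idea in rough terms; the actual change would live in the C++ layer around the ngtcp2 buffers, but the allocation pattern it replaces can be sketched in JS, with the chunk size and count here being placeholders:

```js
// Conceptual sketch: preallocate fixed-size chunks from one large backing buffer
// and hand them out from a free list, instead of malloc/realloc churn per packet.
class Slab {
  constructor(chunkSize = 1500, count = 1024) {
    this.chunkSize = chunkSize;
    this.backing = Buffer.allocUnsafe(chunkSize * count); // one large allocation
    this.free = [];
    for (let i = 0; i < count; i++) {
      this.free.push(this.backing.subarray(i * chunkSize, (i + 1) * chunkSize));
    }
  }
  acquire() {
    // Fall back to a one-off allocation if the slab is exhausted.
    return this.free.pop() || Buffer.allocUnsafe(this.chunkSize);
  }
  release(chunk) {
    // Only return chunks that actually came from the backing slab.
    if (chunk.buffer === this.backing.buffer) this.free.push(chunk);
  }
}
```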
@jasnell It might also be worth pushing for some malloc opts tuning in nodejs. Of course that would be a platform-specific solution (and it may not even matter on Windows; who even knows how their malloc works). I did a quick search of the NodeJS issues and surprisingly found no discussions on it. Despite being a mostly single-threaded daemon, 2-8x the number of CPU cores' worth of arenas are in play. Even with IO threads this seems excessive and would likely contribute to fragmentation. I'd need to test whether just tuning malloc narenas would resolve the fragmentation. Glibc is extremely prone to fragmentation, jemalloc less so. A slab allocator would be great if the allocation patterns are suitable :)
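For reference, these are the glibc knobs I'd be testing (values are illustrative, and server.js again stands in for the real process):

```sh
# Cap glibc malloc at 2 arenas (the 64-bit default can be 8 * number of cores).
MALLOC_ARENA_MAX=2 node server.js

# Equivalent via the newer tunables interface.
GLIBC_TUNABLES=glibc.malloc.arena_max=2 node server.js
```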
I'm fairly certain that it's the reallocs that we're doing here. I'm going to start there tomorrow and see where we get.
Possible PR fix in #388
Additional information: jemalloc is not a solution to this problem, not because it doesn't solve the fragmentation issues but because it introduces its own compatibility problems with NodeJS. It appears NodeJS is prone to lockups when running with jemalloc. I've been seeing lockups on projects not using QUIC (though my nodejs process has the QUIC patches), especially if narenas is reduced (e.g. to 1). This doesn't appear to be specific to QUIC support, however. I intend to do some testing on x86 from NodeSource builds to make sure it's not ARM-specific or introduced by the patch, and report it over on the Node project. Usually a lockup like that indicates destructor or memory access issues; jemalloc tends to surface those as lockups. In the interim, if it's of interest to you (or anyone else who comes across this issue), lockups look like:
and
Although the behaviour looks similar to jemalloc/jemalloc#1392 (https://bugs.openjdk.java.net/browse/JDK-8215355), which appears to have been an issue with stack trace iteration (http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8215355/01/webrev/hotspot.patch), possibly an assumption that doesn't hold true with a non-glibc malloc. I'm not sure if V8 does anything similar.
What steps will reproduce the bug?
Long-running QUIC client regularly opening bidirectional streams in both directions
How often does it reproduce? Is there a required condition?
24-48hrs for maximum effect
What is the expected behavior?
Memory sitting around 20-30MB (max old size of 30MB). This is the usage of the same process built without the QUIC server components, using TCP sockets for the same protocol in place of QUIC.
What do you see instead?
Memory peaking at 100MB+ (OOM on test device)
Additional information
QUIC (most likely ngtcp2-allocated memory) seems to create a high rate of memory fragmentation when using the default glibc malloc under real-world conditions. While 100MB of RAM is likely not an issue in server applications, in the low-end space this is significant. Additionally, because this RAM is allocated by an external allocator rather than V8, it is not part of the nodejs memory pool and will grow unrestricted by parameters such as max old space size.
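One way to see that split is to log process.memoryUsage() periodically: rss keeps climbing while heapUsed stays bounded by max old space size, which is what points at native-side growth rather than a JS-heap leak.

```js
// Log total RSS vs. the V8-managed heap once a minute. A steadily growing rss
// alongside a flat heapUsed/external suggests native allocation or fragmentation.
setInterval(() => {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(1);
  console.log(`rss=${mb(rss)}MB heapTotal=${mb(heapTotal)}MB ` +
              `heapUsed=${mb(heapUsed)}MB external=${mb(external)}MB`);
}, 60 * 1000);
```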
The usage appears to be fragmentation rather than a leak, however I have not entirely ruled that out.
Currently I'm testing with jemalloc to see if it exhibits more sane behavior; turnaround time on replication means results will follow over the coming week. If jemalloc results in sane memory usage, that supports a fragmentation situation. So far, with a runtime of 2hrs, this seems to be supported.