Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancerous memory consumption under specific circumstances #29001

Closed
niieani opened this issue Aug 6, 2019 · 4 comments
Closed

Cancerous memory consumption under specific circumstances #29001

niieani opened this issue Aug 6, 2019 · 4 comments
Labels
v8 engine Issues and PRs related to the V8 dependency.

Comments

@niieani
Copy link

niieani commented Aug 6, 2019

  • Version: all versions from 12.5.0 affected, also occurs on 12.7.0
  • Platform: tested on macOS (18.7.0 Darwin xnu-4903.270.47~4/RELEASE_X86_64) and Linux
  • Subsystem: something low-level

This is one of the weirdest bugs I've ever debugged.

Under certain circumstances, Node process (starting from version 12.5.0, previous versions not affected), will not exit (even if we call process.exit() at the end of the file), and will start to consume memory until it fills the whole available memory, using 100% CPU while at it.

Here's a reproduction:

https://gist.github.com/niieani/a527e9b2e6caf28e3021a50b938e4a91

Now, what's really funny, there are a few things that make this reproduction not trigger the bug:

  • making the strings in the reproduction.json shorter makes the process exit normally
  • removing various lines from the reproduction (comments show where) makes the process exit normally
  • replacing path.resolve with a call to a different function, e.g. by console.log (note that this originally used jest-haste-map/build/lib/fast_path and would still trigger the bug)

Standard memory or CPU profiling doesn't show anything (literally, there are no function calls shown in the profiler). Dumping the heap snapshot results in a 36MB file, even when the memory consumption of the process is 10 GB.

The backstory is that in our case jest would not end AFTER all the tests have completed, and instead would start consuming huge amounts of memory (on CI, multiple processes from separate PRs would deplete the 128GB of RAM on the machine) so I started going down the rabbit hole to try and pinpoint what is the cause of the issue, which lead me to this minimal reproduction scenario.

I've done a core dump, so here's what I got from lldb from a slightly larger (earlier version) of my reproduction:

core dump (lldb)
(lldb) bt all
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff5bddd86a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5be9c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000100682ae6 node`uv_cond_wait + 9
    frame #3: 0x00000001000cbd19 node`node::TaskQueue<v8::Task>::BlockingDrain() + 39
    frame #4: 0x00000001000ccf6b node`node::NodePlatform::DrainTasks(v8::Isolate*) + 47
    frame #5: 0x00000001000af530 node`node::NodeMainInstance::Run() + 394
    frame #6: 0x000000010005a28a node`node::Start(int, char**) + 245
    frame #7: 0x00007fff5bca53d5 libdyld.dylib`start + 1
    frame #8: 0x00007fff5bca53d5 libdyld.dylib`start + 1
  thread #2, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bde078e libsystem_kernel.dylib`kevent + 10
    frame #1: 0x00000001006872c2 node`uv__io_poll + 841
    frame #2: 0x0000000100677c27 node`uv_run + 320
    frame #3: 0x00000001000cd437 node`node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Run() + 305
    frame #4: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #5: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #6: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #3, stop reason = signal SIGSTOP
    frame #0: 0x0000000100b17bc7 node`v8::internal::compiler::TypeNarrowingReducer::Reduce(v8::internal::compiler::Node*) + 377
    frame #1: 0x0000000100ad2fe8 node`v8::internal::compiler::(anonymous namespace)::SourcePositionWrapper::Reduce(v8::internal::compiler::Node*) + 60
    frame #2: 0x0000000100a349b9 node`v8::internal::compiler::GraphReducer::Reduce(v8::internal::compiler::Node*) + 83
    frame #3: 0x0000000100a347af node`v8::internal::compiler::GraphReducer::ReduceTop() + 293
    frame #4: 0x0000000100a345bb node`v8::internal::compiler::GraphReducer::ReduceNode(v8::internal::compiler::Node*) + 83
    frame #5: 0x0000000100ad4a85 node`v8::internal::compiler::LoadEliminationPhase::Run(v8::internal::compiler::PipelineData*, v8::internal::Zone*) + 569
    frame #6: 0x0000000100acc567 node`void v8::internal::compiler::PipelineImpl::Run<v8::internal::compiler::LoadEliminationPhase>() + 121
    frame #7: 0x0000000100aca405 node`v8::internal::compiler::PipelineImpl::OptimizeGraph(v8::internal::compiler::Linkage*) + 219
    frame #8: 0x0000000100aca2bf node`v8::internal::compiler::PipelineCompilationJob::ExecuteJobImpl() + 381
    frame #9: 0x000000010021a580 node`v8::internal::OptimizedCompilationJob::ExecuteJob() + 30
    frame #10: 0x00000001002190fd node`v8::internal::OptimizingCompileDispatcher::CompileNext(v8::internal::OptimizedCompilationJob*) + 31
    frame #11: 0x0000000100219d95 node`v8::internal::OptimizingCompileDispatcher::CompileTask::RunInternal() + 395
    frame #12: 0x00000001000cbb30 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 301
    frame #13: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #14: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #15: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #4, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bddd86a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5be9c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000100682ae6 node`uv_cond_wait + 9
    frame #3: 0x00000001000cd526 node`node::TaskQueue<v8::Task>::BlockingPop() + 62
    frame #4: 0x00000001000cbb56 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 339
    frame #5: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #6: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #7: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #5, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bddd86a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5be9c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000100682ae6 node`uv_cond_wait + 9
    frame #3: 0x00000001000cd526 node`node::TaskQueue<v8::Task>::BlockingPop() + 62
    frame #4: 0x00000001000cbb56 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 339
    frame #5: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #6: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #7: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #6, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bddd86a libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5be9c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
    frame #2: 0x0000000100682ae6 node`uv_cond_wait + 9
    frame #3: 0x00000001000cd526 node`node::TaskQueue<v8::Task>::BlockingPop() + 62
    frame #4: 0x00000001000cbb56 node`node::(anonymous namespace)::PlatformWorkerThread(void*) + 339
    frame #5: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #6: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #7: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13
  thread #7, stop reason = signal SIGSTOP
    frame #0: 0x00007fff5bdda266 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x0000000100682fb5 node`uv_sem_wait + 16
    frame #2: 0x000000010010e006 node`node::inspector::(anonymous namespace)::StartIoThreadMain(void*) + 18
    frame #3: 0x00007fff5be992eb libsystem_pthread.dylib`_pthread_body + 126
    frame #4: 0x00007fff5be9c249 libsystem_pthread.dylib`_pthread_start + 66
    frame #5: 0x00007fff5be9840d libsystem_pthread.dylib`thread_start + 13

I couldn't get llnode to work, so this is all I could get from the above.

The integrated report also wasn't very helpful, but here's the relevant part (report from an even earlier reproduction while running in jest):

Node Report
{
  "javascriptStack": {
    "message": "No stack.",
    "stack": [
      "Unavailable."
    ]
  },
  "nativeStack": [
    {
      "pc": "0x0000000100147402",
      "symbol": "report::TriggerNodeReport(v8::Isolate*, node::Environment*, char const*, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, v8::Local<v8::String>) [/usr/local/bin/node]"
    },
    {
      "pc": "0x000000010014df5c",
      "symbol": "report::WriteReport(v8::FunctionCallbackInfo<v8::Value> const&) [/usr/local/bin/node]"
    },
    {
      "pc": "0x00000001001cf90c",
      "symbol": "v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) [/usr/local/bin/node]"
    },
    {
      "pc": "0x00000001001cefd6",
      "symbol": "v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/usr/local/bin/node]"
    },
    {
      "pc": "0x00000001001ce7be",
      "symbol": "v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/usr/local/bin/node]"
    },
    {
      "pc": "0x000000010071ccd9",
      "symbol": "Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/usr/local/bin/node]"
    }
  ],
  "javascriptHeap": {
    "totalMemory": 43360256,
    "totalCommittedMemory": 41433760,
    "usedMemory": 38031208,
    "availableMemory": 2160634808,
    "memoryLimit": 2197815296,
    "heapSpaces": {
      "read_only_space": {
        "memorySize": 262144,
        "committedMemory": 32272,
        "capacity": 261832,
        "used": 31960,
        "available": 229872
      },
      "new_space": {
        "memorySize": 1048576,
        "committedMemory": 4520,
        "capacity": 1047328,
        "used": 3272,
        "available": 1044056
      },
      "old_space": {
        "memorySize": 35008512,
        "committedMemory": 34752088,
        "capacity": 34943712,
        "used": 32316584,
        "available": 2627128
      },
      "code_space": {
        "memorySize": 1212416,
        "committedMemory": 1072384,
        "capacity": 1150976,
        "used": 909120,
        "available": 241856
      },
      "map_space": {
        "memorySize": 2363392,
        "committedMemory": 2107280,
        "capacity": 2359128,
        "used": 1369600,
        "available": 989528
      },
      "large_object_space": {
        "memorySize": 3416064,
        "committedMemory": 3416064,
        "capacity": 3397216,
        "used": 3397216,
        "available": 0
      },
      "code_large_object_space": {
        "memorySize": 49152,
        "committedMemory": 49152,
        "capacity": 3456,
        "used": 3456,
        "available": 0
      },
      "new_large_object_space": {
        "memorySize": 0,
        "committedMemory": 0,
        "capacity": 1047328,
        "used": 0,
        "available": 1047328
      }
    }
  },
  "resourceUsage": {
    "userCpuSeconds": 17.6539,
    "kernelCpuSeconds": 1.12431,
    "cpuConsumptionPercent": 98.8324,
    "maxRss": 1796776919040,
    "pageFaults": {
      "IORequired": 2,
      "IONotRequired": 474408
    },
    "fsActivity": {
      "reads": 0,
      "writes": 0
    }
  },
  "libuv": [
    {
      "type": "async",
      "is_active": true,
      "is_referenced": false,
      "address": "0x0000000104102f70"
    },
    {
      "type": "timer",
      "is_active": true,
      "is_referenced": true,
      "address": "0x0000000105843328",
      "repeat": 0,
      "firesInMsFromNow": 616,
      "expired": false
    },
    {
      "type": "check",
      "is_active": true,
      "is_referenced": false,
      "address": "0x00000001058433c0"
    },
    {
      "type": "idle",
      "is_active": false,
      "is_referenced": true,
      "address": "0x0000000105843438"
    },
    {
      "type": "prepare",
      "is_active": false,
      "is_referenced": false,
      "address": "0x00000001058434b0"
    },
    {
      "type": "check",
      "is_active": false,
      "is_referenced": false,
      "address": "0x0000000105843528"
    },
    {
      "type": "async",
      "is_active": true,
      "is_referenced": false,
      "address": "0x000000010422b840"
    },
    {
      "type": "async",
      "is_active": true,
      "is_referenced": false,
      "address": "0x0000000101128958"
    },
    {
      "type": "signal",
      "is_active": true,
      "is_referenced": false,
      "address": "0x0000000103f05970",
      "signum": 31,
      "signal": "SIGUSR2"
    },
    {
      "type": "tty",
      "is_active": false,
      "is_referenced": true,
      "address": "0x0000000104231848",
      "width": 178,
      "height": 35,
      "fd": 20,
      "writeQueueSize": 0,
      "readable": true,
      "writable": true
    },
    {
      "type": "signal",
      "is_active": true,
      "is_referenced": false,
      "address": "0x0000000104303240",
      "signum": 28,
      "signal": "SIGWINCH"
    },
    {
      "type": "tty",
      "is_active": false,
      "is_referenced": true,
      "address": "0x0000000104305568",
      "width": 178,
      "height": 35,
      "fd": 22,
      "writeQueueSize": 0,
      "readable": true,
      "writable": true
    },
    {
      "type": "loop",
      "is_active": true,
      "address": "0x000000010112e690"
    }
  ]
}
niieani added a commit to niieani/jest that referenced this issue Aug 6, 2019
This is a workaround for an issue that occurs in node 12.5.0 and newer, described in detail here:  nodejs/node#29001

The behavior of the code in question is the same, however the change is enough to prevent locking the process into a "cancerous" state of memory growth.

This is one of the weirdest bugs I've ever debugged.

Under certain circumstances, Node process (starting from version 12.5.0, previous versions not affected), would not exit (even if we call `process.exit()` at the end of the file), and instead start to consume memory until it fills the whole available memory, using 100% CPU while at it.

Here's a reproduction based on the `HasteMap`:

https://gist.github.com/niieani/a527e9b2e6caf28e3021a50b938e4a91

Standard memory or CPU profiling doesn't show anything (literally, there are no function calls shown in the profiler). Dumping the heap snapshot results in a 36MB file, even when the memory consumption of the process is 10 GB.

The backstory is that in our case [`jest`](https://github.com/facebook/jest) would not end AFTER all the tests have completed, and instead would start consuming huge amounts of memory (on CI, multiple processes from separate PRs would deplete the 128GB of RAM on the machine) so I started going down the rabbit hole to try and pinpoint what is the cause of the issue, which lead me to this minimal reproduction scenario, but also to discover all the quirks that helped to avoid triggering the scenario.
@addaleax addaleax added the v8 engine Issues and PRs related to the V8 dependency. label Aug 6, 2019
@addaleax
Copy link
Member

addaleax commented Aug 6, 2019

#28955 fixes this for me.

@niieani
Copy link
Author

niieani commented Aug 7, 2019

Great to know we have a fix, but it would be useful to understand what was causing the issue in the first place, before closing.

Otherwise we cannot know if the issue simply shifted around and isn't triggered by something slightly different (given how delicate reproducing this bug is).

@addaleax
Copy link
Member

addaleax commented Aug 7, 2019

@niieani I don’t really have an answer for that – maybe somebody in @nodejs/v8 happens to know more. It’s a good sign that this is almost surely a bug in V8 and is fixed by a V8 upgrade; in the worst case, one could bisect V8 to figure out which commit exactly fixed this.

@mikehardy
Copy link

For people looking at this from a "react-native start" is slow perspective, you can either use node <= v12.4.0, likely node >= 12.9.0 (assuming it gets the V8 sync #28955), or (new info) based on my testing just now you might try a workaround in jest-haste-map documented here with an attached patch-package compatible patch until jest-haste-map has the workaround integrated

jestjs/jest#8787 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

No branches or pull requests

4 participants