Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: call gc() explicitly to avoid OOM #22301

Merged
merged 1 commit into from
Aug 14, 2018

Conversation

refack
Copy link
Contributor

@refack refack commented Aug 13, 2018

Refs: nodejs/reliability#12
Refs: #16354

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@refack refack added flaky-test Issues and PRs related to the tests with unstable failures on the CI. fast-track PRs that do not need to wait for 48 hours to land. labels Aug 13, 2018
@refack refack requested review from Trott and joyeecheung August 13, 2018 20:21
@nodejs-github-bot nodejs-github-bot added addons Issues and PRs related to native addons. test Issues and PRs related to the tests. labels Aug 13, 2018
@refack
Copy link
Contributor Author

refack commented Aug 13, 2018

@@ -35,10 +36,12 @@ common.expectsError(function() {
type: Error
});

// Free the memory early to avoid OOM
global.gc()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a FIXME here since it's not supposed to crash? This is just a quick fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

@refack refack force-pushed the less-OOM-for-freebsd-by-max-string branch from ca53596 to d970a8a Compare August 13, 2018 20:29
@Trott
Copy link
Member

Trott commented Aug 13, 2018

/ping @nodejs/v8 on "shouldn't setting to undefined cause garbage collection in this situation?" and maybe also the "why does this crash?" As @joyeecheung points out, this shouldn't crash, so something weird is up. Not sure if it's specific to FreeBSD or specific to something we've configured that happens to be on our FreeBSD machines in CI. (Or something else?) See nodejs/reliability#12 for a small bit of context.

maxString = undefined;
global.gc()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semicolons? ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my 😳

@joyeecheung
Copy link
Member

joyeecheung commented Aug 13, 2018

I've started another stress test (1989 pending the current one 1988) with master for comparison

@refack
Copy link
Contributor Author

refack commented Aug 13, 2018

/CC @nodejs/testing some low hanging time optimization fruit:

18:21:53 + python tools/test.py -p tap --mode=release 'addons/stringbytes-external-exceed-max/test-stringbytes*'
18:21:54 TAP version 13
18:21:54 1..8
18:22:08 ok 1 addons/stringbytes-external-exceed-max/test-stringbytes-external-at-max
18:22:08   ---
18:22:08   duration_ms: 13.920
18:22:08   ...
18:23:09 ok 2 addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max
18:23:09   ---
18:23:09   duration_ms: 60.658
18:23:09   ...
18:23:19 ok 3 addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-ascii
18:23:19   ---
18:23:19   duration_ms: 10.175
18:23:19   ...
18:23:44 ok 4 addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-base64
18:23:44   ---
18:23:44   duration_ms: 24.288
18:23:44   ...
18:24:17 ok 5 addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-binary
18:24:17   ---
18:24:17   duration_ms: 33.149
18:24:17   ...
18:25:06 ok 6 addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-hex
18:25:06   ---
18:25:06   duration_ms: 49.453
18:25:06   ...
18:25:07 ok 7 addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-1-utf8
18:25:07   ---
18:25:07   duration_ms: 0.399
18:25:07   ...
18:25:21 ok 8 addons/stringbytes-external-exceed-max/test-stringbytes-external-exceed-max-by-2
18:25:21   ---
18:25:21   duration_ms: 13.908
18:25:21   ...
18:25:21 + OK=21
18:25:21 + echo '21   OK: 21   NOT OK: 0   TOTAL: 100'
18:25:21 21   OK: 21   NOT OK: 0   TOTAL: 100

@devsnek
Copy link
Member

devsnek commented Aug 13, 2018

@Trott

shouldn't setting to undefined cause garbage collection in this situation?

v8 has a fairly lazy gc by default. setting a variable to undefined clears the reference to the value but the value might not be deleted until the next gc, and even then it might be removed from v8's heap but v8's heap might not shrink. there is a new method in v8 called EnableMemorySavingsMode which might help with this.

why does this crash?

v8 will allocate the second and third strings without deleting the previous ones. as far as i know v8 doesn't track how much memory the system actually has. because of this it has a method called LowMemoryNotification and another one called MemoryPressureNotification which enable the embedder to tell v8 that there isn't any more memory for it to use.


someone actually brought up on irc yesterday this topic about having more control over how node (and by extension v8) is using memory. i'm wondering if maybe we could work on exposing some options around that to applications.

@joyeecheung
Copy link
Member

joyeecheung commented Aug 13, 2018

@devsnek ExternString::NewFromCopy is supposed to return an empty maybe when there is not enough memory so that an error can be thrown later, v8 failing to GC the previous strings are not supposed to cause crashes..

@joyeecheung
Copy link
Member

By the way I can't seem to reproduce this in a new FreeBSD box, maybe it has something to do with the resource usage on our CI machines and you'd need to hit a certain threshold to get the crash. @refack have you managed to get a core dump?

@devsnek
Copy link
Member

devsnek commented Aug 13, 2018

@joyeecheung i meant that if v8 tries to allocate and hits OOM it will abort. there might also be a callback to catch that. i don't know what it does if it fails to deallocate.

@joyeecheung
Copy link
Member

joyeecheung commented Aug 13, 2018

@devsnek The majority of the allocation is not supposed to happen in the V8 heap, since it's an external string...also nodejs/reliability#12 (comment) didn't give us the usual OOM message from V8, it's killed by the system so it's probably caused by some system call hitting the resource limit (whereas if the OOM happens in V8's heap V8 will print that OOM message before aborting). The actual V8 heap is only a few megabytes in https://www.irccloud.com/pastebin/FI3zgCwR/

@refack
Copy link
Contributor Author

refack commented Aug 14, 2018

CI: https://ci.nodejs.org/job/node-test-pull-request/16431/
If it's green, I'm gonna land (since this unbreaks CI)

@refack
Copy link
Contributor Author

refack commented Aug 14, 2018

I've started another stress test (1989 pending the current one 1988) with master for comparison

Stress test against master crashed after 5 repeats:

echo '5   OK: 1   NOT OK: 4   TOTAL: 100'

But I think that is a significant enough result.

@refack refack added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Aug 14, 2018
@Trott
Copy link
Member

Trott commented Aug 14, 2018

👍 here to fast-track.

@Trott
Copy link
Member

Trott commented Aug 14, 2018

@Trott
Copy link
Member

Trott commented Aug 14, 2018

Although it looks like https://ci.nodejs.org/job/node-test-pull-request/16432/ is a Resume Build that's already green so mine is unnecessary. This can land once we get a second fast-track approval. Someone? Anyone?

@gireeshpunathil
Copy link
Member

setting a variable to undefined clears the reference to the value but the value might not be deleted until the next gc, and even then it might be removed from v8's heap but v8's heap might not shrink.

@devsnek - does this mean that (large and complex) applications should explicitly invoke gc() to avoid garbage pileup? that is in direct contradiction with the automatic garbage collection promise. I would assume gc() to abort only after it has cleared up even the last byte of garbage and still cannot allocate the current request.

To me, this looks like a v8 bug.

@hashseed
Copy link
Member

@ulan @hannespayer I'm also a bit surprised that you need to explicitly call GC to avoid OOM.

@devsnek
Copy link
Member

devsnek commented Aug 14, 2018

@gireeshpunathil

does this mean that (large and complex) applications should explicitly invoke gc() to avoid garbage pileup? ... I would assume gc() to abort only after it has cleared up even the last byte of garbage and still cannot allocate the current request.

that is how v8 behaves (https://github.com/v8/v8/blob/3f0346cdedb230a89dc433ade7e46ac7129ba5e4/src/heap/heap.cc#L4608-L4648)

there was some missing context here... this test allocates external strings, not v8 heap strings, so from what i understand the failure is actually a bug with node, not v8. this fact, of course, plugs into hashseed's comment.

PR-URL: nodejs#22301
Refs: nodejs/reliability#12
Refs: nodejs#16354
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
@refack refack force-pushed the less-OOM-for-freebsd-by-max-string branch from 4757c91 to 4ce744a Compare August 14, 2018 06:29
@refack
Copy link
Contributor Author

refack commented Aug 14, 2018

Landed in 4ce744a

@refack refack merged commit 4ce744a into nodejs:master Aug 14, 2018
@refack refack deleted the less-OOM-for-freebsd-by-max-string branch August 14, 2018 06:30
@joyeecheung
Copy link
Member

joyeecheung commented Aug 14, 2018

Looks like the test is still crashing https://ci.nodejs.org/job/node-test-commit-freebsd/19642/nodes=freebsd11-x64/console

EDIT: that CI was not rebased against this commit

@mlippautz
Copy link

tl;dr: This should be fixed by V8 soon.

V8 relies on embedders calling AdjustAmountOfExternalAllocatedMemory for memory that externally allocated. This is error prone and the API does not follow any growing strategy for allocated memory, making it likely that it is not doing not enough or too many GCs.

We know that this is an issue and there's ongoing work that is about to be landed on master handling types where V8 has more knowledge (externalized strings & array buffers).

Specifically, the external string issue should be fixed as soon as that CL lands.

rvagg pushed a commit that referenced this pull request Aug 15, 2018
PR-URL: #22301
Refs: nodejs/reliability#12
Refs: #16354
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
@rubys rubys mentioned this pull request Aug 21, 2018
2 tasks
MylesBorins pushed a commit that referenced this pull request Oct 2, 2018
PR-URL: #22301
Refs: nodejs/reliability#12
Refs: #16354
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
@BethGriggs BethGriggs mentioned this pull request Oct 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addons Issues and PRs related to native addons. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. fast-track PRs that do not need to wait for 48 hours to land. flaky-test Issues and PRs related to the tests with unstable failures on the CI. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.