-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: fix flaky test-cluster-shared-leak.js #4173
Conversation
One thing that needs to be checked is that the test with this change still fails on Windows with Node 4.2.1. (The test checks for a bug in Node 4.2.1.) |
@@ -15,6 +15,11 @@ if (cluster.isMaster) { | |||
worker1 = cluster.fork(); | |||
worker1.on('message', common.mustCall(function() { | |||
worker2 = cluster.fork(); | |||
worker2.on('error', function(e) { | |||
// EPIPE is OK on Windows | |||
if ((! common.isWindows) || (e.code !== 'EPIPE')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from some style issues, would an opposite check be more clear? i.e. if (common.isWindows && e.code === 'EPIPE') return;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to write it that way but I went with this to more closely mirror the (simpler) condition involving ECONNRESET
later in the file. You're right, though, and I'll switch it.
@romankl No problem. The test should be marked flaky until this (or some other fix) lands, so that's totally fine. I'll rebase this and add the removal from the .status file. Thanks! |
Fixed up per nits from @romankl and @Fishrock123 |
@@ -15,14 +15,21 @@ if (cluster.isMaster) { | |||
worker1 = cluster.fork(); | |||
worker1.on('message', common.mustCall(function() { | |||
worker2 = cluster.fork(); | |||
worker2.on('error', function(e) { | |||
// EPIPE is OK on Windows | |||
if (common.isWindows && e.code === 'EPIPE') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too surprised it only happens on Windows but I don't understand why you only need to the check for the second worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish I had a logical explanation for that too. I've been poking at it a bit, but I'm slowed by not having direct access to a Windows machine. I've been adding logging statements and running on CI which is obviously a limiting and slow process...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I figured out why it's only afflicting worker2
and how to rewrite the code without the special handling for Windows.
worker2
gets EPIPE
if we try to send to it before it's actually listening. This never happens with worker1
because by the time we get to worker1.send()
, we're in the worker1
message handler, so it's obviously already listening. But there's no such protection for worker2
. So the fix is to wait for the listening
event to fire on cluster
for worker2
before doing all the send()
stuff. Running a stress test right now to confirm that's really the fix. Looks good so far. Will update this PR if that doesn't change. (Already confirmed that it still fails for Node 4.2.1, which is what we want.)
test-cluster-shared-leak.js was flaky because a worker can emit EPIPE. Wait for workers to be listening so that EPIPE does not happen. Fixes: nodejs#3956 PR-URL: nodejs#4173
OK, fixed up hopefully for really-realz this time. Stress test: https://ci.nodejs.org/job/node-stress-single-test/150/nodes=win2012r2/console |
sigh CI is good but stress test is not. Back to the drawing board... |
Reverted to the previous workaround. |
Too bad the wait-until-listening approach didn't work out. Doesn't the current approach of calling |
Yes, all routes so far are terrible, but I'll figure something out. |
Right now, I'm focusing on an assertion that can be fired in |
Here's the bug that I think/hope might be related to the issues here: #4205 |
Fix for #4205 has landed, so time to resume work on this issue... Stress test with current master to confirm that this bug still exists: |
Refactor test-cluster-shared-leak.js to remove flakiness on Windows. Fixes: nodejs#3956 PR-URL: nodejs#4173
OK, current minimal fix works. CI: https://ci.nodejs.org/job/node-test-commit/1555/ Stress test: https://ci.nodejs.org/job/node-stress-single-test/233/nodes=win2012r2/consoleFull Here's the explanation: There is no guarantee that the pipe will be there when PTAL @bnoordhuis and anyone else interested. I'd really like to get rid of this flaky test! (And pre-emptive strike: Yes, I or someone else should definitely squash this down to one commit before landing. It's been a journey. EDIT: Red failures on Windows are a known flaky test (that I already have a PR open for to mark as flaky which I'll land momentarily) and a newly flaky but unrelated test. Hooray. :-| I'll look into it, but it shouldn't stop this from landing... |
One more (hopefully last) time with feeling: CI: https://ci.nodejs.org/job/node-test-commit/1558/ (And the test still fails in Node 4.2.1, which is the last release that had the bug, so that's good.) |
CI etc. looks good. Still needs an LGTM, though. /cc @nodejs/testing One last cut-and-paste of the explanation:
|
Swallow EPIPE as there is it is expected to come up from time to time. This does not invalidate the test. Fixes: nodejs#3956 PR-URL: nodejs#4173
Closing in favor of #4510 |
test-cluster-shared-leak.js was flaky because a worker can emit EPIPE.
This error event is expected.
Fixes: #3956