-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate flaky test-fs-readfile-tostring-fail #16601
Comments
I think better is
My opinion does not solve the issue but I think that it will provide useful information at next time the same problem occurs. |
@Leko If the error is not an |
@joyeecheung Ah, I see. It’s just nothing. |
This seems to be failing reasonably often again. Anyone have any ideas? https://ci.nodejs.org/job/node-test-commit-osx/16147/nodes=osx1010/tapResults/ |
This does not only fail on OS-X as it seems: https://ci.nodejs.org/job/node-test-commit-linux/16441/nodes=ubuntu1404-64/console |
Easily reproduced with adjusting ulimits: #ulimit -f 10000000
with this patch it shows the error was null - evidently the write failed, so the read succeeded. --- a/test/sequential/test-fs-readfile-tostring-fail.js
+++ b/test/sequential/test-fs-readfile-tostring-fail.js
@@ -31,6 +31,7 @@ for (let i = 0; i < 201; i++) {
stream.end();
stream.on('finish', common.mustCall(function() {
fs.readFile(file, 'utf8', common.mustCall(function(err, buf) {
+ console.log(err)
assert.ok(err instanceof Error); #./node test/sequential/test-fs-readfile-tostring-fail.js
So I am not claiming that th CI had I guess the test should validate that |
Inviting interested parties to come up with a PR - I know the issue and can provide pointers. |
Since the most recent reported failure here is February, I'll mention that it happened again today: https://ci.nodejs.org/job/node-test-commit-osx/18712/nodes=osx1010/console not ok 2198 sequential/test-fs-readfile-tostring-fail
---
duration_ms: 0.642
severity: fail
exitcode: 7
stack: |-
/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/test/sequential/test-fs-readfile-tostring-fail.js:67
throw err;
^
AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
assert.ok(err instanceof Error)
at /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/test/sequential/test-fs-readfile-tostring-fail.js:34:12
at /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/test/common/index.js:443:15
at FSReqWrap.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:53:3)
... |
Mostly guessing, but maybe if (common.isAIX && (Number(cp.execSync('ulimit -f')) * 512) < kStringMaxLength) |
Failed on test-requireio-osx1010-x64-1: https://ci.nodejs.org/job/node-test-commit-osx/18735/nodes=osx1010/console not ok 2199 sequential/test-fs-readfile-tostring-fail
---
duration_ms: 0.264
severity: fail
exitcode: 7
stack: |-
/Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/test/sequential/test-fs-readfile-tostring-fail.js:67
throw err;
^
AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
assert.ok(err instanceof Error)
at /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/test/sequential/test-fs-readfile-tostring-fail.js:34:12
at /Users/iojs/build/workspace/node-test-commit-osx/nodes/osx1010/test/common/index.js:443:15
at FSReqWrap.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:53:3)
... |
Alas, that won't work. We just saw it fail on test-requireio-osx1010-x64-1 in CI and |
Stress test: https://ci.nodejs.org/job/node-stress-single-test/1855/nodes=osx1010/ Edit: Stress test was running on macstadium and seemed to be doing fine after 573-ish runs. Going to try again and hope I get a requireio machine this time for comparison. |
Even if not |
I was investigating this. Few points:
system trace showd that writev failed, but was never detected, retried, or propagated upwards, and was silently ignored:
live debugger showed the same, and we seem to be closing the file as if we wrote enough: Process 25275 resuming
Process 25275 stopped
* thread #6, stop reason = breakpoint 2.1
frame #0: 0x000000010094d1bd node`uv__fs_write(req=0x000000010250b148) at fs.c:727 [opt]
724
725 if (req->off < 0) {
726 if (req->nbufs == 1)
-> 727 r = write(req->file, req->bufs[0].base, req->bufs[0].len);
728 else
729 r = writev(req->file, (struct iovec*) req->bufs, req->nbufs);
730 } else {
Target 0: (node) stopped.
(lldb) n
(lldb) p r
(ssize_t) $10 = 5368708
(lldb) c
Process 25275 resuming
Process 25275 stopped
* thread #9, stop reason = breakpoint 3.1
frame #0: 0x000000010094d243 node`uv__fs_write(req=0x0000000103023478) at fs.c:729 [opt]
726 if (req->nbufs == 1)
727 r = write(req->file, req->bufs[0].base, req->bufs[0].len);
728 else
-> 729 r = writev(req->file, (struct iovec*) req->bufs, req->nbufs);
730 } else {
731 if (req->nbufs == 1) {
732 r = pwrite(req->file, req->bufs[0].base, req->bufs[0].len, req->off);
Target 0: (node) stopped.
(lldb) p r
(ssize_t) $11 = 45831292
(lldb) p req->file
(uv_file) $14 = 13
(lldb) c
Process 25302 resuming
Process 25302 stopped
* thread #10, stop reason = breakpoint 6.28 7.28
frame #0: 0x00007fff7b0ec4f8 libsystem_kernel.dylib`close
libsystem_kernel.dylib`close:
-> 0x7fff7b0ec4f8 <+0>: movl $0x2000006, %eax ; imm = 0x2000006
0x7fff7b0ec4fd <+5>: movq %rcx, %r10
0x7fff7b0ec500 <+8>: syscall
0x7fff7b0ec502 <+10>: jae 0x7fff7b0ec50c ; <+20>
Target 0: (node) stopped.
(lldb) f 1
frame #1: 0x000000010094b310 node`uv__fs_work(w=<unavailable>) at fs.c:1113 [opt]
1110 X(ACCESS, access(req->path, req->flags));
1111 X(CHMOD, chmod(req->path, req->mode));
1112 X(CHOWN, chown(req->path, req->uid, req->gid));
-> 1113 X(CLOSE, close(req->file));
1114 X(COPYFILE, uv__fs_copyfile(req));
1115 X(FCHMOD, fchmod(req->file, req->mode));
1116 X(FCHOWN, fchown(req->file, req->uid, req->gid));
(lldb) p req->file
error: Couldn't materialize: couldn't get the value of variable req: no location, value may have been optimized out
error: errored out in DoExecute, couldn't PrepareToExecuteJITExpression
(lldb) reg read rdi
rdi = 0x000000000000000d
(lldb) c
Process 25275 resuming
/Users/gireeshpunathil/Desktop/collab/node/test/sequential/test-fs-readfile-tostring-fail.js:67
throw err; So this would mean we shoud:
/cc @nodejs/libuv |
@gireeshpunathil If it's an issue with partial writes, can you check if libuv/libuv#1742 fixes the issue? |
o! 26084/0x1be13b: writev(0xA, 0x103048800, 0xC8) = 45831292 0
26084/0x1be124: kevent(0x3, 0x7FFEEFBF70B0, 0x0) = -1 Err#4
26084/0x1be13b: writev(0xA, 0x103048880, 0xC0) = -1 Err#27 the error is propagated, and the write is re-attempted, and finally it is thrown properly too: In disc-near-full case the error can be different, but we won't reach the scenario where we are currently. thanks @santigimeno ! So I guess we just have to mark this as flaky, wait for libuv#1742 to land, and for Node to consume it! |
Nice. Let's see if we can finally move forward with the review of the PR. |
Test was introduced in b620790 by @evanlucas. PR was #3485 and it was to fix a bug reported in #2767. |
ok, so looks like |
https://ci.nodejs.org/job/node-test-commit-linux/26604/nodes=ubuntu1804-64/console test-joyent-ubuntu1804-x64-1 00:26:47 not ok 2460 sequential/test-fs-readfile-tostring-fail
00:26:47 ---
00:26:47 duration_ms: 26.401
00:26:47 severity: fail
00:26:47 exitcode: 7
00:26:47 stack: |-
00:26:47 /home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1804-64/test/sequential/test-fs-readfile-tostring-fail.js:67
00:26:47 throw err;
00:26:47 ^
00:26:47
00:26:47 AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
00:26:47
00:26:47 assert.ok(err instanceof Error)
00:26:47
00:26:47 at /home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1804-64/test/sequential/test-fs-readfile-tostring-fail.js:34:12
00:26:47 at /home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1804-64/test/common/index.js:369:15
00:26:47 at FSReqCallback.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:54:3)
00:26:47 ... |
It may be a slight cheat to get the issue resolved, but I wonder if given that it deals with a 1Gb file, it should be moved to |
https://ci.nodejs.org/job/node-test-commit-linux/26616/nodes=ubuntu1804-64/console test-joyent-ubuntu1804-x64-1 17:26:21 not ok 2460 sequential/test-fs-readfile-tostring-fail
17:26:21 ---
17:26:21 duration_ms: 27.288
17:26:21 severity: fail
17:26:21 exitcode: 7
17:26:21 stack: |-
17:26:21 /home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1804-64/test/sequential/test-fs-readfile-tostring-fail.js:67
17:26:21 throw err;
17:26:21 ^
17:26:21
17:26:21 AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
17:26:21
17:26:21 assert.ok(err instanceof Error)
17:26:21
17:26:21 at /home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1804-64/test/sequential/test-fs-readfile-tostring-fail.js:34:12
17:26:21 at /home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1804-64/test/common/index.js:369:15
17:26:21 at FSReqCallback.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:54:3)
17:26:21 ... |
@Trott - I don't know if this is feasible, but how about:
and assert that the key here I believe to be able to use the same buffer for iterative file read operation. |
test-digitalocean-ubuntu1804-x64-1 is passing consistently, but test-joyent-ubuntu1804-x64-1 is failing consistently. |
The test is failing on that host because it has less than 1Gb of free disk space, so the file gets truncated and the error does not occur when the file is read. I think moving to |
I'm removing workspaces and will put it back online, then open a pull request to move this test to pummel. |
I'd expect the test to detect that -- Is it ignoring errors when writing the file out? |
const stream = fs.createWriteStream(file, {
flags: 'a'
});
const size = kStringMaxLength / 200;
const a = Buffer.alloc(size, 'a');
for (let i = 0; i < 201; i++) {
stream.write(a);
}
stream.end(); |
I'd expect that to throw if there's a problem and it does indeed when I mess with file permissions to cause a problem. There's also this, but that seems like it shouldn't get in the way either: function destroy() {
try {
fs.unlinkSync(file);
} catch {
// it may not exist
}
}
...
process.on('uncaughtException', function(err) {
destroy();
throw err;
}); |
Could be an OS-specific and/or file-system-specific and/or configuration-specific thing so someone may need to log in again to figure out why it's not throwing an error if it's a mystery. |
If it's a stream should it be listening for the https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback
|
Interesting observation: the recent 19 failures all happened on test-joyent-ubuntu1804-x64-1
Example
|
It would be interesting to know what kind of value |
Fixes: nodejs#16601 PR-URL: nodejs#27053 Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Yongsheng Zhang <[email protected]> Reviewed-By: Refael Ackermann <[email protected]>
worker config is not too shabby (maybe a bit of a small disk) ubuntu@test-joyent-ubuntu1804-x64-1:~$ free -h
total used free shared buff/cache available
Mem: 3.6G 281M 2.2G 388K 1.2G 3.1G
Swap: 1.9G 12M 1.9G
ubuntu@test-joyent-ubuntu1804-x64-1:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 1.8G 0 1.8G 0% /dev
tmpfs 370M 672K 369M 1% /run
/dev/vda1 7.3G 6.2G 1.1G 85% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/vda15 105M 3.4M 102M 4% /boot/efi
/dev/vdb 98G 61M 93G 1% /mnt
tmpfs 370M 0 370M 0% /run/user/1000 Should we upgrade the host, or keep it as a canary? |
Maybe use it to see if #27058 gives better diagnostics when it fails? |
Fixes: #16601 PR-URL: #27053 Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Yongsheng Zhang <[email protected]> Reviewed-By: Refael Ackermann <[email protected]>
Fixes: #16601 PR-URL: #27053 Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Yongsheng Zhang <[email protected]> Reviewed-By: Refael Ackermann <[email protected]> Signed-off-by: Beth Griggs <[email protected]>
Fixes: #16601 PR-URL: #27053 Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Yongsheng Zhang <[email protected]> Reviewed-By: Refael Ackermann <[email protected]> Signed-off-by: Beth Griggs <[email protected]>
Fixes: #16601 PR-URL: #27053 Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Yongsheng Zhang <[email protected]> Reviewed-By: Refael Ackermann <[email protected]> Signed-off-by: Beth Griggs <[email protected]>
https://ci.nodejs.org/job/node-test-commit-osx/13607/nodes=osx1010/console
The text was updated successfully, but these errors were encountered: