-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Leak and very long hanging when fetch throws ConnectTimeoutError(UND_ERR_CONNECT_TIMEOUT)
#2362
Comments
ConnectTimeoutError(UND_ERR_CONNECT_TIMEOUT)
Thanks for the bug report. Sanity check: does it finish more quickly when you start node with |
No, it does not. With
Same results with |
cc @nodejs/undici |
What |
I don't think it related to garbage collection. Here I ignore the body as recommended in link you provided: "use strict";
const startedAt = Date.now();
console.log('started fetch at: ', new Date(startedAt));
try {
await fetch('https://jsonplaceholder.typicode.com/todos/1', { method: 'HEAD' });
} catch (error) {
console.log(error);
}
const finishedAt = Date.now();
console.log('finished fetch at: ', new Date(finishedAt))
console.log(`fetch took ${finishedAt - startedAt} ms`)
console.log('timestamp when script should finish: ', Date.now()) Here I added code to work with body as recommended in link you provided, but this code will never be reached: "use strict";
const startedAt = Date.now();
console.log('started fetch at: ', new Date(startedAt));
try {
const res = await fetch('https://jsonplaceholder.typicode.com/todos/1');
const data = await res.json();
console.log(data);
} catch (error) {
console.log(error);
}
const finishedAt = Date.now();
console.log('finished fetch at: ', new Date(finishedAt))
console.log(`fetch took ${finishedAt - startedAt} ms`)
console.log('timestamp when script should finish: ', Date.now()) |
It fails before I can get any access to any data at all. Fetch fails itself. It does not return anything. |
Manually calling garbage collector also does not change anything "use strict";
const startedAt = Date.now();
console.log('started fetch at: ', new Date(startedAt));
try {
await fetch('https://jsonplaceholder.typicode.com/todos/1');
} catch (error) {
console.log(error);
}
global.gc();
const finishedAt = Date.now();
console.log('finished fetch at: ', new Date(finishedAt))
console.log(`fetch took ${finishedAt - startedAt} ms`)
console.log('timestamp when script should finish: ', Date.now())
The problem is not in the fact that it throws error. The problem is that fetch after throwing an error keeps doing some stuff in the background. |
This comment has been minimized.
This comment has been minimized.
This comment was marked as resolved.
This comment was marked as resolved.
does this happen with undici.request? Also, is there a repro that doesn't require a VPN/disabling wifi? |
After I installed "use strict";
import { request } from 'undici';
const startedAt = Date.now();
console.log('started request at: ', new Date(startedAt));
try {
await request('https://jsonplaceholder.typicode.com/todos/1')
} catch (error) {
console.log(error);
}
const finishedAt = Date.now();
console.log('finished request at: ', new Date(finishedAt))
console.log(`request took ${finishedAt - startedAt} ms`)
console.log('timestamp when script should finish: ', Date.now())
I am not a network engineer unfortunately, so my chances to create reproducible environment are low, but I'll do my best with docker-compose. |
Could it be the |
I wasn't lucky recently so I was not able to create reproducible environment with docker which you can easily deploy on your machine. But I'm open to run any steps you'll ask me to debug this, or schedule a meeting to figure it out. |
You can give it a try to |
I was able to reproduce it consistently in QEMU VMs. Previously I disabled wi-fi to get this error, but even that is not needed now. I just broke credentials a bit in wireguard config and basically got the same effect. JS script in vm always throws the same error with or without internet on the host machine. By the way in VM there is QEMU VM approach
QEMU VM approach from scratch
To copy files between vm and host:
To fix internet (make internet work fine) in vm run |
@metcoder95 |
I don't know any other way to simulate such network environment. If we remove internet completely, fetch throws completely different error ( I don't see options to reproduce it on the server because it's kinda stupid to break internet to make tests and immediately lose access to the machine after that VM is a perfect candidate |
Looking at the output of
socket = tls.connect({
highWaterMark: 16384,
// TLS in node can't have bigger HWM anyway...
...options,
servername,
session,
localAddress,
// TODO(HTTP/2): Add support for h2c
ALPNProtocols: allowH2 ? ["http/1.1", "h2"] : ["http/1.1"],
socket: httpSocket,
// upgrade socket connection
port: port || 443,
host: hostname
});
[kInitializeHandle](timeout, tries) {
const { ChannelWrap } = lazyBinding();
this._handle = new ChannelWrap(timeout, tries);
} |
If you reduce the |
Increasing timeout here from 10000 ms to 45000 ms, produces following results: ~45s of waiting, returning control back to the code, process hanging for ~75 seconds. Same 120 seconds of waiting decreasing from 10000 to 5000, produces following results: ~5s of waiting, returning control back to the code, process hanging for ~115 seconds. Same 120 seconds of waiting. |
looks like the issue is much more interesting that it might seem. "use strict";
import { connect } from 'tls';
const startedAt = Date.now();
console.log('started fetch at: ', new Date(startedAt));
try {
const timeout = 2000; // ms
const asd = connect({
highWaterMark: 16384,
path: undefined,
servername: 'jsonplaceholder.typicode.com',
session: null,
localAddress: null,
ALPNProtocols: [ 'http/1.1' ],
socket: undefined,
port: 443,
host: 'jsonplaceholder.typicode.com'
});
asd.addListener('OCSPResponse', (...args) => console.log('OCSPResponse', ...args))
asd.addListener('keylog', (...args) => console.log('keylog', ...args))
asd.addListener('secureConnect', (...args) => console.log('secureConnect', ...args))
asd.addListener('session', (...args) => console.log('session', ...args))
asd.addListener('close', (...args) => console.log('close', ...args))
asd.addListener('connect', (...args) => console.log('connect', ...args))
asd.addListener('data', (...args) => console.log('data', ...args))
asd.addListener('drain', (...args) => console.log('drain', ...args))
asd.addListener('end', (...args) => console.log('end', ...args))
asd.addListener('error', (...args) => console.log('error', ...args))
asd.addListener('lookup', (...args) => console.log('lookup', ...args))
asd.addListener('ready', (...args) => console.log('ready', ...args))
asd.addListener('timeout', (...args) => console.log('timeout', ...args))
setTimeout(() => {
asd.destroy();
// asd.drop()
// asd.resetAndDestroy();
// asd.unref()
// asd.emit('close', true)
// asd.emit('error', true)
// asd.emit('drain', true)
// asd.emit('close', true)
}, timeout)
} catch (error) {
console.log(error);
}
const finishedAt = Date.now();
console.log('finished fetch at: ', new Date(finishedAt))
console.log(`fetch took ${finishedAt - startedAt} ms`)
console.log('timestamp when script should finish: ', Date.now()) I made few experiments in undici code and found that everything fine there There is currently no way to finish tls socket that stuck in resolving domain. it simply ignores call of |
Then I would assume this is somehow expected; replicating this in local without network interruptions it usually results in extra
If you add network interruption, the initiator will send a first |
downgrading node version with nvm solved for me |
to which version? |
Hi guys, got same problem here, how to solve this? |
update to latest |
I think I'm experiencing the same issue. Running this single statement on Node v22.6.0 with zero dependencies installed throws await fetch('https://registry.npmjs.orgBROKEN/', { signal: AbortSignal.timeout(500) }) The fetch was intentionally aborted by the user (me) - therefore, the expected behaviour is for the node process to complete, not hang. |
I will have a look at it. |
@75lb I can not reporduce this. |
Odd.. Here is my reproduction code, literally a one-liner in an empty directory. I'm on mac OS 14.6.1 using iTerm2. More version details in this screen, in the |
As @Uzlopak, cannot reproduce it. Request is aborted and connection terminated after the |
Just out of curiosity, I re-tried my reproducation case - I was unable to reproduce the issue in Node v23.1, 22.11 and 22.7 (the version which failed, previously). I now see this immediate error (instead of the 30s timeout mentioned before):
Since first raising the issue in August, the only things that have changed are upgrading macOS and changing Wifi network. So, I'm still not clear what the root cause of my previous timeout issue was.. |
Have the same problem, in the nextjs13.5/nextjs14, i got
|
Version
v21.0.0
Platform
Linux archhost 6.5.7-arch1-1 nodejs/node#1 SMP PREEMPT_DYNAMIC Tue, 10 Oct 2023 21:10:21 +0000 x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
npm start
index.js
package.json
How often does it reproduce? Is there a required condition?
It reproduces every time with described network configuration. No other condition. Same behavior in node
v20.7.0
andv18.18.0
What is the expected behavior? Why is that the expected behavior?
Fetch waits 10 seconds, throws error
UND_ERR_CONNECT_TIMEOUT
. Then script exits immediately after it completed. Difference betweentimestamp when script should finish
andtimestamp when script actually finished
should be near zero.What do you see instead?
Fetch is being executed as expected. It kills itself after about 10 seconds and then returns control back to the script.
But the problem is that something in the background is still blocking node.js and does not allow process to exit. It just hangs waiting for something to be finished even if the script was completed.
After all code in the script been executed it takes 110 seconds (
1697848749482 - 1697848639391 = 110091
) to exit from process.But enabling wifi back (restoring internet connection) during process's hanging (after the moment fetch already failed and logged error) magically speeds up exit from process.
Additional information
Running completely offline
If we skip 2-3 steps and just run script offline we get other kind of error and expected behavior.
Following is being printed to console:
Difference between
timestamp when script should finish
andtimestamp when script actually finished
is 60 ms what is fine.Running online
If we skip 4 step and just run script online we get no errors and expected behavior.
Following is being printed to console:
Difference between
timestamp when script should finish
andtimestamp when script actually finished
is 167 ms what also seems fine.Running using node-fetch with described network configuration
If we install
node-fetch
withnpm i node-fetch
and then import it in the beginning usingimport fetch from 'node-fetch';
it will replace original fetch and will give us different behavior:When
fetch
fromnode-fetch
fails, it logs last few lines to console and script exits immediately. Difference betweentimestamp when script should finish
andtimestamp when script actually finished
is 93 ms what also seems fine.The interesting thing is that
fetch
fromnode-fetch
throws the same error as error thrown by original fetch when vpn disabled and notebook completely offline.fetch
fromnode-fetch
returns control after 120 seconds. Original fetch returns control after 10 seconds and then hangs in the background 110 seconds. Looks very suspicious (120 == 110 + 10
).The text was updated successfully, but these errors were encountered: