Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Child process breaks on command #55802

Open
prettydiff opened this issue Nov 9, 2024 · 7 comments
Open

Child process breaks on command #55802

prettydiff opened this issue Nov 9, 2024 · 7 comments
Labels
needs more info Issues without a valid reproduction.

Comments

@prettydiff
Copy link
Contributor

prettydiff commented Nov 9, 2024

Version

23.1.0

Platform

* fails on Debian 12 as a regular user with statically specified ports
* fails on Windows 10 as a regular user with statically specified ports
* does not fail Debian 12 via systemd (as an OS service run as root)
* does not fail if the ports are randomly assigned by specifying port 0 on server invocation

Subsystem

No response

What steps will reproduce the bug?

  1. execute a child process with command nmap --open 127.0.0.1 via instruction from any kind of network stream, such as WebSocket message or HTTP request.
const port_map = function (callback:() => void):void {
    const command:string = "nmap --open 127.0.0.1";
    node.child_process.exec(command, function (error:node_childProcess_ExecException, stdout:string):void {
        if (callback !== null) {
            callback();
        }
    });
};

export default port_map;

How often does it reproduce? Is there a required condition?

I can reproduce this 100% from command line in both Debian 12 and Windows 10. 0% from systemd.

What is the expected behavior? Why is that the expected behavior?

child_process exec calls a callback and child_process spawn calls event handlers in accordance with Node API definitions.

What do you see instead?

The application crashes fatally if the error event on the corresponding socket is not trapped immediately at first connection time before other instructions are executed. Assigning a handler to the socket's error event listener later is insufficient, for example assigning the handler from within the following code still results in crashes:

socket.once("data", handler);

The actual error reported by the socket is:

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:216:20) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}

The error is the direct result of a child process execution. The corresponding child process executes in response to instructions transmitted on the corresponding socket, but is otherwise not related or associated to the socket.

This following error is the fatal error messaging if the error is not immediately trapped on the corresponding socket.

node:events:485
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:216:20)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}

The error messaging describes a socket failure, but it is not. It is a child process call only and only on the command specified.

Additional information

Some of the things I have tried:

  1. I have tried a few other trivial commands like echo hello and ps -a. They work without issue.
  2. In exec have tried specifying different shells /bin/sh and /bin/bash
  3. I have tried to call the child process with exec and spawn
  4. I have tried to call the child process in different ways: manually, recursively, setTimeout, from automated response to a socket message
  5. I validated that all sockets connected, both http and ws, have assigned event handlers for the error event and I have also isolated all sockets from the problematic behavior.
@RedYetiDev
Copy link
Member

I simplified your reproduction to:

import { exec } from 'node:child_process';

exec("nmap --open 127.0.0.1", console.log);

And I can't reproduce:

$ node index.js
null Starting Nmap 7.94SVN ( https://nmap.org ) at 2024-11-09 20:22 EST
Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds

@juanarbol
Copy link
Member

juanarbol commented Nov 10, 2024

Seems like a problem w/ your environment, would you mind to change that IP into an exposed one? Like google's or github's IP?

I can't reproduce the issue as well.

Also ECONNRESET means: "connection reset by peer". It is not a Node.js problem, but a network problem.

Refs: https://docs.libuv.org/en/v1.x/errors.html#c.UV_ECONNRESET

@prettydiff
Copy link
Contributor Author

It may likely be a problem with my code base. I am continuing to investigate.

I can reproduce this 100% on both Debian 12 and Windows 10. I just discovered that the problem does not occur if the application is executed from systemd.

@prettydiff
Copy link
Contributor Author

prettydiff commented Nov 10, 2024

Closing issue. I am attaching an empty event handler to the error event of all sockets immediate after sockets connect and that does seem to trap the error.

This does seem to be a valid Node issue. I have updated the issue at the top with my more recent observations.

@aduh95
Copy link
Contributor

aduh95 commented Nov 10, 2024

Assigning a handler to the socket's error event listener later is insufficient, for example assigning the handler from within the following code still results in crashes:

socket.once("data", handler);

That seems to be a "data" handler, rather than an "error" handler, which probably explains the error you see.

@prettydiff
Copy link
Contributor Author

prettydiff commented Nov 10, 2024

@aduh95 I was likely not clear when I was speaking to timing.

const connection = function (tls_socket) {
    // where sockets are born, the connection event handler
    // for the sake of this thread socket errors must be trapped here before other functions are called, otherwise the application will break
    // when evers are trapped later, even if before the breaking action, it is too late
    tls_socket.on("error", function (error) {
        console.log(error);
    });

    const handler = function (data:Buffer) {
        // reason about the data and make determinations about protocol management and response messages
        // ...
        const socket = this;
        socket.on("error", function (error) {
            // errors on the socket are trapped, and those errors will not terminate the application
            // except in this case its too late, and the application still reports a socket error and breaks
            // Its interesting because the breaking action, a child process, does not execute until much later due to human interaction
        };
    };
    tls_socket.once("data", handler);
};

// servers listening for connections
server.on("connection", connection);  // net.server
node.tls.server.createServer({
    ca: "key hash",
    cert: "key hash",
    key: "key hash"
}, connection); // tls.server

Looking at the code above:

  1. Two servers are stood up and use the same connection handler
  2. The connection handler is where sockets are born. If errors are not trapped there it is too late.
  3. All further management upon the socket occurs in: tls_socket.once("data", handler);
  4. Trapping errors after reasoning about the socket is too late, even if the breaking condition has not yet occurred.

I cannot further reason about the nature of that timing. I am also unclear why executing a child process would have anything to do with a network socket's read stream. I can only speculate, and this is a completely uninformed guess, that there is a stream collision between the read stream of the socket and the pipes of the child process. If so its unclear why altering the timing of error trapping would have any effect or why executing the application from a higher privilege, run as root from systemd, would have any effect.

I will be happy to continue investigating, but I suspect I am approaching the limits of what evidence I can gather from user land.

@prettydiff
Copy link
Contributor Author

prettydiff commented Nov 10, 2024

Here is the application demonstrating the issue on branch "servers": https://github.com/prettydiff/webserver/tree/servers

  1. clone repo && cd
  2. execute npm install
  3. execute npm run build
  4. execute npm run server
  5. The first run will generate a default server for hosting the dashboard, but on random ports. CTRL+C out of the application, change the port assignment in the dynamically created servers.json file to any ports of your choice.
  6. execute the application again npm run server
  7. for some reason the problem occurs from any statically assigned port 100% of the time, but I just discovered it does not occur at all from a randomly assigned port. This means it is crashing on net socket error even when there are no sockets connected.

The error can be suppressed by uncommenting the following three lines and repeating the process above (except for npm install): https://github.com/prettydiff/webserver/blob/servers/lib/transmit/server.ts#L291-L293

The actual failure occurs from this line: https://github.com/prettydiff/webserver/blob/servers/lib/utilities/port_map.ts#L75

This logic guarantees the error will occur within 10 seconds of application start regardless of further action and no connections: https://github.com/prettydiff/webserver/blob/servers/lib/index.ts#L75-L79

@aduh95 aduh95 added the needs more info Issues without a valid reproduction. label Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs more info Issues without a valid reproduction.
Projects
None yet
Development

No branches or pull requests

4 participants