Hung TCPConnection can interfere with Pony runtime shutdown #166

Open
slfritchie opened this issue Aug 16, 2018 · 1 comment

Comments

@slfritchie

While fiddling with Pony 0.24.4, I've discovered that a TCP peer that is suspended, hung, or on a machine that has crashed can interfere with the Pony runtime's shutdown. "Interfere" means blocking the runtime's timely shutdown: network sockets remain partially open, so the ASIO subsystem stays noisy (it still has noisy events registered, which keeps the runtime from exiting).

General steps to reproduce:

  1. Run a Pony program that opens a TCP socket to a peer.
  2. Suspend the peer's OS process with Control-z/SIGSTOP.
  3. The local Pony program attempts to shut down by calling dispose() on the socket (and any other sockets), stopping all remaining timers, etc.
  4. The Pony program will not exit until the peer OS process is resumed (e.g., by SIGCONT), killed, or the peer's host crashes.

A demo program is at https://gist.github.com/slfritchie/558f44bcef5a29ad4ae9eaf208723bbc. Use as follows:

  1. Use a program like netcat to listen on TCP port 8888 on the localhost interface, e.g., nc -l 8888
  2. Compile and run hang-bug.pony on the same machine as netcat.
  3. Within 5 seconds of starting hang-bug, press Control-z to suspend the netcat process.
  4. The hang-bug program will not exit until the netcat process is resumed or killed.
     • The last message printed is "Ticker, dispose socket".

If the netcat process runs undisturbed, hang-bug exits 5 seconds after starting.

AFAICT, this delay is a feature of the runtime. TCP sockets are implemented by actors, and reads, writes, and dispose() requests on sockets are asynchronous messages, as with any other Pony actor. Consider the equivalent C program on a POSIX OS that performs a quick sequence of synchronous writes followed by a close: unless the TCP socket is closed prematurely, we expect all bytes written to be sent prior to the close. Any bytes not written due to flow control are signalled by the return value of the write/writev/send/etc. system calls.
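For comparison, here is a minimal C sketch of that POSIX behavior, assuming a Linux or macOS host: a nonblocking write() to a loopback TCP peer that never reads returns short counts and then fails with EAGAIN, and whatever the kernel did not accept must be kept by the caller (the job _pending_writev does inside TCPConnection). The helper names loopback_pair, write_nonblocking, and unread_peer_demo are hypothetical, for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a connected loopback TCP pair.
 * Returns 0 and fills *client and *server, or -1 on error. */
static int loopback_pair(int *client, int *server) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel choose a free port */

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }
    int cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (cfd >= 0) close(cfd);
        close(lfd);
        return -1;
    }
    int sfd = accept(lfd, NULL, NULL);
    close(lfd);
    if (sfd < 0) {
        close(cfd);
        return -1;
    }
    *client = cfd;
    *server = sfd;
    return 0;
}

/* Write as much of buf as the kernel accepts without blocking.
 * Returns the number of bytes accepted (the caller must buffer the
 * rest), or -1 on a hard error. */
static ssize_t write_nonblocking(int fd, const char *buf, size_t len) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) return -1;
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);
        if (n > 0) {
            sent += (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;            /* interrupted by a signal: retry */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;               /* flow control: socket buffers are full */
        } else {
            return -1;
        }
    }
    return (ssize_t)sent;
}

/* Write `total` zero bytes to a peer that never reads and report how
 * many the kernel accepted, or -1 on error. */
static ssize_t unread_peer_demo(size_t total) {
    int c, s;
    if (loopback_pair(&c, &s) < 0) return -1;
    char *buf = calloc(total, 1);
    ssize_t accepted = buf ? write_nonblocking(c, buf, total) : -1;
    free(buf);
    close(c);
    close(s);
    return accepted;
}
```

With default kernel settings, unread_peer_demo(64 * 1024 * 1024) is expected to return a positive value well below 64 MiB; the exact figure depends on the host's socket-buffer limits. The remainder is exactly the data a close would have to either wait for or discard.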

Pony's async messaging doesn't give the sending actor direct feedback about system call return values; the TCPConnection actor is responsible for buffering not-yet-sent data and for managing yet-to-be-read bytes from the socket.

  1. If any data remains buffered by the TCPConnection actor in the _pending_writev array, the socket will not be closed, and the ASIO subsystem will remain noisy.
  2. TCPConnection needs to observe a read of 0 bytes from the socket to trigger its final closing logic. If the remote peer is suspended/hung/crashed, that event is delayed for an unknown period of time.
  3. A related question is regular vs. hard socket close. TCPConnection uses the hard close path if dispose() is called while the actor is in the muted state. However, if the actor is in the throttled state, the hard close path is not taken. I think there's a good argument that a hard close is appropriate in the throttled state too.
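At the POSIX level, one common way to get a "close the socket NOW" effect is to enable SO_LINGER with a zero timeout before close(): the kernel then discards any unsent data and resets the TCP connection (RST) instead of waiting for an orderly shutdown. This is a minimal C sketch assuming a Linux or macOS host; hard_close is a hypothetical name, and this is not necessarily how TCPConnection's own hard close path is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: abortive close.
 * With l_onoff = 1 and l_linger = 0, close() on a TCP socket drops any
 * data still queued for sending and resets the connection instead of
 * performing the normal FIN exchange.
 * Returns 0 on success, -1 on error. If setsockopt() fails on a valid
 * descriptor, the descriptor is still closed, just not abortively. */
static int hard_close(int fd) {
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    int rc = setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    if (close(fd) < 0) return -1;
    return rc;
}
```

The trade-off is the one point 3 raises: a hard close returns immediately even when the peer is suspended, but bytes still buffered for a throttled socket are lost, and the peer sees a reset rather than a clean end of stream.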

Possible remedies might include:

a. Add a hard_close() behavior that gives socket users a "close the socket NOW" option.
b. Add an optional timer, configurable per socket, that starts when dispose() is called. If the timer fires and the socket isn't yet fully closed, the socket takes the hard close path.
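Remedy (b) can be sketched at the POSIX level as a bounded graceful close: half-close the write side with shutdown(SHUT_WR), wait up to a deadline for the peer's end of stream (a read() that returns 0, as in point 2), and fall back to an abortive close if the deadline passes. This minimal C sketch assumes a Linux or macOS host, uses poll() where TCPConnection would use a Pony timer, ignores the pending-writes case from point 1, and uses the hypothetical name close_with_deadline; it is not a ponyc API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds from a monotonic clock. */
static long long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: close fd gracefully if the peer ends its stream
 * within timeout_ms, otherwise abortively (SO_LINGER 0).
 * Returns 0 for a graceful close, 1 if the hard-close fallback fired. */
static int close_with_deadline(int fd, int timeout_ms) {
    char buf[4096];
    long long deadline = now_ms() + timeout_ms;
    shutdown(fd, SHUT_WR);            /* send FIN: we will not write again */
    for (;;) {
        long long left = deadline - now_ms();
        if (left <= 0) break;         /* the "timer" fired */
        struct pollfd p = { .fd = fd, .events = POLLIN };
        int r = poll(&p, 1, (int)left);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) break;            /* timed out, or poll failed */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0) {                 /* peer's end of stream: clean close */
            close(fd);
            return 0;
        }
        if (n < 0 && errno != EINTR && errno != EAGAIN) break;
        /* n > 0: discard late data and keep waiting for end of stream */
    }
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    close(fd);
    return 1;
}
```

If the peer closes its end promptly, close_with_deadline returns 0; if the peer is suspended (for example, netcat after Control-z), its kernel acknowledges our FIN but never sends one back, so the call returns 1 after roughly timeout_ms.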

@slfritchie

Notes from today's dev sync meeting, which was quite small and so might lack input from Those Who Have An Opinion.

  1. Joe would like to see an RFC developed.
  2. Joe suggested splitting the buffered writes and pending reads into separate remedies.
  3. Perhaps not enough consensus yet on what to do about dispose() when the socket is throttled.

SeanTAllen transferred this issue from ponylang/ponyc on May 12, 2020