Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"on disconnect" event unreliable #133

Open
blitzcode opened this issue May 6, 2016 · 13 comments
Open

"on disconnect" event unreliable #133

blitzcode opened this issue May 6, 2016 · 13 comments

Comments

@blitzcode
Copy link
Contributor

If you simply spam the refresh button during loading of the page, or any other exception / error occurs, the disconnect event will never trigger. This means i.e. that programs which spawn a worker thread and kill it during the disconnect event (like the Chat example) will leak resources. It also means it's not possible for a program to reliably detect if no clients are connected and cease certain time consuming operations.

@blitzcode
Copy link
Contributor Author

Looking at this some more, I think in addition to the O(1) space leak for an orphaned message processing thread there's also a potential O(n) one if a TChan based architecture is used (like in the Chat example). The messages can't be garbage collected if one or more orphaned threads can't read them anymore, leaving them to pile up indefinitely.

HeinrichApfelmus added a commit that referenced this issue Dec 7, 2016
Also ensure that `runFunction` and `callFunction`
throw an exception when the connection is broken.
@HeinrichApfelmus
Copy link
Owner

I have addressed the problems with "on disconnect" in the latest commit referencing this issue. It should work much more reliably now. Does this help, or is there still an edge case that is not covered?

@blitzcode
Copy link
Contributor Author

I was unfortunately still able to disconnect without my on disconnect code running. I also saw this debug output:

Foreign.JavaScript: Browser window disconnected.
thread blocked indefinitely in an MVar operation
hue-dashboard: ConnectionClosed

I'm not sure which thread or MVar operation is the culprit, but the only place in my code where I use MVars is for tracing, and I don't see how there could be a block there.

For testing I created a TVar which I increment on setup and decrement on disconnect. The only other operation in the disconnect handler is the cancellation of the worker thread, like your chat example. I managed to get a count of 2 with no clients connected anymore.

Apart from any potential implementation bugs, the whole concept of cleanup in a disconnect handler seems problematic. What I mean is, I don't see how this could ever be completely reliable. Normally you'd use something like bracket or withAsync to make absolutely sure a background worker thread is terminated or that a 'connected user' counter is always correctly maintained. But you can't use bracket as the setup function has to return. For instance, I can increment the user count and immediately register the decrementing 'on disconnect' handler (or swap the order of these actions), but there always could be an async exception interrupting me. So I think there's always the chance of leaving dangling resources or wrong counts. Not sure what the correct solution is here.

@blitzcode
Copy link
Contributor Author

btw, those three quoted messages happened when I brought up the threepenny application on my phone, switched away from the browser and waited for the connection to be closed. Simply spamming refresh in the browser seemed to work fine, probably a more graceful disconnect.

@HeinrichApfelmus
Copy link
Owner

I was unfortunately still able to disconnect without my on disconnect code running. I also saw this debug output:

    Foreign.JavaScript: Browser window disconnected.
    thread blocked indefinitely in an MVar operation
    hue-dashboard: ConnectionClosed

Huh, that is quite weird, because the code block that prints the "Browser window disconnected" message (line 144) will also execute the disconnect event handler (line 148-149). The only thing that could happen is that the intermediate call to commClose raises an exception. Hm, it is implemented in terms of killThread, which is interruptible, so that may be related to the "thread blocked indefinitely in an MVar operation" message...

Just to be sure: Does the problem disappear if you replace line 146 (the line containing commClose) with the following code?

let all :: E.SomeException -> Maybe ()
    all _ = Just ()
E.tryJust all $ commClose comm

Apart from any potential implementation bugs, the whole concept of cleanup in a disconnect handler seems problematic. What I mean is, I don't see how this could ever be completely reliable. Normally you'd use something like bracket or withAsync. [...] But you can't use bracket as the setup function has to return.

That is true, but I think I can do two things:

  • Instead of registering the disconnect handler within the setup function, I can make it a parameter to the startGUI function
  • And then wrap the event loop inside a bracket with the disconnect handler as last parameter.

In other words, I could turn the startGUI function into a variant of bracket. It may be a bit inconvenient, but probably better than less reliable?

@blitzcode
Copy link
Contributor Author

It's a bit puzzling, I agree. I checked my code again, the only place where I directly use MVars is through withMVar, which should be safe and never leave an empty one that would then end up blocking a thread.

I tried with your catching fix and could not reproduce the issue. To be sure, I then tried again without and could not get the timing right to reproduce it either. The timing must be quite specific.

The problem I see with specifying the cleanup function up-front is per-connection resources. For instance, how would the chat example terminate its worker thread that is spawned inside the setup function?

HeinrichApfelmus added a commit that referenced this issue Dec 14, 2016
…133

This makes registering the disconnect handler much
more robust. On the other hand, it takes away the
possibility to interrupt a long running computation
by closing the browser window.

Also use the `commOpen` TVar to implement the `commClose`
function, instead of using `throwTo`, because
it is not clear what happens when we do that twice.
@HeinrichApfelmus
Copy link
Owner

Apart from any potential implementation bugs, the whole concept of cleanup in a disconnect handler seems problematic. What I mean is, I don't see how this could ever be completely reliable. Normally you'd use something like bracket or withAsync.

Actually, this is not necessary if we implement the disconnect event as an ordinary event as opposed to an asynchronous exception. Commit 121eb17 implements this. On the other hand, this makes it impossible to stop a long running computation by closing the browser window... Thoughts?

@blitzcode
Copy link
Contributor Author

It's probably reasonable to expect developer to not do long running computations or lengthy I/O in response to a UI event. It would be nice in general if you could completely kill a wedged session by refreshing the browser, though.

Your change would certainly stop the setup code from being interrupted by a disconnect. A remaining issue would still be how to structure the connection setup code. The sequence of steps might be something along the lines of this:

  • Increment connected user count
  • read from config file
  • spawn update worker thread
  • make REST API call
  • register disconnect handler cleaning up all of the above

Even if threepenny would never throw an async exception, any of the above might cause an exception to be thrown. So I wonder if it's possible to make this completely bulletproof. Normally, you'd use the bracket pattern for something like spawning a thread or incrementing/decrementing the connection count, but how would that look like if the cleanup has to happen inside an event handler? I think one could use something like bracketOnError with each step to ensure cleanup in case an exception is raised before the final handler is registered.

@blitzcode
Copy link
Contributor Author

I suspect the latest commit for this might have broken something (everything? ;-)), see #172

@HeinrichApfelmus
Copy link
Owner

HeinrichApfelmus commented Apr 24, 2017

I intend that the current semantics (synchronous event, commit 121eb17) go into the next Hackage release (0.8). Does it work for you?

@blitzcode
Copy link
Contributor Author

I've been running with the current code for a while and haven't noticed any actual issues in 'real world' usage. I'm still somewhat concerned that this might cause resource leaks for long-running applications just in case there actually is some hanging computation. Also I don't think it's possible to make cleanup completely bulletproof, see my comment from Dec 14th.

@HeinrichApfelmus
Copy link
Owner

Ok, thanks for the heads-up! I think the current version qualifies as "good enough", so I'll release it for now, but keep the issue open. 😄

@blitzcode
Copy link
Contributor Author

I'd say so, certainly doesn't seem to be a regression. My type of application is a bit more impacted by potential resource leaks as it'll have many client connections and often runs for many months without restarts, but I hope that I actually don't have any hanging event handlers ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants