SSL EOF leads to persistent non-delivery of messages #750
Comments
(I did write a watchdog, but calling close and then reopen did not result in messages flowing.) Reading
Are there any updates on this issue? I'm seeing it as well when the internet connection gets lost temporarily.
It seems you are using loop_start, and some exceptions aren't handled correctly, which causes the loop thread to crash. Once it has crashed, no more processing is done. I've made a PR (#797) which should fix your problem.
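To illustrate the failure mode described above, here is a minimal standard-library sketch. It is not the code from PR #797 and does not use the library's internals; run_network_loop, step and on_failure are hypothetical names. It assumes the background thread repeatedly calls some network step that can raise ssl.SSLEOFError (an OSError subclass), and shows the thread reporting the error rather than dying silently.

```python
import ssl
import threading


def run_network_loop(step, on_failure, stop_event):
    """Call step() until stopped; report socket-level errors to on_failure.

    Hypothetical helper: 'step' stands in for one iteration of a network
    loop, 'on_failure' for whatever marks the connection as lost.
    """
    while not stop_event.is_set():
        try:
            step()
        except OSError as exc:  # ssl.SSLEOFError is a subclass of OSError
            on_failure(exc)
            return


def demo():
    """Run the loop in a thread with a step that always fails with an SSL EOF."""
    failures = []
    stop = threading.Event()

    def failing_step():
        raise ssl.SSLEOFError("EOF occurred in violation of protocol")

    thread = threading.Thread(
        target=run_network_loop,
        args=(failing_step, failures.append, stop),
        daemon=True,
    )
    thread.start()
    thread.join(timeout=2)
    return failures, thread.is_alive()
```

Without the try/except, the thread would still exit, but nothing in the main program would learn about it, which matches the symptom of publish calls succeeding while nothing is processed.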
Thanks for addressing this. I am in fact using
I gave this patch a try and first tests were positive. Thanks!
I am using 1.6.1 on NetBSD 9 amd64 with Python 3.10. The broker is an up-to-date mosquitto on a remote NetBSD 9 amd64 system. I am using port 8883 with a real certificate, and generally everything works well.
I have a script which polls a UPS and publishes messages usually once a minute, but at intervals as short as 1s if something interesting has happened. On that system, I took the WAN interface down (releasing the lease, removing addresses and the default route, and running ifconfig down), waited an hour and brought it back. Obviously I won't have data in the meantime, but I expect it to recover.
In the log (stdout/stderr) of the program I see (with the json shortened; there are actually 14 keys):
So the "EOF" looks like a write failed. Probably this is "no route to host". However, I don't get a disconnect callback. And I did get a publish callback! But surely the broker didn't get the message.
After that, further publish calls happen with no exceptions, but there are no publish callbacks.
Obviously I can write a watchdog to close/open if I don't get pubacks to python.
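For reference, the bookkeeping for such a watchdog could look roughly like the sketch below. It is stdlib-only and the class and method names are hypothetical. It assumes the caller records the message id from each publish call (in paho-mqtt, the object returned by publish() has a mid attribute) and clears it from the on_publish callback. Two caveats: for QoS 0 the publish callback only means the message left the client, so this is only meaningful for QoS 1 or 2; and as noted in the comments, close and reopen alone did not restore delivery here, so what to do when the watchdog fires is left to the caller.

```python
import time


class PubackWatchdog:
    """Track publishes that have not been acknowledged yet (hypothetical helper)."""

    def __init__(self, timeout_s=120.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._pending = {}           # mid -> time the publish call was made

    def note_published(self, mid):
        """Call right after publish(), with the message id it returned."""
        self._pending[mid] = self._clock()

    def note_acked(self, mid):
        """Call from the publish callback; unknown mids are ignored."""
        self._pending.pop(mid, None)

    def is_stalled(self):
        """True if the oldest unacknowledged publish is older than the timeout."""
        if not self._pending:
            return False
        oldest = min(self._pending.values())
        return self._clock() - oldest > self._timeout_s
```

The polling script would check is_stalled() once per cycle and, if it returns True, tear down and recreate the client or exit so a supervisor can restart the process.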
But it seems that the SSL error should be caught and the connection should be judged non-functional. Or at least some later publish should cause a failure and close the connection. The user program should not get an exception thrown other than in response to an API call, and then only documented exceptions.
I don't think this is related to threading/locking, as the program only did one publish call, and without the network being down, runs for months.