This repository has been archived by the owner on Mar 3, 2023. It is now read-only.
Note: I looked into the stream manager running on a local cluster, but as far as I can tell other parts of Heron and other kinds of clusters may also be affected.
When stopping a topology (`heron kill`) on a local cluster, the local scheduler sends a SIGTERM signal to all Heron executors. Each executor handles the signal and in turn sends a SIGTERM to all Heron components running in its container.
The stream manager, however, does not register a handler for SIGTERM, so the default action terminates the process immediately without running its destructors or any other cleanup code. This is unintuitive and can lead to unexpected behavior.
In my case, I tried to use the stream manager's destructor to export additional latency metrics I had collected and noticed that the destructor is never called.
I suggest handling SIGTERM in the stream manager and exiting gracefully. The event library used by the event loop can handle signals, and there is already a function in place that stops the event loop (`EventLoopImpl::loopExit()`), so the fix is quite simple and, as far as I can tell, should not have any side effects.
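To illustrate the idea, here is a minimal, standalone sketch of the pattern. It does not use Heron's code or its event library: `EventLoopSketch`, `StreamManagerSketch`, `OnSigterm` and `RunUntilSigterm` are hypothetical names made up for this example, and the real fix would presumably register the signal with the existing event loop instead of polling a flag. The point is only that once a SIGTERM handler is installed and the loop is asked to exit, the stack unwinds normally and the destructor runs.

```cpp
#include <csignal>
#include <functional>

// Written by the signal handler, read by the loop. A volatile sig_atomic_t
// is the type the standard guarantees can be safely set from a handler.
volatile std::sig_atomic_t g_sigterm_seen = 0;

// Hypothetical handler: only records that SIGTERM arrived.
void OnSigterm(int) { g_sigterm_seen = 1; }

// Hypothetical stand-in for the event loop (NOT Heron's EventLoopImpl).
class EventLoopSketch {
 public:
  void loopExit() { running_ = false; }

  // Calls `tick` repeatedly until loopExit() is called or SIGTERM was seen.
  void loop(const std::function<void()>& tick) {
    running_ = true;
    while (running_) {
      if (g_sigterm_seen) {
        loopExit();  // graceful stop instead of the default "terminate"
        break;
      }
      tick();
    }
  }

 private:
  bool running_ = false;
};

// Hypothetical stand-in for the stream manager; the destructor marks the
// place where cleanup (e.g. exporting collected metrics) would happen.
class StreamManagerSketch {
 public:
  explicit StreamManagerSketch(bool* cleaned_up) : cleaned_up_(cleaned_up) {}
  ~StreamManagerSketch() { *cleaned_up_ = true; }

 private:
  bool* cleaned_up_;
};

// Installs the handler, runs the loop, sends SIGTERM to the current process
// on the third tick, and reports whether the destructor ran afterwards.
bool RunUntilSigterm() {
  g_sigterm_seen = 0;
  std::signal(SIGTERM, OnSigterm);
  bool cleaned_up = false;
  {
    StreamManagerSketch stmgr(&cleaned_up);
    EventLoopSketch loop;
    int ticks = 0;
    loop.loop([&ticks] {
      if (++ticks == 3) std::raise(SIGTERM);
    });
  }  // stmgr goes out of scope here, so its destructor runs
  std::signal(SIGTERM, SIG_DFL);  // restore the default action
  return cleaned_up;
}
```

Calling `RunUntilSigterm()` from `main` returns `true`. Without the `std::signal(SIGTERM, OnSigterm)` line, the `std::raise(SIGTERM)` call would terminate the process on the spot and the destructor would never run, which is the behavior described above.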
@antiguru