This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Stream manager not stopped gracefully #2947

Open
glrf opened this issue Jul 3, 2018 · 2 comments

Comments

glrf commented Jul 3, 2018

Note: I looked into the stream manager running on a local cluster, but as far as I can tell other parts of Heron and other kinds of clusters may also be affected.

When stopping a topology (heron kill) on a local cluster, the local scheduler sends a SIGTERM signal to all heron executors. The executor handles the signal and in turn sends a SIGTERM to all heron components running in its container.

The stream manager, however, does not register a handler for this SIGTERM, which means it is killed without running its destructors or any cleanup code. This seems unintuitive and can lead to unexpected behavior.
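To illustrate the effect, here is a minimal, self-contained sketch (not Heron code). It assumes a POSIX system; `Component`, `Outcome`, and `terminate_child_without_handler` are hypothetical names made up for this example. A child process with the default SIGTERM disposition creates an object, the parent sends SIGTERM, and the object's destructor output never shows up.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// Hypothetical stand-in for a component that reports state in its destructor.
struct Component {
  int fd;
  explicit Component(int out) : fd(out) { emit("started\n"); }
  ~Component() { emit("cleanup\n"); }
  void emit(const std::string& msg) {
    ssize_t n = write(fd, msg.data(), msg.size());
    (void)n;  // best effort; this sketch ignores write errors
  }
};

struct Outcome {
  std::string output;      // everything the child wrote to the pipe
  bool killed_by_sigterm;  // true if the child was terminated by SIGTERM
};

// Forks a child that keeps a Component alive, then sends it SIGTERM.
Outcome terminate_child_without_handler() {
  int fds[2];
  if (pipe(fds) != 0) return {"", false};
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return {"", false};
  }
  if (pid == 0) {
    close(fds[0]);
    signal(SIGTERM, SIG_DFL);  // default disposition: terminate the process
    Component component(fds[1]);
    for (;;) pause();  // wait for signals; never returns normally
  }
  close(fds[1]);
  std::string out;
  char buf[64];
  ssize_t n = 0;
  // Wait until the child has constructed its Component.
  while (out.find("started\n") == std::string::npos &&
         (n = read(fds[0], buf, sizeof buf)) > 0) {
    out.append(buf, static_cast<size_t>(n));
  }
  kill(pid, SIGTERM);
  int status = 0;
  waitpid(pid, &status, 0);
  // Drain whatever else the child managed to write before it died.
  while ((n = read(fds[0], buf, sizeof buf)) > 0) {
    out.append(buf, static_cast<size_t>(n));
  }
  close(fds[0]);
  return {out, WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM};
}
```

The returned output contains the constructor's line but not the destructor's, because the default action for SIGTERM ends the process without unwinding the stack.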

In my case, I tried to use the stream manager's destructor to export additional latency metrics I had collected, and noticed that the destructor is never called.

I suggest handling SIGTERM in the stream manager and exiting gracefully. The event library used by the event loop can handle the signal, and there is already a function that stops the event loop (EventLoopImpl::loopExit()), so the fix is quite simple and, as far as I can tell, should not have any side effects.
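The general shape of such a fix can be sketched as follows. This is not the actual patch and does not use the event library's own signal support. It is a plain POSIX illustration in which `MiniLoop`, `StreamManagerSketch`, `install_sigterm_handler`, and `run_until_stopped` are hypothetical names, and `MiniLoop::loopExit()` only mimics the role of `EventLoopImpl::loopExit()`. The handler does nothing but set a flag; the loop notices the flag, stops, and returns, so destructors run as usual.

```cpp
#include <poll.h>
#include <signal.h>

#include <string>
#include <vector>

// Set from the signal handler; writing a sig_atomic_t there is safe.
static volatile sig_atomic_t g_stop_requested = 0;

static void on_sigterm(int) { g_stop_requested = 1; }

// Registers the SIGTERM handler. Call this before starting the loop.
void install_sigterm_handler() {
  struct sigaction sa = {};
  sa.sa_handler = on_sigterm;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGTERM, &sa, nullptr);
}

// Hypothetical stand-in for an event loop; not Heron's EventLoopImpl.
class MiniLoop {
 public:
  void loopExit() { running_ = false; }

  // Runs until loopExit() is called, checking the stop flag each iteration.
  void loop() {
    running_ = true;
    while (running_) {
      if (g_stop_requested) {
        loopExit();
      } else {
        poll(nullptr, 0, 10);  // idle for up to 10 ms; a signal interrupts it
      }
    }
  }

 private:
  bool running_ = false;
};

// Hypothetical component that exports metrics in its destructor.
class StreamManagerSketch {
 public:
  explicit StreamManagerSketch(std::vector<std::string>* log) : log_(log) {
    log_->push_back("started");
  }
  ~StreamManagerSketch() { log_->push_back("metrics exported"); }

 private:
  std::vector<std::string>* log_;
};

// Runs the loop until SIGTERM is seen; the manager is destroyed on return.
void run_until_stopped(std::vector<std::string>* log) {
  StreamManagerSketch manager(log);
  MiniLoop loop;
  loop.loop();
}
```

In a real program, `install_sigterm_handler()` would be called early in `main`, followed by `run_until_stopped(&log)`. Sending the process SIGTERM then ends the loop and lets the destructor run instead of killing the process outright.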

@kramasamy
Contributor

@Glorfischi - it would be great if you could send a quick PR.

antiguru commented Jul 5, 2018

See #2950 for the corresponding pull request.
