Moving queue size and making node flume queue bigger #724
Conversation
After double checking, it seems that we're back to the problem that if … Meaning that the current … I think we should replace our current … I have opened #726 to temporarily ignore this issue. I have double checked that this does not actually create a memory leak, as DropTokens are correctly reported back to the origin nodes, cleaning up the shared memory.
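A conceptual sketch of the drop-token flow mentioned above, assuming hypothetical names (DropToken, OriginNode are illustrative, not the actual dora-rs API): the receiver reports the token back to the origin node once it is done with a shared-memory buffer, so the origin can reclaim it.

```python
# Conceptual sketch only; DropToken / OriginNode are hypothetical names,
# not the actual dora-rs API.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DropToken:
    buffer_id: int


@dataclass
class OriginNode:
    # Shared-memory buffers still owned by this node, keyed by id.
    shared_buffers: dict = field(default_factory=dict)

    def send(self, buffer_id: int, data: bytes) -> DropToken:
        self.shared_buffers[buffer_id] = data
        return DropToken(buffer_id)

    def on_drop_token(self, token: DropToken) -> None:
        # The token coming back means no reader uses the buffer anymore,
        # so the shared memory can be reclaimed without leaking.
        self.shared_buffers.pop(token.buffer_id, None)


origin = OriginNode()
token = origin.send(0, b"payload")   # receiver gets the data plus a token
origin.on_drop_token(token)          # receiver reports the token back
assert not origin.shared_buffers     # buffer freed, no leak
```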
This is a follow-up PR to #724. I have found out that the issue is that the reference counting of the dora node does not delay the cleanup, so the node is still cleaned up before the pyarrow array, creating a deadlock. This PR reduces the timeout before drop-token cleanup starts, to make sure the process does not hang endlessly and create a cascading effect of nodes not ending. In the future, I hope we can create **python based** reference counting between the Python node and the events it generates, so that each event holds a reference to the node and gets cleaned up before the node itself is cleaned up.
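A minimal sketch of the Python-based reference counting proposed above, using hypothetical Node/Event classes (not the actual dora API): each event keeps a strong reference to the node that produced it, so the node can never be finalized while any of its events, and hence the pyarrow data they wrap, is still alive.

```python
# Hypothetical sketch, not the dora-rs Python API.
import pyarrow as pa


class Node:
    def cleanup(self) -> None:
        # Release shared memory, sockets, etc.
        pass

    def __del__(self) -> None:
        self.cleanup()


class Event:
    def __init__(self, node: "Node", data: pa.Array) -> None:
        # Holding a strong reference to the node guarantees the node
        # outlives every event (and the pyarrow array it carries), so the
        # node's cleanup cannot run before the array is released.
        self._node = node
        self.data = data


node = Node()
event = Event(node, pa.array([1, 2, 3]))
del node    # node is kept alive by event._node
del event   # only now can Node.__del__ / cleanup run
```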
This PR fixes the issue that dora nodes are not fair between inputs when the inputs arrive at different frequencies and the per-input processing time is high. What ends up happening is that one input might be called overwhelmingly often, as its frequency is higher or lower depending on the queue_size. This PR fixes this by adding a scheduler that always checks that the next input served is the one that has been waiting the longest in the queue, making input handling fair. This PR is a follow-up to #724, which rewrites the queue within nodes instead of the daemon.
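A small sketch of the fairness idea, under the assumption that each input has its own FIFO of (arrival_time, message) pairs; the actual change lives in the Rust node code, but the selection rule is the same: always serve the input whose head message has been waiting the longest.

```python
# Illustrative only; the real scheduler is implemented inside the Rust node.
from collections import deque
import time


class FairInputScheduler:
    def __init__(self, input_ids):
        # One FIFO per input; each entry is (arrival_time, message).
        self.queues = {input_id: deque() for input_id in input_ids}

    def push(self, input_id, message):
        self.queues[input_id].append((time.monotonic(), message))

    def next(self):
        # Pick the input whose oldest pending message arrived earliest,
        # so a high-frequency input cannot starve a low-frequency one.
        candidates = [
            (queue[0][0], input_id)
            for input_id, queue in self.queues.items()
            if queue
        ]
        if not candidates:
            return None
        _, input_id = min(candidates)
        _, message = self.queues[input_id].popleft()
        return input_id, message


sched = FairInputScheduler(["image", "lidar"])
sched.push("image", "img-0")
sched.push("lidar", "scan-0")
print(sched.next())  # ('image', 'img-0'): it has waited the longest
```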
This enables more comprehensive errors when using Python installs.
Moving the queue size to the node fixes long-latency issues for Python nodes and makes it possible to set the right queue_size.
I also changed the flume queue length within nodes to make this possible.
This seems to fix some of the graceful-stop issues in Python, but further investigation is required.
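To illustrate the effect of a per-node queue_size (a sketch only, not the dora API): keeping just the newest queue_size messages per input inside the node means a slow consumer reads recent data instead of accumulating latency.

```python
# Sketch of a bounded per-input queue; hypothetical, not the dora-rs API.
from collections import deque

QUEUE_SIZE = 1  # e.g. keep only the latest message per input

input_queue = deque(maxlen=QUEUE_SIZE)

for i in range(10):
    # With maxlen set, older messages are silently dropped once the queue
    # is full, so a slow consumer never processes stale, high-latency data.
    input_queue.append(f"message-{i}")

print(list(input_queue))  # ['message-9'] when QUEUE_SIZE == 1
```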