You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add support for the ElasticsearchSourceStage to call Elasticsearch to clear its scroll upon stream completion. As shown in the Elasticsearch scroll documentation for versions 5, 6, and 7:
Search context are automatically removed when the scroll timeout has been exceeded. However keeping scrolls open has a cost, as discussed in the previous section so scrolls should be explicitly cleared as soon as the scroll is not being used anymore using the clear-scroll API
This feature request amounts to asynchronous resource cleanup where the scroll is the resource and calling the delete scroll endpoint is the cleanup.
Details
The call to delete a scroll is simple: DELETE /_search/scroll/${scroll_id}.
I see only 3 events that cause the ElasticsearchSourceStage to terminate:
The scroll completes normally.
Downstream finishes.
The call to get the next set of scroll hits fails for one of two reasons:
a. The scroll no longer exists.
b. Some other exception.
In the case of 1, 2, and 3.b, the stage should attempt to clear the scroll. In the case of 3.a, the scroll is already cleared, so there is no need to delete it.
In second case, I'm not sure how the graph framework handles asynchronous cleanup. I know of the postStop lifecycle method, but based on some brief reading through the source code, that does not look like it is designed to handle asynchronous cleanup. Is there a common pattern for asynchronous resource clean up in the case of down stream finish? Even if the scroll has to live in that case, I would still want to delete the scroll for the other cases. Note that in the case of the scroll completing normally (1) or the Elasticsearch client returning an exception (3.b), it'd be easy enough to attempt the async cleanup before completing the stage.
Because the ElasticsearchSourceStage privately maintains the scroll_id, there is currently no good way for end users to clear the scroll. Further, when the stage completes, the scroll_id is gone, so there is no case where the scroll should persist beyond the stage's lifecycle.
If this new feature gains traction, I'm be happy to help implement it.
The text was updated successfully, but these errors were encountered:
Short description
Add support for the
ElasticsearchSourceStage
to call Elasticsearch to clear its scroll upon stream completion. As shown in the Elasticsearch scroll documentation for versions 5, 6, and 7:This feature request amounts to asynchronous resource cleanup where the scroll is the resource and calling the delete scroll endpoint is the cleanup.
Details
The call to delete a scroll is simple:
DELETE /_search/scroll/${scroll_id}
.I see only 3 events that cause the
ElasticsearchSourceStage
to terminate:a. The scroll no longer exists.
b. Some other exception.
In the case of 1, 2, and 3.b, the stage should attempt to clear the scroll. In the case of 3.a, the scroll is already cleared, so there is no need to delete it.
In second case, I'm not sure how the graph framework handles asynchronous cleanup. I know of the
postStop
lifecycle method, but based on some brief reading through the source code, that does not look like it is designed to handle asynchronous cleanup. Is there a common pattern for asynchronous resource clean up in the case of down stream finish? Even if the scroll has to live in that case, I would still want to delete the scroll for the other cases. Note that in the case of the scroll completing normally (1) or the Elasticsearch client returning an exception (3.b), it'd be easy enough to attempt the async cleanup before completing the stage.Because the
ElasticsearchSourceStage
privately maintains thescroll_id
, there is currently no good way for end users to clear the scroll. Further, when the stage completes, thescroll_id
is gone, so there is no case where the scroll should persist beyond the stage's lifecycle.If this new feature gains traction, I'm be happy to help implement it.
The text was updated successfully, but these errors were encountered: