Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Elasticsearch] Make ElasticsearchSourceStage Delete Its Scroll On Stage Completion #2314

Closed
michaeljmarshall opened this issue May 14, 2020 · 1 comment
Milestone

Comments

@michaeljmarshall
Copy link
Contributor

Short description

Add support for the ElasticsearchSourceStage to call Elasticsearch to clear its scroll upon stream completion. As shown in the Elasticsearch scroll documentation for versions 5, 6, and 7:

Search context are automatically removed when the scroll timeout has been exceeded. However keeping scrolls open has a cost, as discussed in the previous section so scrolls should be explicitly cleared as soon as the scroll is not being used anymore using the clear-scroll API

This feature request amounts to asynchronous resource cleanup where the scroll is the resource and calling the delete scroll endpoint is the cleanup.

Details

The call to delete a scroll is simple: DELETE /_search/scroll/${scroll_id}.

I see only 3 events that cause the ElasticsearchSourceStage to terminate:

  1. The scroll completes normally.
  2. Downstream finishes.
  3. The call to get the next set of scroll hits fails for one of two reasons:
    a. The scroll no longer exists.
    b. Some other exception.

In the case of 1, 2, and 3.b, the stage should attempt to clear the scroll. In the case of 3.a, the scroll is already cleared, so there is no need to delete it.

In second case, I'm not sure how the graph framework handles asynchronous cleanup. I know of the postStop lifecycle method, but based on some brief reading through the source code, that does not look like it is designed to handle asynchronous cleanup. Is there a common pattern for asynchronous resource clean up in the case of down stream finish? Even if the scroll has to live in that case, I would still want to delete the scroll for the other cases. Note that in the case of the scroll completing normally (1) or the Elasticsearch client returning an exception (3.b), it'd be easy enough to attempt the async cleanup before completing the stage.

Because the ElasticsearchSourceStage privately maintains the scroll_id, there is currently no good way for end users to clear the scroll. Further, when the stage completes, the scroll_id is gone, so there is no case where the scroll should persist beyond the stage's lifecycle.

If this new feature gains traction, I'm be happy to help implement it.

@ennru
Copy link
Member

ennru commented Jun 1, 2020

Fixed with #2330.

@ennru ennru closed this as completed Jun 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants