Get an iterator of results #243

charles-paperman · 2023-09-04T16:51:38Z

It would be nice to have an iterator of results, so that we can post-filter matches and/or process each one with potentially slow external code without loading all matches into RAM simultaneously.

Example use case: a JSON document with a very large list of objects. We filter them with JSONPath and obtain a sublist that has to be inserted into a DB. Loading them all into RAM is not possible (potentially too big), so we want to perform a slow operation on each match and free it from memory afterwards.

github-actions · 2023-09-04T16:52:01Z

Tagging @V0ldek for notifications

V0ldek · 2023-09-06T10:50:29Z

This is a really big feature.

The current engine does not support pausing/resuming. This also doesn't play well with the current architecture of Engine-Recorder-Sink – would the Recorder have to pause the engine? There are eight different places where a match might be reported in the current main engine, and more if we count head skipping. All of those would have to be augmented to save the state of the engine and return to the caller. To add to the pain, in the general case the NodesRecorder performs reordering of results, so it doesn't report the matches immediately – it batches them on the stack and can then report many of them at the same time.

Not saying this is impossible, but it would almost certainly be an entirely new engine. In particular I suspect that simply adding this capability to the main engine would screw with SIMD code generation, even if the caller intended to consume the entire iterator immediately anyway.

If the concern here is memory consumption, then there is a workaround with multithreading. You can spin up one thread for the engine and another one as the consumer, and pass as the sink a wrapper around a bounded-capacity queue/channel (e.g. crossbeam's ArrayQueue). That way you limit the RAM usage, and the consumer can expose an iterator API which internally reads from the queue/channel.
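A minimal sketch of that workaround, using only the standard library so that it compiles on its own. It swaps crossbeam's ArrayQueue for `std::sync::mpsc::sync_channel`, which is also bounded. The `MatchSink` trait, `ChannelSink`, `run_fake_engine` and `match_iter` are hypothetical names made up for illustration; they are not the rsonpath API, and the "engine" here is just a substring filter standing in for a real query run.

```rust
use std::sync::mpsc::{sync_channel, SendError, SyncSender};
use std::thread;

/// Hypothetical stand-in for the engine's sink interface: something the
/// engine calls once per match. The real rsonpath trait may look different.
pub trait MatchSink<D> {
    type Error;
    fn add_match(&mut self, data: D) -> Result<(), Self::Error>;
}

/// Sink that forwards every match into a bounded channel.
pub struct ChannelSink<D> {
    tx: SyncSender<D>,
}

impl<D> MatchSink<D> for ChannelSink<D> {
    type Error = SendError<D>;

    fn add_match(&mut self, data: D) -> Result<(), Self::Error> {
        // `send` blocks while the channel is full, so the producer can never
        // run more than `capacity` matches ahead of the consumer.
        self.tx.send(data)
    }
}

/// Hypothetical stand-in for running the engine: reports every input that
/// contains `needle` to the sink, in input order.
fn run_fake_engine<S: MatchSink<String>>(
    inputs: Vec<String>,
    needle: &str,
    sink: &mut S,
) -> Result<(), S::Error> {
    for item in inputs {
        if item.contains(needle) {
            sink.add_match(item)?;
        }
    }
    Ok(())
}

/// Runs the "engine" on its own thread and exposes the matches as an iterator.
pub fn match_iter(
    inputs: Vec<String>,
    needle: String,
    capacity: usize,
) -> impl Iterator<Item = String> {
    let (tx, rx) = sync_channel::<String>(capacity);
    thread::spawn(move || {
        let mut sink = ChannelSink { tx };
        // An error means the consumer dropped the iterator early, so the
        // producer simply stops.
        let _ = run_fake_engine(inputs, &needle, &mut sink);
        // Dropping `sink` here closes the channel, which ends the iterator.
    });
    rx.into_iter()
}
```

The consumer side is then an ordinary loop, e.g. `for m in match_iter(docs, "foo".to_string(), 64) { insert_into_db(m); }`, where `insert_into_db` is a placeholder for the slow operation. Only up to `capacity` matches sit in the channel at any time, plus the one the engine thread is blocked on sending.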

I am going to file this into the "Future" category. We could explore adding multithreaded Sink support (maybe even an async one) earlier (the 1.1.0 target) – if that sounds appealing, please let me know. I feel like a Sink impl for a channel would be useful, but it depends on whether multithreading is even an acceptable solution for the user.

charles-paperman · 2023-09-07T19:11:28Z

I think iterating through high-level events would also allow a SAX-style API. That would be really nice for applications that need the underlying classifiers but not the query compilation. I suspect this is hard to do, but it would make it possible to build efficient validation, as in simdjson.

The automata construction could then be built on top of that API. Adding iterators would then simply mean changing this interface and the result reporting.

charles-paperman added the type: feature New feature or request label Sep 4, 2023

github-actions bot added the acceptance: triage Waiting for owner's input label Sep 4, 2023

V0ldek added this to Active rsonpath development Sep 4, 2023

github-project-automation bot moved this to Todo in Active rsonpath development Sep 4, 2023

V0ldek added this to the Future milestone Sep 6, 2023

github-actions bot added acceptance: go ahead Reviewed, implementation can start and removed acceptance: triage Waiting for owner's input labels Sep 6, 2023

V0ldek added acceptance: needs design Sounds good, but needs exploration and prototyping help wanted External contributions welcome mod: engine area: result Improvements in query result reporting and removed acceptance: go ahead Reviewed, implementation can start labels Sep 6, 2023

charles-paperman closed this as completed Sep 7, 2023

github-project-automation bot moved this from Todo to Merged in Active rsonpath development Sep 7, 2023

charles-paperman reopened this Sep 7, 2023

github-project-automation bot moved this from Merged to Committed in Active rsonpath development Sep 7, 2023

github-actions bot added the acceptance: triage Waiting for owner's input label Sep 7, 2023

V0ldek removed the acceptance: triage Waiting for owner's input label Sep 7, 2023

V0ldek mentioned this issue Sep 18, 2023

Support a channel as a Sink #270

Open

V0ldek removed the mod: engine label Oct 4, 2023

V0ldek moved this from Committed to Todo in Active rsonpath development Oct 30, 2023

V0ldek added this to Active rq development Apr 4, 2024

github-project-automation bot moved this to Todo in Active rq development Apr 4, 2024
