Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reader algorithms on chunk_reader #52

Open
vinipsmaker opened this issue Jul 8, 2021 · 4 comments
Open

Reader algorithms on chunk_reader #52

vinipsmaker opened this issue Jul 8, 2021 · 4 comments

Comments

@vinipsmaker
Copy link
Contributor

So, I'm trying to update tabjson to support the concatenated JSON streaming model. Therefore the first change I had to do was replacing reader by chunk_reader. However as soon as I made the change the project failed to compile due to algorithms such as json::partial::skip() no longer working.

These algorithms hardcode the type json::reader. However I don't think the current decision is wrong. It's pretty much unclear how to recover once a composite operation fails midway. Failing to accept chunk_reader args is a good thing here. If we were to change the algorithms I envision two approaches:

  • Extend the algorithms to embed the reading logic through user-injection points (e.g. a closure called to read more once end-of-buffer is reached).
  • Use a coroutine so the algorithm can resume once the user fills more of the buffer.

Both approaches are undesirable in my eyes.

However the question remains: how would one approach the usage of algorithms such as partial::skip() in a chunked stream? I have a few ideas and I thought I'd open up an issue to discuss them. I'll comment more later (a few of the ideas will also be discussed in separate issues).

@vinipsmaker
Copy link
Contributor Author

So, my idea to use algorithms in the likes of the ones we've developed for the partial namespace is parsers combinators. When I reach a token for which I want to partial::skip() over, I'll just instantiate a new json::chunk_reader to consume the current element. Once I'm over (parser for inner sequence emits token::end), I'll call chunk_reader.next<token::null>() on the outer parser. This idea might help to give you new scenarios to think on while you decide on what to do about #53.

@breese
Copy link
Owner

breese commented Jul 12, 2021

Getting json::partial::skip() to support chunk_reader can be done by moving the implementation of skip() into a detail namespace that takes a generic typename Reader argument instead of basic_reader<CharT>, and then add two public skip() overloads for chunk_reader.

That only partially solves the problem though, because we need to distinguish between a error and end-of-input in order to indicate that the chunk_reader needs more data (e.g. by return an empty view from skip().) However, chunk_reader restores its internal state, so we cannot query the reader afterwards.

A possible workaround could be to return a tri-bool instead of a bool, which indicates

  • true = token available
  • false = error
  • inderterminate = end of input

@vinipsmaker
Copy link
Contributor Author

vinipsmaker commented Jul 12, 2021

we need to distinguish between a error and end-of-input in order to indicate that the chunk_reader needs more data (e.g. by return an empty view from skip().)

That kinda was my point about coroutines. Once skip() returns, a new call to skip() won't know how much more tail needs to be consumed. I don't like the coroutine approaches tho (C++ coroutines are the worse implementation of coroutines I've ever seen).

My idea was to keep the state in a new chunk_reader object. You instantiate a new chunk_reader to consume the subtree defined by the current token. That way you know you must consume (the inner) chunk_reader until the end token. So this inner chunk_reader accidentally already keeps all the state you need about the substream. Once the subtree is parsed, you just need to advance the state for the outer chunk_reader (we could use next("null") but I'd like to see a next<token::null>() instead).

Honestly it's pretty easy to reimplement skip() and that's what I've done in my working copy of tabjson to implement chunked parsing. However there are more elaborate algorithms such as partial::parse() that I prefer to reuse instead of reimplement (the "extra" typing to reuse a ready abstraction pays the price here). I don't plan to touch partial::parse() anytime soon tho as tabjson serves all my needs so far w/o touching any DOM handle.

A possible workaround could be to return a tri-bool instead of a bool, which indicates [...]

That's the first time tribool ever solved a problem for me. It works for me.

The current lack of error vs end of input is also the only blocker for tabjson's chunked parsing as I can workaround everything else already.

@breese
Copy link
Owner

breese commented Jul 24, 2021

I have added a feature branch with the tribool changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants