Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block Parser: Implement generator / stream / chunked block parse interface #19021

Open
aduth opened this issue Dec 9, 2019 · 4 comments
Open
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Type] Enhancement A suggestion for improvement. [Type] Performance Related to performance efforts

Comments

@aduth
Copy link
Member

aduth commented Dec 9, 2019

Previously: #7970, #9403

The current implementation of the block parser will synchronously evaluate a content string, and return an array of parsed blocks. This behavior prevents us from being able to leverage some techniques to optimize an initial rendering of the editor. For example, there could be a perceived performance improvement if we were to parse the first ~10 blocks, then render the block editor, then continue to parse the remaining (or parse the remaining only when the user interacts to scroll the block list, using some technique of infinite scroll or virtualized list rendering). Those efforts should be tracked as separate issues, but are currently blocked by the parse behavior, aimed to be explored for enhancement in this issue.

See related concepts: "Time to First Paint" and "First Meaningful Paint"

Possible alternatives to this approach could include ideas around:

  • Manually segmenting a string of blocks content into smaller chunks, before passing to parse. There's some redundancy here in what would need to be implemented as a "partial parse", and likely to be difficult to achieve with nested blocks.
  • Asynchronous callbacks for parse, to avoid any blocking, enable parsing to occur over the network or in separate worker threads

Additional challenges:

  • What does incremental parsing look like in the context of nested blocks?

Proposal: Introduce a new interface to the blocks parser which would allow blocks to be parsed in chunks. This could be a new block parser package, or an addition to the existing block parser which could then be used in the @wordpress/blocks parse implementation.

Today:

const blocks = parse( contentString );

Option A (Generator): (* Personal recommended approach)

const blocks = Array.from( generateParse( contentString ) );

Pros:

  • Aligns well to the idea of an iterable set of parsed blocks data
  • Allows very granular control over how iteration proceeds
  • Concept native to the JavaScript language (unlike streams as mostly Node-specific)

Cons:

  • Perception of generators as being difficult to work with

Option B (Streams):

const blocks = Array.from( await streamParse( contentString ) );
// Relies on experimental async iteration of streams: https://2ality.com/2019/11/nodejs-streams-async-iteration.html

Pros:

  • Aligns well to the idea of a streaming flow of parsed blocks data

Cons:

Option C (Chunked Results):

This could go a few different ways:

  • Treating it as some poppable stack
  • A generator-like abstraction of { done: boolean, next: function }
  • An "instance" of a parser which produces X results at a time
const blocks = [];
const parser = createParser( contentString );
blocks.push( ...parser.take( 10 ) );
blocks.push( ...parser.take( 10 ) );

Pros:

  • More familiar language syntax (i.e. not a generator or a stream 😄 )
  • If it could be done, use the same parse function, and add a new second argument to specify the effective "per-page" of parsed blocks

Cons:

  • Custom implementation is redundant with existing language constructs
  • Unclear that it can be done to reuse existing parse function

cc @dmsnell

@aduth aduth added [Type] Enhancement A suggestion for improvement. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Type] Performance Related to performance efforts labels Dec 9, 2019
@mcsf
Copy link
Contributor

mcsf commented Dec 10, 2019

Thanks for the comprehensive proposal. I'd also lean towards generators, for the reasons you cite and for the fact that it is a construct which, albeit perhaps unfamiliar in WordPress development, is here to stay and is worth learning.

@dmsnell
Copy link
Member

dmsnell commented Dec 10, 2019

I've played around with the idea of parse-level filtering where you could filter each block as it pushes onto the parse output. We could accomplish these different proposals using just such a method.

const myParse = doc => {
  parse( doc, {
    onParseBlock: block => {
      yield block;
    }
  } );
};
const myParse = doc => {
  const stream = new StreamOrWhatever();
  parse( doc, {
    onParseBlock: block => {
      stream.emit( 'block', block );
    }
  } );
  return stream;
};
const myParse = doc => Promise.resolve( parse( doc ) );

There's two levels of parsing of course: top-level blocks and inner blocks. The only way I can imagine using a stream/generator approach is to only send top-level blocks since we'd want to keep things in order, otherwise we'd need to have the unique IDs for each inner block communicated inside the parser.

@aduth one of the things I considered earlier when working with asynchronous parsing is the possibility of external updates. I think you pointed out that with a stream/message interface we could potentially handle that gracefully using a different message for updated blocks. Have you considered this scenario recently? For example, were we to store blocks in Simperium we could get external updates, same with WebRTC-connected collaborative sessions.

@aduth
Copy link
Member Author

aduth commented Dec 11, 2019

I think @epiqueras or @ellatrix might have more insight about if or how this could relate to something like collaborative editing. I can see some similarities in how, with collaborative editing, we have need to send messages describing an update to a block in a way which would be treated less as a "change resulting from user action" and more a snapshot of the "latest truth of the block value". In how that relates to parsing, such a system could allow us to iteratively build the editor's blocks state in a way which uses these signals, rather than leaning so heavily on the parser provide a specialized interface.

Edit: To clarify a specific point here:

less as a "change resulting from user action"

By this, I don't mean to imply that the messages couldn't be described as the independent changesets of specific block attributes. To me, I'm considering it more in the context of the session's undo/redo history, where currently we would encounter challenges if we tried to start adding blocks iteratively to an existing block editor state.

@epiqueras
Copy link
Contributor

The collaborative editing bindings would sync in all available blocks from peers.

This means that if a peer has scrolled further down and parsed more blocks, the parser should not append blocks that are already there. We would also need to modify the bindings to keep tombstones of deleted blocks to make sure the parser doesn't reintroduce them.

All of this will be very complex. I'm not sure if it's worth the effort for our use cases.

Alternatively, we have discussed exposing the block tree JSON data in the REST API. If we do that, we won't have the parsing lag anywhere, and peers can start syncing immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Type] Enhancement A suggestion for improvement. [Type] Performance Related to performance efforts
Projects
None yet
Development

No branches or pull requests

4 participants