Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editable: separate out content ops and switch to using tinymce tree output #3713

Closed
wants to merge 4 commits into from

Conversation

spocke
Copy link

@spocke spocke commented Nov 29, 2017

Description

Creating this PR more as discussion point. Can make a separate PR later depending on what comes out of this one.

The benefits of using the filtering in tiny is:

  • Removes invalid HTML elements it follows the HTML5 specification by default.
  • Removes internal elements such as placeholders, UI elements etc, zwnbsp:s.
  • Removes XSS elements and attributes such as scripts or event handlers.
  • Removes or pads invisible elements for example empty spans.
  • Removes or pads empty block elements such as an empty paragraph or h1.
  • Converts legacy elements like font and u to span elements.
  • Normalizes urls based on config absolute/relative etc.

Did some performance comparisons and the change is barely noticeable and since the blocks in gutenberg are mostly smaller chunks I don't think it will be a problem.

The performance is very content and browser dependent some browsers are slower at traversing the dom than parsing the html. IE/Edge/Firefox has a slower dom Chrome has a faster dom for example.

We have done some major optimizations in the tinymce 4.7.3 core and added the ability to get the intermediate node tree out this bypasses a few steps in the filtering process.

How Has This Been Tested?

I ported over the tests for domreact but if we expose the attributes and style to json in that dom-react package we don't need to move those over.
Other than that I did some manual testing of pressing enter in various blocks and saving contents. This is where the content operations occur.

Screenshots (jpeg or gifs if applicable):

Types of changes

  • Separated out content operations from editable.js to a separate file content.js.
  • Normalized the output of split operations to a single data structure.
  • Moved domreact into gutenberg. This might not be needed.
  • Swapped out the gutenberg specific filtering of nodes with tinymce filtering.

Checklist:

  • [Y] My code is tested.
  • [Y] My code follows the WordPress code style.
  • [N] My code follows has proper inline documentation.

@codecov
Copy link

codecov bot commented Nov 29, 2017

Codecov Report

Merging #3713 into master will decrease coverage by 0.17%.
The diff coverage is 35.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3713      +/-   ##
==========================================
- Coverage   37.45%   37.28%   -0.18%     
==========================================
  Files         277      281       +4     
  Lines        6707     6899     +192     
  Branches     1222     1279      +57     
==========================================
+ Hits         2512     2572      +60     
- Misses       3536     3626      +90     
- Partials      659      701      +42
Impacted Files Coverage Δ
blocks/editable/index.js 10.63% <0%> (+0.46%) ⬆️
blocks/editable/content.js 0% <0%> (ø)
utils/domreact/attrs.js 100% <100%> (ø)
blocks/api/matchers.js 92.85% <100%> (ø) ⬆️
blocks/editable/tree.js 9.09% <9.09%> (ø)
utils/domreact/index.js 97.43% <97.43%> (ø)
blocks/api/raw-handling/shortcode-converter.js 46.15% <0%> (-6.23%) ⬇️
blocks/library/shortcode/index.js 30% <0%> (-3.34%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cac31b0...2010ef1. Read the comment docs.

@spocke spocke added the [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable label Nov 29, 2017
Copy link
Contributor

@youknowriad youknowriad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think we should use the tree format. Interested in learning more about TinyMCE's tree format and whether it could be used as our "Editable"'s value format (even outside TinyMCE).

}

export function getContent( editor ) {
return childrenToReact( editor.getContent( { format: 'tree' } ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What shape has this "tree" format? Can you share an example?

Do you think it makes sense to avoid the "react" element format at all and just use the tree format in the state as well (outside Editable), in which case, it requires some refactoring but might be worth it (same as #3048)

Copy link
Author

@spocke spocke Dec 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a simple gist doc on the structure of this tree.
https://gist.github.com/spocke/b009c5ef460c188c8995eb46b064a6b7

I guess using it for tinymce related things makes sense but unsure if it's a general enough format for usage for other things since each node has methods on it for mutation. The react format seems simpler in that regard at the least it's input format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you serialize this "tree" using JSON.stringify and recreate later after calling JSON.parse? The root cause why we don't want to store React objects in the Redux state is that rehydration doesn't work out of the box.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could convert the tree into the simple json format that was suggested else where. It's pretty fast to just convert a one tree structure to another if it's just simple objects. We just need to implement a way to set the tree back in tiny just before the filters apply.

}

content = renderToString( content );
editor.setContent( content, { format: 'raw' } );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the "tree" format to set content as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we only support getting the contents out as the tree but since it's an internal thing we just need to expose I see no reason why we couldn't support setting contents as a tree as well that would by pass the parsing step in the setContent call so it would be more efficient obviously.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be awesome, if we could send the same tree back 💯

@aduth
Copy link
Member

aduth commented Dec 4, 2017

My general feelings on this are:

  • Per RichText State Structure #771, the structure of how we represent rich content should ideally not be dependent on any specific library, neither TinyMCE nor React (i.e. not TinyMCE tree nodes, not React elements)
  • We should avoid recreating the TinyMCE clean-up behaviors if we can instead receive this data directly from the source (i.e. getting content from TinyMCE APIs as tree)
  • We should be conscious of and limiting how often we're converting between formats (i.e. parsing blocks -> WP tree shape -> TinyMCE HTML -> TinyMCE tree -> WP tree shape -> serialize block)

And a question:

  • Does this benefit TinyMCE as a project in how it internally represents data or is this just another serialization (getContent) target? Thinking either in terms of: Would TinyMCE maintain a tree shape as the contenteditable changes over time? Or in other serialization targets (getContent( { format: 'html' } ) use the tree shape as the starting point?

Overall I'm feeling positive about a change like this, particularly because it feels wrong for us to be cleaning up TinyMCE markup ourselves in Editable's createTinyMCEElement.

@spocke
Copy link
Author

spocke commented Dec 4, 2017

You could have a wordpress specific lightweight tree format that just does what you need and then convert dom or tinymce trees to that. I guess it depends on how you want to store contents having it in js only model makes it super fast to work with compared to the real dom.

But you would have to recreate a bunch of mutation logic if you want to alter the tree in memory that is something we already provide in the tinymce tree model but unsure if you need to filter the contents on the gutenberg side?

TinyMCE just exposes the tree that we already have bypassing the final step of taking the tree and serializing it back to html. If we where to support setting a tree as well we would just bypass the parser step and go straight to the filtering steps. So we kind of just exposed the internals though this tree format.

@Afraithe
Copy link

Afraithe commented Dec 5, 2017

I guess this boils down to a more broad discussion on how you want to store the format for Gutenberg specifically. As you say, converting between different formats should be avoided. I believe @iseulde came to a similar conclusion when this was last discussed, if you plan to use a string in/out then this doesn't make much sense.

So, what is the desired spec for the internal storage format?

@gziolo
Copy link
Member

gziolo commented Dec 14, 2017

So, what is the desired spec for the internal storage format?

@Afraithe, we did some explorations in #3048. Let me share one of the examples from the unit test. It mirrors how React.createElement(component, props, ...children) works. Every DOM node is represented as one of the following:

  • node text (string)
  • element node (array) - array contains node name, object with attributes and an unlimited number of children nodes

Assuming you have the following HTML code:

<div>This is a <strong>paragraph</strong> with a <a href="https://w.org/">link with <b>bold <i>and italics</i></b></a>.</div>

Here is the internal representation of the tree created:

[
	'This is a ',
	[ 'strong', {}, 'paragraph' ],
	' with a ',
	[ 'a', { href: 'https://w.org/' }, 'link with ', [
		'b', {}, 'bold ', [
			'i', {}, 'and italics',
		],
	] ],
	'.',
]

@spocke
Copy link
Author

spocke commented Dec 15, 2017

So we could provide a way in tinymce to set the tree using editor.setContent(tree, { format: 'tree' }) then provide in a PR to gutenberg functions that convert from this suggested format that you would want to and from the internal tree representation in tinymce we could do a separate PR for that if you guys feel that it's a good idea. This PR was more to get the ball rolling in term of having the filtering being done by tiny but the storage done by gutenberg.

@gziolo
Copy link
Member

gziolo commented Dec 15, 2017

@aduth, @iseulde, should we iterate on this approach rather then continue efforts on #3048? Do you see any drawbacks assuming we store a plain JS data in Redux store?

@aduth
Copy link
Member

aduth commented Dec 18, 2017

One potential issue is that we'd want to still be able to set the content of the Editable prior to TinyMCE initializing to avoid content flashing in after the editor loads. See existing logic:

// If a default value is provided, render it into the DOM even before
// TinyMCE finishes initializing. This avoids a short delay by allowing
// us to show and focus the content before it's truly ready to edit.
let children;
if ( defaultValue ) {
children = Children.toArray( defaultValue );
}

Which means we'd still need a way to convert from the plain JS shape to a React element (or dangerously set inner HTML).

@spocke
Copy link
Author

spocke commented Dec 19, 2017

Converting the json format that was suggested to html or dom might a solution. However on editor initialization isn’t the content already a html string since from what I understand it’s stored as html in the db.

@aduth
Copy link
Member

aduth commented Jun 27, 2018

Brain dump:

This pull request is probably beyond salvaging at this point, but I've recently started coming back around to the idea that it'd be in our interest to use TinyMCE's tree format as the base source from which to retrieve content and convert it into the block attribute value shape. I'd explored some of this as well in #6467, where the main hiccup was in that we'd still need some filtering to handle the case of splitting blocks. However, I think if we refactored the implementation of RichText to use the default forced_root_block behavior, we could instead split blocks on TinyMCE's NewBlock event, call getContent( { format: 'tree' } ), and expect tree.firstChild and tree.lastChild to be the respective equivalents of our current before and after fragments. There's probably a decent amount of effort involved here to avoid breakage with multiline and tagName behaviors on RichText. Also, it may require a new wrapper div to use in place of the default tagName for the TinyMCE component, since TinyMCE's enter behavior won't allow the new block behavior to be triggered when the root element is the p; this is expected: rather, we want a div for what TinyMCE to consider as its root and its first child as what we consider as the root for the block attribute value.

@aduth aduth closed this Jun 27, 2018
@aduth aduth deleted the try/tinymce-tree branch June 27, 2018 15:12
@aduth
Copy link
Member

aduth commented Jun 28, 2018

FYI I explored using getContent( { format: 'tree' } ) in #7583 and considered it a failed experiment. More information at that pull request. Namely, I would have hoped TinyMCE would construct its tree by walking the actual editor DOM, but in fact it runs the parse on the body's innerHTML, which is prohibitively more expensive an operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants