Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compat: Disable wpautop for block-based posts #2806

Merged
merged 3 commits into from
Nov 9, 2017
Merged

Conversation

aduth
Copy link
Member

@aduth aduth commented Sep 27, 2017

Closes #2736

This pull request seeks to disable wpautop for Gutenberg posts. The markup structure of a block is defined as the result of the block implementation's save function, and does not need (and should not have) formatting of automatic paragraph application from wpautop. The changes here disable the_content's filter of wpautop for posts containing blocks.

The challenge here is in managing posts which were originally authored in the Classic editor, then later in Gutenberg to add new content blocks (which will be a common occurrence after Gutenberg is merged).

(See revised behavior: #2806 (comment))

To preserve paragraphs which had previously relied on the behavior of autop, the Gutenberg editor will apply autop during the parse step. This has the result of setting <p> into saved content, instead of relying on this to be applied later during the_content.

This is unlike some of the considerations proposed at #2708, where autop would be applied on a per-block basis in the_content. I am very wary of the performance impact of detecting and individually formatting blocks in content during a front-end filter. In the suggested use-case of an external editor, autop behavior would continue to apply for posts which do not contain blocks. It should never be the case that a post contains both blocks and legacy freeform content without explicit <p> tags if edited in Gutenberg, but it would be true that such an external client would need to pass new saved content in a block post with explicit tags as well. This is not complete compatibility, and it may be worth exploring whether wpautop should also be applied server-side when saving a post containing both blocks and freeform content to capture these instances.

Testing instructions:

  1. Create a new post in the classic editor with a single paragraph of text
  2. Save the post as a draft
  3. Return to the posts list
  4. Edit the post saved in Step 2 in the Gutenberg editor (hover in list, then click Gutenberg)
  5. Add a new paragraph block with some text
  6. Press Preview
  7. Inspect content of preview
  8. Note that both paragraphs are wrapped in <p> tags

@aduth aduth added [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Status] In Progress Tracking issues with work in progress labels Sep 27, 2017
@ellatrix
Copy link
Member

I am very wary of the performance impact of detecting and individually formatting blocks in content during a front-end filter. [...] This is not complete compatibility, and it may be worth exploring whether wpautop should also be applied server-side when saving a post containing both blocks and freeform content to capture these instances.

What about just applying autop on freeform blocks server-side before save?

@ellatrix
Copy link
Member

ellatrix commented Sep 27, 2017

(This might capture two instances, and we wouldn't have to add a JS autop script.)

@aduth
Copy link
Member Author

aduth commented Sep 27, 2017

What about just applying autop on freeform blocks server-side before save?

Yeah, this came to mind as I was writing the pull request description. That might help simplify some things, particularly in the implementation of wpautop in the browser, which I've found to not faithfully follow the server-side variant.

@aduth
Copy link
Member Author

aduth commented Sep 27, 2017

One issue we might encounter is that TinyMCE will apply the paragraph tags to the freeform content, so on the initial edit of a legacy content post in Gutenberg, changes will be flagged even if they don't truly exist (merely focusing then blurring the freeform block). Since any post saved in Gutenberg would then have paragraphs applied (by a server-side filter in recent recommendations), I expect it'd only be an issue for that initial editing session.

This relates to Classic editor running removep on content before saving.

name = name || getUnknownTypeHandlerName();
// If unknown type, use unknown handler and apply paragraph formatting.
if ( undefined === name ) {
rawContent = formatting.autop( rawContent );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, don't we already do that here:

Good call, I did not see this, and it would likely be redundant. An issue with leaving the current implementation is that it needlessly flags the post as dirty because the original value of the block content does not have autop applied, only after setup has completed (in master, focusing and blurring a freeform block without changes will mark the post as needing save).

With ideas floating around server-side autop, I'll consider whether the existing autop call in OldEditor is still necessary.

@aduth
Copy link
Member Author

aduth commented Sep 28, 2017

Reference to original approach, in anticipation of dropping commit during rebase: 392d554

@aduth aduth force-pushed the remove/autop-gutenberg branch 2 times, most recently from 468d000 to 113d161 Compare September 28, 2017 20:39
@aduth
Copy link
Member Author

aduth commented Sep 28, 2017

I've updated this branch with a revised approach, where the client is no longer expected at all to apply autop behavior. Instead, when loading Gutenberg, the content of the post is formatted to apply autop selectively to freeform content only (i.e. content is parsed and autop applied on a block-by-block basis).

To compare:

  • We no longer need to maintain or account for differences in wpautop implementation differences between the client and server, since we rely entirely on the server
  • We could apply this on save, but due to the behavior described at Compat: Disable wpautop for block-based posts #2806 (comment), we would still want autop applied before the editor session begins, either on the server (as implemented currently) or in the client as part of the parse (difficult to faithfully recreate)
    • Even if another client would save with removep, if ever the user reopened the post in Gutenberg, the content should still be effectively the same

Where I hesitate:

  • I don't like that we modify the post entity from the server-side REST call when preparing the post for the editor. Unfortunately, there is already some precedent here in how we reset the Auto Draft title to an empty string. Alternatives here might be: Modifying content during REST filters, creating a separate content property (e.g. Add block data to REST API #2649 (comment)), or bootstrapping the page with a custom object shape for the post (not implying strict adherence to REST API schema).
  • Recreating serialize behavior on the server seems useful, but I'm skeptical how well it mimics the behavior we have in the client: one in particular was slash encoding applied by default in PHP's json_encode, which can be disabled in PHP 5.4+ with the JSON_UNESCAPED_SLASHES flag. Since we can't target 5.4+, I manually recreated this effect with a str_replace. I may need to check whether we need JSON_UNESCAPED_UNICODE here as well.

@aduth aduth removed the [Status] In Progress Tracking issues with work in progress label Sep 28, 2017
$attrs_json = json_encode( $block['attrs'] );

// In PHP 5.4+, we would pass the `JSON_UNESCAPED_SLASHES` option to
// `json_encode`. To support older versions, we must apply manually.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comment!

Also, check out the JS serializer to see the other transforms we make in order to safeguard the attributes

export function serializeAttributes( attrs ) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, check out the JS serializer to see the other transforms we make in order to safeguard the attributes

Ah, right, will make a pass to port over this behavior and equivalent tests.

lib/blocks.php Outdated
$blocks[ $i ]['rawContent'] = $content;
}

return gutenberg_serialize_blocks( $blocks );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one place where I have concern about using HTML strings as Editable (cc: @iseulde @youknowriad @mtias).

If we could iterate over the text nodes themselves and know that they contain no HTML then it seems like we could guarantee behaviors with autop. In this mixture, however, it feels like we leave the door open to bugs. Maybe it's not too big of a deal. I think < is illegal inside attributes and tags, but not illegal inside comments.

Not sure also how autop works so please read my review with that in mind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure I understand the concern here. With these changes and #2708, we're effectively turning off wpautop for blocks, so it shouldn't be the case that we need to worry about how rich text (via HTML or other structure) is treated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more of a general concern about existing apply_filter() code

lib/register.php Outdated
/**
* Determine whether a content string contains blocks.
*
* @since 1.2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it would be worth adding a note on the limited ability this has to accurately determine blocks? I would think it obvious from looking at the code but maybe someone wouldn't realize this is only a first-order approximation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, could be a good idea. The implementation is dubious, but... do you have in mind any scenarios in which it would not be accurate?

Copy link
Member

@dmsnell dmsnell Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the rare case where someone writes something like…

This is not a <!-- wp --> block. <!-- wp:comment -->

…or when something odd like that accidentally crops up.

Not really saying we need to go overboard at this point to
detect that stuff, but it would be good to point out that it's
an optimization against parsing the document to find at
least one valid block.


Touch

This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔨 🕐

@mkaz
Copy link
Member

mkaz commented Oct 2, 2017

Testing this change and I don't get the initial fail state.

With a fresh check out and build from master, following the instructions above to create paragraph in Classic, and then in Gutenberg - both paragraphs get created with <p> tags around them. Editing also shows as expected with Classic Block and Gutenberg Paragraph Block.

I also tested on the branch with the change, and it behaves the same.

@aduth
Copy link
Member Author

aduth commented Oct 4, 2017

Rebased to resolve conflicts, and with a few revisions included in the rebase.

  • The rebased e8d802b ports string replacements for comment escaping from the JavaScript serializer to PHP, with an additional unit test for verifying serialized -> parsed -> serialized equivalence.
  • The rebased 2d2775b expands on the DocBlock to clarify accuracy of gutenberg_content_has_blocks. It could be reasonable to introduce a "strict" flag akin to the third argument of PHP's in_array which would change the logic to running the parser and testing the result, but since we don't have an immediate need for this, it would be best to postpone for a future revision.
  • I added 86a9792 which applies wpautop to freeform content for any save which includes block content. While this is arguably unnecessary and a waste of resources in parsing for saved content coming from the Gutenberg or classic editors, it could improve compatibility of saved content originating from other third-party editors which don't fall under Gutenberg or the classic editor changes from Compat: Disable wpautop in Classic Editor when post contains block #2708
  • I omitted changes to remove the gutenberg_ prefixing from block content functions. We may want to consider going down this path for easier merge, but should be done separately from this pull request.

lib/blocks.php Outdated
function gutenberg_serialize_block( $block ) {
// Return content of unknown block verbatim.
if ( ! isset( $block['blockName'] ) ) {
return $block['rawContent'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick ear-mark here as I think we're going to probably rename rawContent to something like innerHtml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick ear-mark here as I think we're going to probably rename rawContent to something like innerHtml

I think that would be good. One idea I had originally was that, in parsing blocks, we could save effort in reserializing them since we should already know the original string text from which the block was parsed. Do you think it could be reasonable to still have something like rawContent returned from the parser which is assigned as the raw text of the block, comments and all?

Then again, there could be value in having the serializer available anyways, and not relying on blocks maybe having this rawContent property available (e.g. not available if created from PHP, only if parsed from text).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my original intention in bringing up rawContent was an exception - that most blocks would return a tree of children but a few blocks which "need" the raw HTML would get it instead of the tree.

I can imagine use cases for the full body of the block but at the same time anticipate that providing as much is more of an enabler for working around the system than with it.

the way the PEG generator works this can be somewhat tricky to provide both - not sure why - which means that it may take more time if we want that to happen (we end up having to carry the rawContent attribute to each level of token and then reduce at the top level to stitch them together)

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not test but I did review the code. Thanks for adding the comments on has_blocky_stuff()!

@aduth aduth force-pushed the remove/autop-gutenberg branch from 86a9792 to ebf95c0 Compare October 27, 2017 12:59
@aduth
Copy link
Member Author

aduth commented Oct 27, 2017

Rebased to resolve conflicts. Would be good to have a few more eyes on this one, particularly after the latest changes.

@aduth aduth force-pushed the remove/autop-gutenberg branch from ebf95c0 to 91424c4 Compare October 27, 2017 13:16
@codecov
Copy link

codecov bot commented Oct 27, 2017

Codecov Report

Merging #2806 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2806   +/-   ##
=======================================
  Coverage   31.98%   31.98%           
=======================================
  Files         249      249           
  Lines        6901     6901           
  Branches     1254     1254           
=======================================
  Hits         2207     2207           
  Misses       3945     3945           
  Partials      749      749
Impacted Files Coverage Δ
blocks/library/freeform/old-editor.js 1.51% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 48ccc03...098218d. Read the comment docs.

lib/blocks.php Outdated
@@ -58,6 +58,80 @@ function gutenberg_parse_blocks( $content ) {
}

/**
* Given an array of parsed blocks, returns content string.
*
* @since 1.3.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update these before merge

@mkaz
Copy link
Member

mkaz commented Oct 27, 2017

Testing out this change it works good, tested with various blocks of test with varying p tags and worked as expected.

However, the issue in #2817 is back when Jetpack Markdown is enabled, I'm reopening that for @pento to take a look

@aduth
Copy link
Member Author

aduth commented Oct 28, 2017

Noting that this branch incidentally resolves the issue where focusing then blurring a freeform block from a post authored in the classic editor would flag the post as needing to be saved.

@aduth
Copy link
Member Author

aduth commented Oct 30, 2017

However, the issue in #2817 is back when Jetpack Markdown is enabled, I'm reopening that for @pento to take a look

By "back", do you mean reintroduced by these changes? Or is this issue present on master as well?

@mkaz
Copy link
Member

mkaz commented Oct 30, 2017

I see the issue still on master and confused by the state of #2817.
Primarily due to @pento comment which was part of 2bfed02

Revert the revert of #2817.

The change he made looks right, so I think he was trying to put back my change that got overwritten.

@mkaz
Copy link
Member

mkaz commented Oct 30, 2017

ok, I'm pretty sure I know why it's not working, the loading of scripts changed from gutenberg.php to lib/load.php and the require for lib/plugin-compat.php was not included.

I'll create a new PR for it.

@aduth aduth force-pushed the remove/autop-gutenberg branch from 91424c4 to 098218d Compare November 9, 2017 17:00
@aduth aduth merged commit 475cd88 into master Nov 9, 2017
@aduth aduth deleted the remove/autop-gutenberg branch November 9, 2017 18:13
aduth added a commit that referenced this pull request Nov 30, 2017
A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
aduth added a commit that referenced this pull request Dec 14, 2017
A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
aduth added a commit that referenced this pull request Jan 5, 2018
A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
aduth added a commit that referenced this pull request Jan 9, 2018
A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
aduth added a commit that referenced this pull request Jan 10, 2018
A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
aduth added a commit that referenced this pull request Jan 20, 2018
A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
aduth added a commit that referenced this pull request Jan 27, 2018
A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
aduth added a commit that referenced this pull request Jan 30, 2018
A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
aduth added a commit that referenced this pull request Jan 30, 2018
* Framework: Drop server-side block serialization, wpautop

A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806

* Parser: Apply autop to fallback block content
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants