Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Implement active format reconstruction #6982

Draft
wants to merge 14 commits into
base: trunk
Choose a base branch
from

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Jul 6, 2024

Trac ticket: Core-61576

Status

I think that the only thing remaining on this one is getting the attributes right for reconstructed nodes, and also adhering to the Noah's Ark rule with a count of three.

We can probably have an inserted node refer to another one, that is, have a virtual node point to another node for reading the attributes.

Description

Adds support for active format reconstruction, which occurs when crossing certain HTML boundaries, such as when entering a new P element which implicitly closed the previous one and all of the formatting elements inside it.

This raises the question what to do when elements are implicitly created. This appears already with the unexpected </p>, which creates an empty P element. next_tag() never finds these elements even though they appear in the breadcrumbs when moving past them.

Tests

- Tests: 1500, Assertions: 1079, Skipped: 421.
+
<p><b>This is bold<p>This is a new paragraph, and it's still bold.
DOM in browser HTML Breadcrumbs
Screenshot 2024-01-13 at 9 29 32 AM Screenshot 2024-01-13 at 9 30 03 AM

Copy link

github-actions bot commented Jul 6, 2024

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@dmsnell dmsnell force-pushed the html-api/improve-active-element-reconstruction branch from a9a2f2d to 9bf1a03 Compare July 6, 2024 23:12
dmsnell added 3 commits July 6, 2024 16:35
Since the HTML Processor started visiting all nodes in a document, both
real and virtual, the breadcrumb accounting became a bit complicated
and it's not entirely clear that it is fully reliable.

In this patch the breadcrumbs are rebuilt separately from the stack of
open elements in order to eliminate the problem of the stateful stack
interactions and the post-hoc event queue.

Breadcrumbs are greatly simplified as a result, and more verifiably
correct, in this construction.
The HTML Processor internally throws an exception when it reaches HTML
that it knows it cannot process, but this exception is not made
available to calling code. It can be useful to extract more knowledge
about why it gave up, especially for debugging purposes.

In this patch, more context is added to the WP_HTML_Unsupported_Exception
and the last exception is made available to calling code, if it asks.
…rithm.

As part of work to add more spec support to the HTML API, this patch fills
out the active format reconstruction algorithm so that more HTML can be
supported in situations requiring that reconstruction, for example, when
a formatting element such as an A tag or a CODE tag is implicitly closed.

See Core-61576
@dmsnell dmsnell force-pushed the html-api/improve-active-element-reconstruction branch from 9bf1a03 to ab1096f Compare July 6, 2024 23:35
sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jul 16, 2024
Use the method implemented in WordPress#6982
to avoid duplicating the same functionality.
sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jul 17, 2024
@sirreal
Copy link
Member

sirreal commented Aug 28, 2024

I think the HTML5lib tests will help here. I'll try to make some progress on this.

 FAILURES!
-Tests: 1494, Assertions: 1076, Failures: 2, Skipped: 418.
+Tests: 1494, Assertions: 1103, Failures: 11, Skipped: 391.

Copy link
Member

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm seems to be implemented correctly, I've tested and reviewed its behavior compared to the spec 👍

As already noted, it does not behave correctly reproducing attributes in the created nodes. That seems like something we should figure out with virtual tokens. For now, we can likely bail when creating nodes with attributes.

Also already noted, when pushing to the list of active formatting elements, we need to handle the "noah's ark clause." That seems important to handle because it affects the behavior when reconstructing the list of active formatting elements. That may also give insights about how to deal with attributes on generated nodes.

/**
* Returns the node at the given 1-offset index in the list of active formatting elements.
*
* Do not use this method; it is meant to be used only by the HTML Processor.
Copy link
Member

@sirreal sirreal Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to say this? The entire class is marked @access private and considered internal.

Comment on lines +53 to +54
* @access private
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, maybe this is redundant given the entire class has this private tag.

Suggested change
* @access private
*

@sirreal
Copy link
Member

sirreal commented Aug 29, 2024

I'm exploring the Noah's Ark work to limit equivalent elements on the active formatting element stack in dmsnell#19.

@dmsnell
Copy link
Member Author

dmsnell commented Sep 2, 2024

Note from conversation: can we simply bail if we have more than three of the same kind of tag name on the list of active formatting elements (up to the nearest marker) and then defer any format Noah's Ark processing?

this could potentially let us process the most common form of active format reconstruction without implementing the complicated and costly parts of the algorithm.

@dmsnell
Copy link
Member Author

dmsnell commented Sep 2, 2024

Need to report attributes even on virtual reconstructed attributes. Prior to now we haven't actually recreated any formatting elements, but once we start doing so, we have to ensure we don't report that attributes are missing

@sirreal sirreal force-pushed the html-api/improve-active-element-reconstruction branch from 25fa1b7 to 026c3c2 Compare January 3, 2025 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants