Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Improve implementation of adoption agency algorithm #6983

Open
wants to merge 19 commits into
base: trunk
Choose a base branch
from

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Jul 6, 2024

Trac ticket: Core-61576

Status

This still needs work. With the added support, no new tests run. Does it do anything? We need to figure out.

Description

Improves support for the adoption agency algorithm in HTML, which re-parents nodes that appear in a different order than their corresponding tag in HTML, or when encountering certain closing elements for which no opening element exists.

This depends on resolving the issue: what do we do with nodes that are implicitly created after arriving at another tag?

<a><div supported><a unsupported></div></a>
DOM in browser HTML Breadcrumbs
Screenshot 2024-01-13 at 11 00 39 AM Screenshot 2024-01-13 at 11 00 48 AM

html5lib Test Suite

- Tests: 1498, Assertions: 930, Skipped: 568.
+ Tests: 1498, Assertions: 930, Skipped: 568.

Copy link

github-actions bot commented Jul 6, 2024

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

dmsnell added 2 commits July 6, 2024 16:29
Since the HTML Processor started visiting all nodes in a document, both
real and virtual, the breadcrumb accounting became a bit complicated
and it's not entirely clear that it is fully reliable.

In this patch the breadcrumbs are rebuilt separately from the stack of
open elements in order to eliminate the problem of the stateful stack
interactions and the post-hoc event queue.

Breadcrumbs are greatly simplified as a result, and more verifiably
correct, in this construction.
The HTML Processor internally throws an exception when it reaches HTML
that it knows it cannot process, but this exception is not made
available to calling code. It can be useful to extract more knowledge
about why it gave up, especially for debugging purposes.

In this patch, more context is added to the WP_HTML_Unsupported_Exception
and the last exception is made available to calling code, if it asks.
@dmsnell dmsnell force-pushed the html-api/improve-adoption-agency-algorithm branch from f97c923 to 8cbfce1 Compare July 6, 2024 23:29
@dmsnell dmsnell force-pushed the html-api/improve-adoption-agency-algorithm branch from 8cbfce1 to e97f678 Compare July 6, 2024 23:33
dmsnell added 8 commits July 6, 2024 16:35
Since the HTML Processor started visiting all nodes in a document, both
real and virtual, the breadcrumb accounting became a bit complicated
and it's not entirely clear that it is fully reliable.

In this patch the breadcrumbs are rebuilt separately from the stack of
open elements in order to eliminate the problem of the stateful stack
interactions and the post-hoc event queue.

Breadcrumbs are greatly simplified as a result, and more verifiably
correct, in this construction.
The HTML Processor internally throws an exception when it reaches HTML
that it knows it cannot process, but this exception is not made
available to calling code. It can be useful to extract more knowledge
about why it gave up, especially for debugging purposes.

In this patch, more context is added to the WP_HTML_Unsupported_Exception
and the last exception is made available to calling code, if it asks.
…rithm.

As part of work to add more spec support to the HTML API, this patch fills
out the active format reconstruction algorithm so that more HTML can be
supported in situations requiring that reconstruction, for example, when
a formatting element such as an A tag or a CODE tag is implicitly closed.

See Core-61576
@dmsnell dmsnell marked this pull request as ready for review August 27, 2024 03:47
Copy link

github-actions bot commented Aug 27, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell, jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@dmsnell
Copy link
Member Author

dmsnell commented Aug 27, 2024

@sirreal mind sharing any thoughts on this? I don't think it adds much other than more descriptive bailing messages, but it does structure the algorithm more and include more comments. my thinking is that this might be useful to guide us when we spend more focus on trying to handle these situations.

Copy link
Member

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there adoption cases handled in this PR that weren't handled before? All of the examples I'm trying bail for different reasons.

Now that we have insert_virtual_token can we support some more cases?

@@ -5312,32 +5318,31 @@ public function reset_insertion_mode(): void {
*
* @see https://html.spec.whatwg.org/#adoption-agency-algorithm
*/
private function run_adoption_agency_algorithm(): void {
private function run_adoption_agency_algorithm(): ?string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would need a @return tag about the strings and their meanings.

Copy link
Member

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type and phpdoc tweaks on walk_down.

@@ -622,12 +622,23 @@ public function remove_node( WP_HTML_Token $token ): bool {
* see WP_HTML_Open_Elements::walk_up().
*
* @since 6.4.0
* @since 6.7.0 Accepts $below_this_node to start traversal below a given node, if it exists.
*
* @param ?WP_HTML_Token $below_this_node Start traversing below this node, if provided and if the node exists.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix PHPdoc:

Suggested change
* @param ?WP_HTML_Token $below_this_node Start traversing below this node, if provided and if the node exists.
* @param WP_HTML_Token|null $below_this_node Start traversing below this node, if provided and if the node exists.

*/
public function walk_down() {
$count = count( $this->stack );
public function walk_down( $below_this_node = null ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public function walk_down( $below_this_node = null ) {
public function walk_down( WP_HTML_Token $below_this_node = null ) {

@sirreal
Copy link
Member

sirreal commented Aug 30, 2024

This is difficult to interpret and interacts with active format reconstruction in interesting ways. I think my preference would be to work on that first (#6982) then come back to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants