Don't short-circuit on blank text in :empty
Fixes #2130

Also clarified the documentation, and reduced GC pressure in the loop by walking the children with firstChild() / nextSibling() instead of calling element.childNodes().

Regressed by #1976
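
The short-circuit described above can be illustrated with a small self-contained model. This is only a sketch: the `Kind` enum and the `oldIsEmpty` / `newIsEmpty` helpers are hypothetical stand-ins invented for illustration, not jsoup classes or APIs, and the model folds XML declarations and doctypes into the comment case for brevity.

```java
import java.util.Arrays;
import java.util.List;

class EmptyShortCircuitSketch {
    // Hypothetical stand-ins for child node types (not jsoup's classes).
    public enum Kind { BLANK_TEXT, TEXT, COMMENT, ELEMENT }

    static boolean isText(Kind k) {
        return k == Kind.BLANK_TEXT || k == Kind.TEXT;
    }

    // Models the pre-fix logic: the first text node decides the result,
    // so anything after a leading blank text node is never inspected.
    public static boolean oldIsEmpty(List<Kind> children) {
        for (Kind k : children) {
            if (isText(k))
                return k == Kind.BLANK_TEXT; // short-circuit
            if (k != Kind.COMMENT)
                return false;
        }
        return true;
    }

    // Models the post-fix logic: blank text is skipped and the scan continues.
    public static boolean newIsEmpty(List<Kind> children) {
        for (Kind k : children) {
            if (isText(k)) {
                if (k != Kind.BLANK_TEXT)
                    return false; // non-blank text: not empty
            } else if (k != Kind.COMMENT) {
                return false; // an element child: not empty
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Models <li>\n <span></span></li>: blank text followed by an element.
        List<Kind> blankThenElement = Arrays.asList(Kind.BLANK_TEXT, Kind.ELEMENT);
        System.out.println("old: " + oldIsEmpty(blankThenElement)); // prints "old: true" (the bug)
        System.out.println("new: " + newIsEmpty(blankThenElement)); // prints "new: false"
    }
}
```

In this model the two versions disagree only when a blank text node comes before a non-empty sibling, which corresponds to case 6 in the new test.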
jhy committed Jul 4, 2024
1 parent c3963d4 commit 6de7cfc
Showing 4 changed files with 38 additions and 15 deletions.
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -34,6 +34,8 @@
   thrown. [2114](https://github.com/jhy/jsoup/pull/2114)
 * The `:has()` selector did not match correctly when using sibling combinators (like
   e.g.: `h1:has(+h2)`). [2137](https://github.com/jhy/jsoup/issues/2137)
+* The `:empty` selector incorrectly matched elements that started with a blank text node and were followed by
+  non-empty nodes, due to an incorrect short-circuit. [2130](https://github.com/jhy/jsoup/issues/2130)

 ---

29 changes: 15 additions & 14 deletions src/main/java/org/jsoup/select/Evaluator.java
@@ -717,21 +717,22 @@ public String toString() {
     }

     public static final class IsEmpty extends Evaluator {
-        @Override
-        public boolean matches(Element root, Element element) {
-            List<Node> family = element.childNodes();
-            for (Node n : family) {
-                if (n instanceof TextNode)
-                    return ((TextNode)n).isBlank();
-                if (!(n instanceof Comment || n instanceof XmlDeclaration || n instanceof DocumentType))
-                    return false;
+        @Override
+        public boolean matches(Element root, Element el) {
+            for (Node n = el.firstChild(); n != null; n = n.nextSibling()) {
+                if (n instanceof TextNode) {
+                    if (!((TextNode) n).isBlank())
+                        return false; // non-blank text: not empty
+                } else if (!(n instanceof Comment || n instanceof XmlDeclaration || n instanceof DocumentType))
+                    return false; // non "blank" element: not empty
             }
-            return true;
-        }
-        @Override
-        public String toString() {
-            return ":empty";
-        }
+            return true;
+        }
+
+        @Override
+        public String toString() {
+            return ":empty";
+        }
     }

     /**
2 changes: 1 addition & 1 deletion src/main/java/org/jsoup/select/Selector.java
@@ -74,7 +74,7 @@
  * <tr><td><code>:last-of-type</code></td><td>elements that are the last sibling of its type in the list of children of its parent element</td><td><code>tr {@literal >} td:last-of-type</code></td></tr>
  * <tr><td><code>:only-child</code></td><td>elements that have a parent element and whose parent element have no other element children</td><td></td></tr>
  * <tr><td><code>:only-of-type</code></td><td> an element that has a parent element and whose parent element has no other element children with the same expanded element name</td><td></td></tr>
- * <tr><td><code>:empty</code></td><td>elements that have no children at all</td><td></td></tr>
+ * <tr><td><code>:empty</code></td><td>elements that contain no child elements or nodes, with the exception of blank text nodes, comments, XML declarations, and doctype declarations. In other words, it matches elements that are effectively empty of meaningful content.</td><td><code>li:not(:empty)</code></td></tr>
  * </table>
  *
  * <p>A word on using regular expressions in these selectors: depending on the content of the regex, you will need to quote the pattern using <b><code>Pattern.quote("regex")</code></b> for it to parse correctly through both the selector parser and the regex parser. E.g. <code>String query = "div:matches(" + Pattern.quote(regex) + ");"</code>.</p>
20 changes: 20 additions & 0 deletions src/test/java/org/jsoup/select/SelectorTest.java
@@ -1175,6 +1175,26 @@ public void wildcardNamespaceMatchesNoNamespace() {
         assertSelectedIds(notEmpty, "4", "5");
     }

+    @Test
+    public void emptyPseudo() {
+        // https://github.com/jhy/jsoup/issues/2130
+        String html = "<ul>" +
+            " <li id='1'>\n </li>" + // Blank text node only
+            " <li id='2'></li>" + // No nodes
+            " <li id='3'><!-- foo --></li>" + // Comment node only
+            " <li id='4'>One</li>" + // Text node with content
+            " <li id='5'><span></span></li>" + // Element node
+            " <li id='6'>\n <span></span></li>" + // Blank text node followed by an element
+            " <li id='7'><!-- foo --><i></i></li>" + // Comment node with element
+            "</ul>";
+        Document doc = Jsoup.parse(html);
+        Elements empty = doc.select("li:empty");
+        assertSelectedIds(empty, "1", "2", "3");
+
+        Elements notEmpty = doc.select("li:not(:empty)");
+        assertSelectedIds(notEmpty, "4", "5", "6", "7");
+    }
+
     @Test public void parentFromSpecifiedDescender() {
         // https://github.com/jhy/jsoup/issues/2018
         String html = "<ul id=outer><li>Foo</li><li>Bar <ul id=inner><li>Baz</li><li>Qux</li></ul> </li></ul>";
