Skip to content

Commit

Permalink
fix: gumbo memory leak on abandoned tags (#3036)
Browse files Browse the repository at this point in the history
**What problem is this PR intended to solve?**

The fuzzer in #3007 found a memory leak when incomplete tags are
encountered.

The repro added to the memory leak test suite is:

```html
<asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf foo="bar
```

which will leak the tag name without this fix.

cc @stevecheckoway 

**Have you included adequate test coverage?**

Yes, but added it to the memory leak suite which can only be run
manually today (see #1603 for context on plans to improve it).

**Does this change affect the behavior of either the C or the Java
implementations?**

No functional change.
  • Loading branch information
flavorjones authored Nov 21, 2023
2 parents 0aab928 + 84f1706 commit fd9f9e1
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 1 deletion.
1 change: 1 addition & 0 deletions gumbo-parser/src/tokenizer.c
Original file line number Diff line number Diff line change
Expand Up @@ -506,6 +506,7 @@ static void abandon_current_tag(GumboParser* parser) {
for (unsigned int i = 0; i < tag_state->_attributes.length; ++i) {
gumbo_destroy_attribute(tag_state->_attributes.data[i]);
}
gumbo_free(tag_state->_name);
gumbo_free(tag_state->_attributes.data);
mark_tag_state_as_empty(tag_state);
gumbo_string_buffer_destroy(&tag_state->_buffer);
Expand Down
17 changes: 16 additions & 1 deletion test/test_memory_leak.rb
Original file line number Diff line number Diff line change
Expand Up @@ -270,6 +270,20 @@ def test_leaking_dtd_nodes_after_internal_subset_removal
puts
end
end

describe "libgumbo abandoned tag" do
it "should not leak the tag name" do
html = <<~HTML
<asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf foo="bar
HTML
# should increase over the first 200_000 iterations (general parsing overhead),
# but then flatten out. on my machine at about 169k
1_000_000.times do |j|
Nokogiri::HTML5::Document.parse(html)
printf "%s::%s: (iter %d) %d\n", self.class, __method__, j, MemInfo.rss if j % 20_000 == 0
end
end
end
end # if NOKOGIRI_GC

def test_object_space_memsize_of
Expand Down Expand Up @@ -336,7 +350,8 @@ module MemInfo
rescue
4096
end
STATM_PATH = "/proc/#{Process.pid}/statm"

STATM_PATH = "/proc/self/statm"
STATM_FOUND = File.exist?(STATM_PATH)

def self.rss
Expand Down

0 comments on commit fd9f9e1

Please sign in to comment.