Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: use -O2 to compile libgumbo and the nokogiri extension #2639

Merged
merged 1 commit into from
Aug 30, 2022

Conversation

flavorjones
Copy link
Member

What problem is this PR intended to solve?

My benchmarks show this generates code that is:

  • 22% faster at HTML5 serialization
  • 93% faster at HTML5 parsing

Note that -O3 generates slightly slower HTML5 serialization
code (see 8220dc7).

Note also that this doesn't change the compiler options used for
libxml2 and libxslt (which includes -O2 already).

Using the following benchmark script:

#! /usr/bin/env ruby
# coding: utf-8

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
  gem "benchmark-ips"
end

require "nokogiri"
require "benchmark/ips"

input = File.read("test/files/tlm.html")
puts "input #{input.length} bytes"

html4_doc = Nokogiri::HTML4::Document.parse(input)
html5_doc = Nokogiri::HTML5::Document.parse(input)

puts RUBY_DESCRIPTION

Benchmark.ips do |x|
  x.time = 10
  x.report("html5 parse") do
    Nokogiri::HTML5::Document.parse(input)
  end
  x.report("html4 parse") do
    Nokogiri::HTML4::Document.parse(input)
  end
  x.compare!
end

Benchmark.ips do |x|
  x.time = 10
  x.report("html5 serialize") do
    html5_doc.to_html
  end
  x.report("html4 serialize") do
    html4_doc.to_html
  end
  x.compare!
end

with default settings on my dev system
(which are -O3 for extension and unspecified for libgumbo):

input 70095 bytes
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
html5 parse 12.000 i/100ms
html4 parse 31.000 i/100ms
Calculating -------------------------------------
html5 parse 129.637 (±16.2%) i/s - 1.260k in 10.051475s
html4 parse 355.723 (±21.4%) i/s - 3.441k in 10.104502s

Comparison:
html4 parse: 355.7 i/s
html5 parse: 129.6 i/s - 2.74x (± 0.00) slower

Warming up --------------------------------------
html5 serialize 85.000 i/100ms
html4 serialize 131.000 i/100ms
Calculating -------------------------------------
html5 serialize 843.993 (± 2.4%) i/s - 8.500k in 10.076902s
html4 serialize 1.319k (± 2.9%) i/s - 13.231k in 10.039827s

Comparison:
html4 serialize: 1319.0 i/s
html5 serialize: 844.0 i/s - 1.56x (± 0.00) slower

after enabling -O2 on both gumbo and nokogiri source files:

input 70095 bytes
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
html5 parse 21.000 i/100ms
html4 parse 36.000 i/100ms
Calculating -------------------------------------
html5 parse 250.245 (±20.8%) i/s - 2.394k in 10.066381s
html4 parse 371.905 (±20.2%) i/s - 3.600k in 10.025980s

Comparison:
html4 parse: 371.9 i/s
html5 parse: 250.2 i/s - same-ish: difference falls within error

Warming up --------------------------------------
html5 serialize 101.000 i/100ms
html4 serialize 128.000 i/100ms
Calculating -------------------------------------
html5 serialize 1.037k (± 3.3%) i/s - 10.403k in 10.042146s
html4 serialize 1.301k (± 4.2%) i/s - 13.056k in 10.055585s

Comparison:
html4 serialize: 1300.8 i/s
html5 serialize: 1037.2 i/s - 1.25x (± 0.00) slower

My benchmarks show this generates code that is:

- 22% faster at HTML5 serialization
- 93% faster at HTML5 parsing

Note that `-O3` generates slightly slower HTML5 serialization
code (see 8220dc7).

Note also that this doesn't change the compiler options used for
libxml2 and libxslt (which includes `-O2` already).

Using the following benchmark script:

    #! /usr/bin/env ruby
    # coding: utf-8

    require "bundler/inline"

    gemfile do
      source "https://rubygems.org"
      gem "nokogiri", path: "."
      gem "benchmark-ips"
    end

    require "nokogiri"
    require "benchmark/ips"

    input = File.read("test/files/tlm.html")
    puts "input #{input.length} bytes"

    html4_doc = Nokogiri::HTML4::Document.parse(input)
    html5_doc = Nokogiri::HTML5::Document.parse(input)

    puts RUBY_DESCRIPTION

    Benchmark.ips do |x|
      x.time = 10
      x.report("html5 parse") do
        Nokogiri::HTML5::Document.parse(input)
      end
      x.report("html4 parse") do
        Nokogiri::HTML4::Document.parse(input)
      end
      x.compare!
    end

    Benchmark.ips do |x|
      x.time = 10
      x.report("html5 serialize") do
        html5_doc.to_html
      end
      x.report("html4 serialize") do
        html4_doc.to_html
      end
      x.compare!
    end

with default settings on my dev system
(which are `-O3` for extension and unspecified for libgumbo):

> input 70095 bytes
> ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
> Warming up --------------------------------------
>          html5 parse    12.000  i/100ms
>          html4 parse    31.000  i/100ms
> Calculating -------------------------------------
>          html5 parse    129.637  (±16.2%) i/s -      1.260k in  10.051475s
>          html4 parse    355.723  (±21.4%) i/s -      3.441k in  10.104502s
>
> Comparison:
>          html4 parse:      355.7 i/s
>          html5 parse:      129.6 i/s - 2.74x  (± 0.00) slower
>
> Warming up --------------------------------------
>      html5 serialize    85.000  i/100ms
>      html4 serialize   131.000  i/100ms
> Calculating -------------------------------------
>      html5 serialize    843.993  (± 2.4%) i/s -      8.500k in  10.076902s
>      html4 serialize      1.319k (± 2.9%) i/s -     13.231k in  10.039827s
>
> Comparison:
>      html4 serialize:     1319.0 i/s
>      html5 serialize:      844.0 i/s - 1.56x  (± 0.00) slower

after enabling `-O2` on both gumbo and nokogiri source files:

> input 70095 bytes
> ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
> Warming up --------------------------------------
>          html5 parse    21.000  i/100ms
>          html4 parse    36.000  i/100ms
> Calculating -------------------------------------
>          html5 parse    250.245  (±20.8%) i/s -      2.394k in  10.066381s
>          html4 parse    371.905  (±20.2%) i/s -      3.600k in  10.025980s
>
> Comparison:
>          html4 parse:      371.9 i/s
>          html5 parse:      250.2 i/s - same-ish: difference falls within error
>
> Warming up --------------------------------------
>      html5 serialize   101.000  i/100ms
>      html4 serialize   128.000  i/100ms
> Calculating -------------------------------------
>      html5 serialize      1.037k (± 3.3%) i/s -     10.403k in  10.042146s
>      html4 serialize      1.301k (± 4.2%) i/s -     13.056k in  10.055585s
>
> Comparison:
>      html4 serialize:     1300.8 i/s
>      html5 serialize:     1037.2 i/s - 1.25x  (± 0.00) slower
@flavorjones flavorjones force-pushed the flavorjones-compiler-optimize-flags branch from 9e433fc to ece08bb Compare August 28, 2022 17:56
@flavorjones flavorjones added this to the v1.14.0 milestone Aug 28, 2022
@flavorjones flavorjones merged commit 704a5f8 into main Aug 30, 2022
@flavorjones flavorjones deleted the flavorjones-compiler-optimize-flags branch August 30, 2022 16:01
Copy link
Contributor

@stevecheckoway stevecheckoway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm a little surprised we weren't already using -O2 or -O3.

@flavorjones
Copy link
Member Author

@stevecheckoway Yeah, this was a rather obvious (in hindsight) missed opportunity.

@choosen
Copy link

choosen commented Feb 3, 2023

@flavorjones Can it be backported to 1.13.x ?

for ruby 2.6.x ?

@flavorjones
Copy link
Member Author

@choosen No, sorry, I won't be backporting this change. Patch releases are intended for security and bug fixes, please see the section in the README on our interpretation of semantic versioning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants