Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

each_recursive is extremely slow #134

Closed
makenowjust opened this issue May 31, 2024 · 2 comments · Fixed by #139
Closed

each_recursive is extremely slow #134

makenowjust opened this issue May 31, 2024 · 2 comments · Fixed by #139

Comments

@makenowjust
Copy link
Contributor

REXML::Node#each_recursive is a fundamental operation to traverse XML nodes. In particular, CSS selector uses this method heavily.

Unfortunately, this method is extremely slow. The following Gist contains a tiny (but mostly equivalent) each_recursive implementation and their benchmarks.

https://gist.github.com/makenowjust/b4852a59e53f0c85c740818c75303d5e

And, the below is a result of this script on my laptop (Apple M1 Pro, 14 inch, 32 GB) and Ruby 3.3.2.

$ ruby bench.rb
       user     system      total        real
   each_recursive  2.421999   0.008742   2.430741 (  2.453438)
my_each_recursive  0.057647   0.000049   0.057696 (  0.058718)
$ ruby --yjit bench.rb
       user     system      total        real
   each_recursive  1.613353   0.008227   1.621580 (  1.625374)
my_each_recursive  0.025493   0.000124   0.025617 (  0.025748)

Yes, REXML's each_recursive is ~50x (or ~80x with YJIT) slower than my tiny implementation.

I believe REXML does a lot of extra work, and we can make it faster.

@kou
Copy link
Member

kou commented May 31, 2024

Let's improve performance of it:

  1. Add a benchmark to https://github.com/ruby/rexml/tree/master/benchmark
  2. Improve performance step by step like we do for parser performance

BTW, are you interesting in merging the CSS selector feature to REXML itself?

@makenowjust
Copy link
Contributor Author

@kou Thank you. I will try to improve its performance.

I am also interested in merging the CSS selector feature to REXML.
But, there are some problems and concerns:

  • rexml-css_selector implementation uses the latest Ruby features. We need to downgrade the source (or use a transpiler much like ruby-next.)
  • This implementation is generic. Actually, it has a Prism adapter.
  • Implementation is still incomplete, and it will be changed with high frequency. Therefore, I want to place it where I can get to change it quickly.

I will create a separate issue on this matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants