Noticeable performance regression since the last LLVM update. #8665
Comments
LLVM may have updated their optimization passes, and we may just need to update the order in which we're optimizing things. @thestinger, you mentioned loop vectorization as being a big thing at lower optimization levels. Do you know of any analysis passes that we're missing by default?
After talking with @thestinger on IRC, we've reached the conclusion that one major change we can make is reworking how the LLVM passes are run. I'm not certain that this is the source of the regression you're seeing, but it's the best one that I can think of related to updating LLVM. After looking into this, we do a large number of things differently than clang
Long story short, investigation into how we're running LLVM passes shows that it probably needs to be reworked and simplified to use more existing LLVM infrastructure and to match clang more closely. I'm still not certain that this is the cause of this regression, but I'll attempt to determine if it is by making these changes. cc @Aatch, you were the one who recently added all the PassManager stuff to
Thanks for all those investigations! I merged together a few files from my project to give you a better stand-alone example of regression. Here is the example: https://gist.github.com/sebcrozet/6300848 Again, compiled with
With the compiler before LLVM update (revision 67c954e) I get:
Sadly my suspicions were not correct. Your example is fantastic though, and I'm very perplexed as to what's going on with it. I did profile your code, and I get the same 10x slowness with the new LLVM upgrade. What's very suspicious to me are the top functions in the profile of that code. Sorry, but I haven't figured out yet how to get profiles on OSX outside of Instruments...

Before the llvm upgrades (stage0 today):

After #8700 (with llvm upgrades and refactored pass handling)

The exact test I ran was the one above, but with the iterations turned up to 1000 so that it takes a bit longer. I'm very confused why the timing information is showing up so massively in the profiles of one, so I decided to run with different code. With this code, I get the following profiles

I'm disturbed by the fact that there are
Transmute showing up means a change in inlining.
I identified the root cause of this issue, so I'm closing this in favour of #8720.
Beforehand, it was unclear whether rust was performing the "recommended set" of optimizations provided by LLVM for code. This commit changes the way we run passes to closely mirror that of clang, which in theory does it correctly. The notable changes include:

* Passes are no longer explicitly added one by one. This would be difficult to keep up with as LLVM changes, and we aren't guaranteed to always know the best order in which to run passes.
* Passes are now managed by LLVM's PassManagerBuilder object. This is then used to populate the various pass managers that are run.
* We now run both a FunctionPassManager and a module-wide PassManager. This is what clang does, and I presume that we *may* see a speed boost from the module-wide passes just having to do less work. I have not measured this.
* The codegen pass manager has been extracted to its own separate pass manager so that it does not get mixed up with the other passes.
* All pass managers now include passes for target-specific data layout and analysis passes.

Some new features include:

* You can now print all passes being run with `-Z print-llvm-passes`
* When specifying passes via `--passes`, the passes are now appended to the default list of passes instead of overwriting them.
* The output of `--passes list` is now generated by LLVM instead of maintaining a list of passes ourselves
* Loop vectorization is turned on by default as an optimization pass and can be disabled with `-Z no-vectorize-loops`

All of these "copies" of clang are based off their [source code](http://clang.llvm.org/doxygen/BackendUtil_8cpp_source.html) in case anyone is curious what my source is. I was hoping that this would fix #8665, but this does not help the performance issues found there. Hopefully it'll allow us to tweak passes or see what's going on to try to debug that problem.
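To make the pass layout described above easier to picture, here is a minimal toy model in Rust. It is only a sketch: `PassPipeline`, `default_pipeline`, `append_user_passes`, `run_order`, and every `placeholder-*` pass name are hypothetical and are not part of rustc or LLVM. It also assumes that loop vectorization and user-supplied `--passes` end up in the module-wide list, which the description above does not state.

```rust
// Toy model of the pass layout; NOT rustc's or LLVM's real API.
// All type, function, and pass names here are hypothetical.

/// Three separate pass lists, mirroring the split between the
/// function-level, module-wide, and codegen pass managers.
struct PassPipeline {
    function_passes: Vec<String>,
    module_passes: Vec<String>,
    codegen_passes: Vec<String>,
}

/// Stand-in for a builder that picks default passes from the
/// optimization level instead of listing them one by one.
fn default_pipeline(opt_level: u8, vectorize_loops: bool) -> PassPipeline {
    let mut function_passes = Vec::new();
    let mut module_passes = Vec::new();
    if opt_level > 0 {
        function_passes.push("placeholder-function-opt".to_string());
        module_passes.push("placeholder-module-opt".to_string());
        // Assumption: vectorization lives in the module-wide list.
        if vectorize_loops {
            module_passes.push("placeholder-loop-vectorize".to_string());
        }
    }
    PassPipeline {
        function_passes,
        module_passes,
        // Codegen is kept in its own list so it never mixes with the others.
        codegen_passes: vec!["placeholder-codegen".to_string()],
    }
}

impl PassPipeline {
    /// User-specified passes are appended to the defaults, not substituted.
    /// Assumption: they are appended to the module-wide list.
    fn append_user_passes(&mut self, extra: &[&str]) {
        self.module_passes.extend(extra.iter().map(|s| s.to_string()));
    }

    /// Order in which this model runs things: function, module, codegen.
    fn run_order(&self) -> Vec<String> {
        self.function_passes
            .iter()
            .chain(self.module_passes.iter())
            .chain(self.codegen_passes.iter())
            .cloned()
            .collect()
    }
}

fn main() {
    let mut pipeline = default_pipeline(3, true);
    pipeline.append_user_passes(&["my-extra-pass"]);
    // Loosely analogous to printing the pass list for inspection.
    for name in pipeline.run_order() {
        println!("{}", name);
    }
}
```

Calling `default_pipeline(3, false)` in this model drops the vectorization placeholder, loosely analogous to passing `-Z no-vectorize-loops`.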
I get a huge performance regression since #8328 landed (revision a8c3fe4) on all my projects. Things are 50 to 75% slower. I’m pretty sure #8328 is the cause, since reverting the compiler to the version right before it (revision 67c954e) brings performance back to normal.
For what it’s worth, the affected projects are 100% generic, and rely a lot on cross-crate inlining. They do a lot of numeric computations and array indexing. Sorry if I am a bit vague, but I cannot run valgrind on my projects because my valgrind started to segfault a few days ago (perhaps since the re-enabling of jemalloc)…
I tried to come up with a small bench exhibiting the problem. It is not that significant, but the following already shows a noticeable performance regression:
Compiled with `--opt-level=3`.
With the (new) compiler a8c3fe4, I get:
With the (old) compiler 67c954e, I get something more than 10% faster. The asm dump is smaller too: