Huge compile-time regression in beta/nightly #91128
Assigning priority as discussed in the Zulip thread of the Prioritization Working Group. We assign P-critical:

@rustbot label -I-prioritize +P-critical
I've managed to reduce our use case to a small file that reproduces the issue; all the code is in the following repository: https://github.com/EmbarkStudios/rustc-compile-time-regress
There are multiple variants of the …

Note: I've also tried the build from #89830 again, and the compile time is unchanged with it, so it's definitely not the same issue.
Problematic IR for …
Top passes time
Most time is spent in CVP (CorrelatedValuePropagation). However, the real problem is that NewPM produces a huge function, with … Most likely it's another case of catastrophic inlining, though not the same as the other one (the mentioned patch indeed doesn't fix this case).
This also affects swc. With workaround:
Without workaround:
As a workaround, we have disabled NewPM by default in the 1.57 release. Can someone confirm whether this compile-time regression is indeed resolved in 1.57?
FWIW, I tested with the pre-release and the problem had disappeared. I don't know if much changed between the pre-release and the release, though. Sorry, I forgot to confirm here when the pre-release happened. (Hi Felix!)
A slightly cleaned-up IR reproducer (still large): https://gist.github.com/nikic/d66abc8901a21594d0798d169eb9d725

I've investigated this in a bit more detail. First, a note on the general design of the inliner, which is not very robust. LLVM uses a bottom-up inliner: inlining starts at leaf functions, and a local inlining decision is made for each call. After the leaves have been inlined, the function grows, and if it gets large enough, it will no longer be inlined into its callers due to cost modelling. This is what generally prevents runaway inlining. However, it requires that the callee is already "maximally inlined": if X calls Y calls Z, and we decided not to inline Z into Y, but after inlining Y into X we decide that we now want to inline Z into X after all, we can end up in these exponential inlining situations.

An obvious case where the callee is not maximally inlined is SCCs (strongly connected components of the call graph, i.e. mutually recursive functions), which are presumably the reason the inliner attempts to inline already-inlined callees at all: there is no "bottom" to start from. The inliner handles this through a combination of inlining history and doing one round of inlining over all functions in an SCC before reprocessing newly inlined calls, so that function size increases in a balanced way across the SCC.

In this particular case, we run into a different manifestation of this issue: Z is not inlined into Y because the call to Z is considered a cold call site based on BFI (block frequency) information. But after Y is inlined into X, the call site of Z becomes non-cold. This obviously doesn't make sense and is the result of precision loss. I'm not completely clear on why things degenerate as much as they do, though. Fixing the BFI issue (possibly through some rescaling during inlining) should address this particular issue, but it's rather concerning how fragile the inliner is.
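To make the "maximally inlined" argument concrete, here is a toy sketch in Rust. It is not LLVM's cost model: `Func`, `chain`, `bottom_up_sizes`, `INLINE_THRESHOLD` and `CALL_COST` are hypothetical names, and sizes are abstract units. Because callees are visited first and each call is judged against the callee's final size with a context-independent threshold, every function stays bounded by its base size plus one threshold's worth per call site, even on a call graph whose full inlining would be exponential.

```rust
// Toy model of bottom-up inlining; NOT LLVM's real inliner or cost model.
// Sizes are abstract units and both constants below are hypothetical.
const INLINE_THRESHOLD: u64 = 50; // inline a call if the callee is at most this big
const CALL_COST: u64 = 1; // cost of leaving a call instruction in place

struct Func {
    base_size: u64,
    callees: Vec<usize>, // indices of called functions (may repeat)
}

/// Hypothetical call graph: function i (i > 0) calls function i - 1 twice,
/// so fully inlining function i would duplicate the leaf 2^i times.
fn chain(depth: usize) -> Vec<Func> {
    (0..depth)
        .map(|i| Func {
            base_size: 10,
            callees: if i == 0 { vec![] } else { vec![i - 1, i - 1] },
        })
        .collect()
}

/// Bottom-up inlining: callees have lower indices than their callers, so
/// visiting in index order means every callee is already "maximally inlined".
/// Each call is a local decision against the callee's final size.
fn bottom_up_sizes(funcs: &[Func]) -> Vec<u64> {
    let mut sizes = vec![0u64; funcs.len()];
    for (i, f) in funcs.iter().enumerate() {
        let mut size = f.base_size;
        for &c in &f.callees {
            assert!(c < i, "callee must be processed before its caller");
            size += if sizes[c] <= INLINE_THRESHOLD { sizes[c] } else { CALL_COST };
        }
        sizes[i] = size;
    }
    sizes
}

fn main() {
    let sizes = bottom_up_sizes(&chain(30));
    for (i, s) in sizes.iter().enumerate() {
        println!("f{i}: {s}");
    }
}
```

On this chain the sizes start 10, 30, 70 and then cycle through 12, 34 and 78 instead of doubling. The blow-up described above needs the decision for Z to change with context (for example, a cold call site becoming non-cold), which this model deliberately rules out.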
Okay, here's the part I was missing before (the BFI issue compounds it, but this is really the root problem): the inlining advisor has a concept of "deferred inlining", where it decides not to inline a call into the caller because it determines that it would be more profitable to inline the caller into its own callers instead. Specifically, this can happen if the caller is an internal function, which gets a very large cost bonus if it can be removed entirely. So if we can show that, as long as we don't inline into this caller, it will be inlined into its callers and then dropped entirely, we may defer.

Now, this explicitly goes against the bottom-up inlining approach and causes predictable issues: if you have a tree of calls for which inlining gets deferred, then on reaching the top-level function (where inlining is no longer deferred), we can end up inlining the whole tree, because each individual inlining decision now looks profitable. For example, if we run https://gist.github.com/nikic/ec18c736e69d29614ff268e2977fc491 through …
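The deferral failure mode can be illustrated with a toy model in Rust. Everything here is hypothetical (`Func`, `chain`, `deferred_size`, `top_level_size`, `THRESHOLD` and `CALL_COST` are invented for illustration), and the real advisor's conditions for deferring are more involved. Every function below the top is assumed to have deferred, so it still has its small, un-inlined body. At the top level each exposed call site is then judged in isolation against that small size, every decision passes, and the whole call tree ends up in one function.

```rust
// Toy model of deferred inlining; NOT LLVM's real inliner or cost model.
// Sizes are abstract units and both constants below are hypothetical.
const THRESHOLD: u64 = 50; // per-call-site inlining threshold
const CALL_COST: u64 = 1; // cost of leaving a call instruction in place

struct Func {
    base_size: u64,
    callees: Vec<usize>, // indices of called functions (may repeat)
}

/// Hypothetical call tree: function i (i > 0) calls function i - 1 twice.
fn chain(depth: usize) -> Vec<Func> {
    (0..depth)
        .map(|i| Func {
            base_size: 10,
            callees: if i == 0 { vec![] } else { vec![i - 1, i - 1] },
        })
        .collect()
}

/// Assumption: every non-top function deferred its inlining, so it still has
/// its original body, i.e. its base size plus one call per callee.
fn deferred_size(f: &Func) -> u64 {
    f.base_size + f.callees.len() as u64 * CALL_COST
}

/// Inlining at the top-level function, where nothing is deferred anymore.
/// Each call site is judged only against the callee's small deferred size,
/// and the inlined callee's own call sites are queued for the same check.
fn top_level_size(funcs: &[Func], top: usize) -> u64 {
    let mut size = funcs[top].base_size;
    let mut worklist: Vec<usize> = funcs[top].callees.clone();
    while let Some(c) = worklist.pop() {
        if deferred_size(&funcs[c]) <= THRESHOLD {
            size += funcs[c].base_size;
            worklist.extend(funcs[c].callees.iter().copied());
        } else {
            size += CALL_COST;
        }
    }
    size
}

fn main() {
    for depth in [5, 10, 15, 20] {
        let funcs = chain(depth);
        println!("depth {depth}: top-level size {}", top_level_size(&funcs, depth - 1));
    }
}
```

In this model the top-level size is 10 * (2^depth - 1): 310 at depth 5 and about 10.5 million at depth 20, even though no single decision ever saw a callee larger than 12 units.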
I ran an experiment to disable deferred inlining entirely (#91703), with results at https://perf.rust-lang.org/compare.html?start=3b263ceb5cb89b6d53b5a03b47ec447c3a7f7765&end=07378cd9e76010238f64ea03d1219774eb60510d. The results look promising as far as rustc is concerned: compile times drop, and there is not much impact on check/incr-unchanged builds (our proxy for run-time performance). The worst regression there is "token-stream-stress check" at 1.3%; otherwise it's mostly mildly positive. Unfortunately, it doesn't seem like this can be disabled without patching LLVM.

At this point I'm pretty convinced that deferred inlining is broken on a fundamental level and should be removed entirely, with any regressions arising from that addressed in some other way. However, I suspect that upstream is going to disagree with that :)

For an extra fun test case, https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 is less than 100 lines of IR, and produces about 500000 lines of IR when run through …
Upstream review is https://reviews.llvm.org/D115497.
We have a serious Rust compile-time regression on an internal (unfortunately closed-source) crate, occurring on both the beta and nightly channels.
Before, compiling the crate took around 1 minute 30 seconds in automation. Now, with the nightly channel, it seems to take around an hour.
Running with
RUSTFLAGS="-Z new-llvm-pass-manager=no"
makes the regression disappear on the nightly channel, so the new pass manager seems to be at fault here. I've tried compiling with a rustc build from #89830, but it didn't make the regression disappear, so it might not be the same inlining issue.
Thanks a bunch to @lqd, who suggested disabling the new pass manager and trying the above patch. Pinging LLVM people: @nikic @Mark-Simulacrum. Of course, I'm happy to run any other diagnostics and try any tool that could help figure out what's going on here.