[Modules] Disappointing O(n^2) scaling of compile times with modules and optimizations #60996
Comments
I explain the reason in #61015. And you can look at https://clang.llvm.org/docs/StandardCPlusPlusModules.html#how-modules-speed-up-compilation if you are interested. |
I don't think this is the solution I'm looking for. I expect that turning off all inlining and IPO of functions from other modules would reduce my performance too much. I just want clang to be smarter about how it does the work. The solution I'm imagining is that when optimizations are turned on, we would optimize each module and emit an optimized pcm. Then consumers of those modules wouldn't need to run all of the optimization steps on those functions again. In my example code, I would expect a little bit of "additional" work because the importing modules would still want to decide whether it was profitable to inline functions from other modules, and if so, run further optimization passes on the inlined function, but I would still expect this to reduce my total time. |
I get your point. It would be best to have an optimized pcm, but given the current status of modules, we may have to wait much longer for that. So D144844 is not the solution, and I won't close this one even if it is accepted. This is likely to be a long-standing issue. |
As a workaround, is there a way to say "Just emit a function call and don't even try looking inside"? I've tried |
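(For illustration only - this is not necessarily what was tried above, but one way to ask for "emit a call, don't look inside" on a per-function basis is the GNU/Clang noinline attribute on the definition in the interface. A minimal sketch, with a hypothetical function name:)
export module a;
// Hypothetical sketch: marking the exported function noinline keeps the
// optimizer from inlining it into importers. Note this only prevents
// inlining; the body may still be serialized into the BMI and visited by
// importers, depending on the compiler version.
export [[gnu::noinline]] unsigned expensive() {
    unsigned x = 0;
    // ... deliberately expensive body elided ...
    return x;
}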
This feels like something that |
I'm not quite sure what you mean by "not implemented" (I think it is, but the underlying issue [that we emit the function bodies to module interfaces] would be no different). Also, the use of :private constrains the module to consist of only one TU - will that still allow for your example? My impression is that the underlying issue is that, because of our current pipeline, module interfaces carry the whole payload of the AST (as required to generate objects); this includes function bodies (although we only really need the declaration in the consumers). I have draft patches to split the module output from the code-gen so that we can (for example) drop non-inline function bodies from interfaces. I suspect that would fix this specific issue (although it would return if you make |
Oh, I must have made a typo when I was trying that out. My expectation for how the private module fragment would interact with my example is that
export module a;
#define REPEAT8(x) \
(x), \
(x), \
(x), \
(x), \
(x), \
(x), \
(x), \
(x)
export unsigned a();
module : private;
unsigned a() {
unsigned x = 0;
REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(++x))))));
return x;
}
And then all that importers of a would see is the declaration. That being said, all of this discussion about private module fragments and noinline is, in my mind, a workaround for the underlying problem that I outline in the previous parts of this bug. My ideal is that all of my functions are visible to the optimizer (either by adding |
I do not think (from memory) that (at the optimisation stage) the compiler can see any difference between a used function body in the private module fragment and one elsewhere - so I think that the logical hiding of the entity would not prevent it being processed by the optimiser. So, I accept that it's only a partial solution (to omit the function bodies for non-inline functions in module interfaces) but it is one that could work for some cases. As you say, module partitions are also a way of hiding implementations. In order to achieve the "only optimise once, but allow for LTO, and inline fns" I'd think we'd need to invent a module file format that included the optimised IR along with the module interface AST - (in a similar manner to the LTO IR + object code). We would need to juggle the use of interfaces in the parsing / sema and pre-optimised IR in the code-gen / LTO phases - seems like something that could be plausible, but needs some careful design. As with all things engineering - that is also in tension with the desire to make module interface files smaller (since they have to be distributed to consumers in distributed builds). A crazier thought might be some kind of AST pre-linking. |
It sounds like a good idea to prevent importing the function bodies of non-inline functions. I'll try to address this comment. I agree we need a new format to solve the root issue, and it indeed needs a careful design. |
…ailabl externally
A workaround for llvm#60996. As the title suggests, we can avoid emitting available-externally functions which are already marked as noinline. Such functions should contribute nothing to optimizations. The update for the docs will be sent separately if this got approved.
I sent #68501. @davidstone I meant to add you as a reviewer too but I don't know why I can't find you. |
I don't think we should worry about this, really. I get that people don't generally understand the difference between frontend and middle/backend work in the compiler - and I'm not saying we shouldn't consider pre-optimized IR in modules, etc. (& I understand that's complex/longer-term). (other feedback on #68501) |
The issue is that a lot of people imagine they can get a speedup simply by converting to modules. (I know this is not true unconditionally.) And it is indeed disappointing that people find the compilation time increases so much with optimizations enabled. |
I can see several aspects to this; from my PoV, there are more pressing issues right now - i.e. making sure we are complete and correct (but performance is an expectation for modules too). |
My evaluation of the priority of remaining modules work, in order, is approximately:
With the current state things are in, I'd actually put this issue above |
Yeah, I think this is roughly where I am. We should encourage/educate people to treat module interface units as still being like headers - don't put code in them that you don't need to; it will harm build performance in the ways described in this bug, and even if all those issues are fixed (a complicated, long project) there'd still be problems with this model, as it'd reduce parallelism by having to do a lot of work to parse modules before being able to build their consumers.
at a glance, it sounds like this is the "not modularizing bottom up" problem - which is really something you shouldn't do. Basically, if you're going to depend on non-modular code from modular code, you have to wrap the non-modular code in a modular form /once and only once/ and then use it from all the modules you want. Duplicating it into multiple modules is going to be real bad. |
From CMake's perspective, a module implementation unit is "not a module". So if I try to create an interface + implementation pair, my compilation is significantly slower as the implementation is compiled many times. |
FWIW, I deal with expensive headers by creating shim modules. The shim modules look like this:
module;
#include <boost/geometry.hpp>
export module jegp.boost_geometry_exports;
namespace jegp::geo::bg {
export using boost::geometry::area;
export using boost::geometry::distance;
export using boost::geometry::multiply_point;
export using boost::geometry::divide_point;
} // namespace jegp::geo::bg

module;
#include <imgui-SFML.h>
#include <imgui.h>
#include <misc/cpp/imgui_stdlib.h>
export module waarudo.imgui_exports;
namespace waarudo::ImGui {
export using ::ImGui::Checkbox;
export using ::ImGui::DragInt;
export using ::ImGui::SameLine;
export using ::ImGui::SeparatorText;
export using ::ImGui::TextUnformatted;
} // namespace waarudo::ImGui
namespace waarudo::ImGui::SFML {
export using ::ImGui::SFML::Init;
export using ::ImGui::SFML::ProcessEvent;
export using ::ImGui::SFML::Render;
export using ::ImGui::SFML::Shutdown;
export using ::ImGui::SFML::Update;
} // namespace waarudo::ImGui::SFML

module;
#include <boost/mp11/algorithm.hpp>
#include <boost/mp11/list.hpp>
export module waarudo.mp11_exports;
namespace waarudo::boost::mp11 {
export using ::boost::mp11::mp_for_each;
export using ::boost::mp11::mp_list;
} // namespace waarudo::boost::mp11

module;
#include <SFML/Graphics.hpp>
export module waarudo.sfml_exports;
namespace waarudo::sf {
export using ::sf::Color;
export using ::sf::Event;
export using ::sf::Image;
export using ::sf::PrimitiveType;
export using ::sf::Rect;
export using ::sf::RectangleShape;
export using ::sf::RenderTexture;
export using ::sf::RenderWindow;
export using ::sf::Sprite;
export using ::sf::Texture;
export using ::sf::Vector2;
export using ::sf::Vector2i;
export using ::sf::Vector2u;
export using ::sf::Vertex;
export using ::sf::VertexArray;
export using ::sf::VideoMode;
export using ::sf::View;
namespace Keyboard {
export using ::sf::Keyboard::Key;
export using enum Key;
export using ::sf::Keyboard::isKeyPressed;
} // namespace Keyboard
namespace Style {
export using ::sf::Style::Fullscreen;
} // namespace Style
} // namespace waarudo::sf |
This is the point that I don't understand/agree with. I think it should be fine to put things in module interface units after we make the changes we discussed. It is true that putting things in the module interface has costs. But the cost is linear (if we implement what we want) instead of O(n^2). So it should be an acceptable abstraction. |
Now I've changed my mind slightly. I feel it may be acceptable to not import non-inline function bodies with optimizations, since I've found that more people are surprised to see compilation time go up significantly with modules + optimizations. For people who really care about this, we can offer an option. This is also helpful for a smaller PCM, like @iains suggested (eliminate all the non-inline function bodies in the BMI). How do you feel about the direction? @davidstone @iains @dwblaikie @JohelEGP (assuming we can't put optimized IR in the BMI in the near future) |
I still think we should just be encouraging people to think of the performance of module interface units as more like headers and less like separate translation units. They expose things to other translation units, etc. Keeping them small, and keeping implementation details in implementation units is important. I think it's unfortunate if we add different flags and modes (no importing of non-inline entities, importing via IR embedded in a module file, etc). Guess one question is what do other implementations (GCC, MSVC) do/what are they considering doing in this area? |
I agree that we should decide what to do and make it happen without special flags (I said at some point in this overall discussion; it is reasonable to hang different behaviours off
I'd need to double-check, but AFAIR for GCC the BMI is built from the symbol table, so that it does not contain anything that is not required - since the object-generation and the BMI are already separate paths there. |
But that is not the case. Named modules do have different semantics than headers. And if we treat named modules as headers, what is the position of named modules? Then named modules would just become a standardized/explicit PCH. Maybe that is still useful, but it is not as attractive any more.
This is the current behavior.
Yeah, this is important. Let's ask Cameron for the case of MSVC. |
Hiding the function body could also solve #61447. Consider this module:
module;
#include <iostream>
export module a;
export void hello() {
std::cout << "Hello\n";
}
If the function body of |
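(A sketch of what "hiding the function body" could look like at the source level - declare hello() in the interface and keep the definition, together with the <iostream> dependency, in a module implementation unit. This is only an illustration of the idea under discussion, with illustrative file names, not a claim about what clang currently does with it:)
// a.cppm - interface unit: only the declaration is visible to importers
export module a;
export void hello();

// a.cpp - implementation unit: the body and the heavy header live here
module;
#include <iostream>
module a;
void hello() {
    std::cout << "Hello\n";
}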
Good idea, but it probably can't work, because we can't discard things as we like. See http://eel.is/c++draft/module.global.frag#3 (not implemented yet). There is a description there about expressions that appear in .... |
I wanted to do that, but #70585 prevents me from moving parts of my class into a private implementation unit. |
According to the discussion in SG15, we shouldn't import function bodies even with optimization enabled, since it breaks ABI compatibility. I'll try to disable importing function bodies from modules. And the further-discussed reduced BMI will be addressed later. |
I'm late to this discussion, but I strongly disagree with:
I think a big win for developers from Modules will be to allow them to write a single file module interface+implementation, and have that be roughly-equivalent in semantics to a separate header + implementation file, AND have equal or better performance. This should be feasible to implement, and it would be a huge shame to not actually realize this gain. We should generate binary module interface files that are the equivalent of what a user would've previously written in a header file. That should be the goal of this whole mess. So, from there, it's clear that treating everything as if it was implicitly inline, just because it was written in a module interface, is exactly the wrong thing to do. So I definitely agree with the conclusion on that point. I'm not sure the PR linked here is the right answer, however. It is treating one symptom, but the problem remains that we're writing way too much into the BMI -- way more than should be. That has all sorts of knock-on issues with performance and caching and such, too. Does it make sense to add hacks like that, instead of actually solving the underlying problem of splitting the BMI and object file generation? |
For the record, there is a patch series under development that will allow separate paths for the code-gen and interfaces, which will open the way to implement what you describe - it's just a case of "finite engineering resources" and things take time. |
@jyknight sorry, I failed to get your conclusion here. In the first sentence, you said:
So I assume you're saying:
But I can't find the sentence where you say this. It feels like you're saying things in module interfaces should be treated more like headers?
We don't treat everything in the module interface as implicitly inline. We don't change the linkage here. We just import the function bodies as available externally for optimizations. This is the problem in the issue report, although this idea has since been identified as incorrect because it breaks ABI compatibility; there is a discussion in the SG15 mailing list.
It depends on the topic we're discussing. There are already too many things being discussed on this page: run-time performance, ABI compatibility, the BMI format .... If we're just talking about run-time performance and ABI compatibility, the linked PR should be the correct solution in the short term (in 1~3 release cycles).
Yes, and this is the reason why I am saying my above PR should be a short-term solution. Let's call the current BMI, which contains as much information as possible, the fat BMI, and let's call the BMI which contains only the information necessary for an interface the thin BMI. I am already spending my time teaching clang to generate the thin BMI. My goal for the first step is that the thin BMI should not change if we only touch the function bodies of non-inline functions in the module purview. For example, given
export module a;
export int a() { return 43; }
the thin BMI shouldn't change if we change the implementation of a(). And for this goal, things are going relatively well. Maybe I can send a patch and open an RFC post in the next week or the week after. The note here is that, due to the complexity of serialization, we can't reduce the thin BMI to a very thin format in the first shot. I mean, we need to shrink it from the current fat BMI over many iterations; we can't reach perfection in a single step. Let's discuss this in my post later.
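(As an illustration of that goal - the second version below is my own hypothetical edit, not the one from the comment - an edit that only touches the non-inline function body in the purview should leave the thin BMI unchanged, because the interface that consumers rely on, export int a();, is identical:)
// version 1
export module a;
export int a() { return 43; }

// version 2 - only the non-inline body in the module purview changes,
// so the thin BMI (declarations only) should stay the same
export module a;
export int a() { return 43 + 1; }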
@iains sadly my patch overlaps with your previous one again. I think the new patches may be better since, as Mathias mentioned in the SG15 mailing list, we shouldn't prevent 2-phase compilation with the thin BMI. And I think the key point here is not to generate a separate BMI; the key point here is the thin BMI. It doesn't matter a lot if we generate a fat BMI separately. It is just that a separate BMI may be a necessary condition for generating the reduced BMI. Hope this is not annoying to you : ) |
It's not a question of "annoying" or otherwise; it is a question of deciding how we want this to work - the patch series I drafted does not generate two BMIs - it operates the same way as GCC, generating an object and a BMI from a single compiler invocation. How "thin" the BMI is would be independent of the object generation and a matter for design decisions. |
IIUC, my patch won't be a problem for your follow-up work. I think what you want is to generate the thin BMI and the object file in one shot. If that is the case, then there are no problems. The command line interfaces would be:
And if we want 2 phase compilation, we can do:
There is no conflict. |
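(For concreteness - the exact command lines were not preserved above, so the following is an assumption on my part, using the flags clang documents for named modules: one-shot generation of the object plus BMI, versus the 2-phase precompile-then-compile model:)
# one shot: object file and BMI from a single invocation
clang++ -std=c++20 a.cppm -fmodule-output -c -o a.o

# 2-phase: produce the BMI first, then compile it to an object
clang++ -std=c++20 a.cppm --precompile -o a.pcm
clang++ -std=c++20 a.pcm -c -o a.o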
I was trying to say that, in my opinion, C++20 modules NOT requiring users to write a separate interface file will be a major developer productivity win, compared with the traditional C header+implementation scheme. In order to realize that developer productivity win, we need to ensure that the compiler implementation doesn't pessimize the compile-time, for code written with a single file that contains both interface and implementation.
Right. Sorry, my wording was imprecise, and thus confusing. I didn't mean to suggest that the linkage was changed. I meant "implicitly inlineable" [because it was emitted as an externally-available definition in the BMI], not "implicitly inline".
OK; short-term fixes are certainly fine -- as long as there's alignment on the longer-term goal. |
Got it. It looks like all of us are on the same page now : ) Great.
I assume you're saying |
It will definitely be annoying for build system authors to now have to deal with two BMI files, especially seeing that other compilers (GCC, MSVC) manage to achieve the same with a single BMI file. So my vote is: please don't. |
FWIW, I've tested the original example in this issue with GCC 13. On my machine, while module |
At the end of the day, only the thin BMI will be the BMI we're talking about, and the so-called fat BMI is only used to generate the object files. And if the build system doesn't want to support the 2-phase compilation model, the so-called fat BMI is not visible to the build system either. |
As I understand it, the only advantage of the 2-phase compilation is to be able to start compiling consumers of the module as soon as the BMI is ready. While we currently don't take advantage of this in |
Understood. But more build systems prefer scanner-based solutions, so it seems better to leave room for them to implement the 2-phase compilation model. |
…s even with optimizations (#71031)
Close #60996. Previously, clang would try to import function bodies from other module units to get as many optimization opportunities as possible, and that motivation became the direct cause of the above issue. However, according to the discussion in SG15, importing function bodies from other module units breaks ABI compatibility, which is unwanted, so the original behavior of clang was incorrect. This patch chooses not to import function bodies from other module units in all cases, to follow that expectation. Note that the desired optimized-BMI idea is discarded too, since it would still break ABI compatibility if we imported function bodies separately. The release note will be added separately. There is a similar issue for variable definitions; I'll try to handle that in a different commit.
While the original issue got closed, many comments here involve the BMI format, so https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755 may be interesting to you. |
Given the following set of translation units, compiling with optimizations turned on (say, -O3) takes a few seconds for a, which is reasonable (a is designed to be expensive to compile and optimize in this example). But then it takes a few seconds for b, and a few seconds for c. My expectation was that the compiler would compile and optimize the function a once, when compiling module a, and then compiling the importers of a would be fast. The net effect of this problem is that after converting a program to use modules (and not using any module partitions), the last few modules are each taking about 30 minutes to compile, as each high-level module is optimizing almost my entire program serially.

This is effectively quadratic scaling of compile times -- a does all the work of a, b does all the work of a + b, c does all the work of a + b + c, etc.

I understand how we got to this point -- we want the generation of the .pcm file to happen as quickly as possible, because then any downstream consumer that doesn't care about optimizations can get access to that data quickly (just precompiling b took about 300 milliseconds on my machine, and precompiling c took about 15 milliseconds). However, this seems to sell out the common case of compiling your program.
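(The translation units themselves are not reproduced above. As a minimal sketch - the bodies are my own guess, reusing the REPEAT8 trick quoted earlier in the thread; only the import chain matters for the scaling argument:)
// a.cppm - deliberately expensive to compile and optimize
export module a;
#define REPEAT8(x) (x), (x), (x), (x), (x), (x), (x), (x)
export unsigned a() {
    unsigned x = 0;
    REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(++x))))));
    return x;
}

// b.cppm - imports a; with the behavior described above, optimizing b
// re-optimizes a's imported body as well
export module b;
import a;
export unsigned b() { return a() + 1; }

// c.cppm - imports b; optimizing c re-does the work of a and b again
export module c;
import b;
export unsigned c() { return b() + 1; }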