
[Modules] Disappointing O(n^2) scaling of compile times with modules and optimizations #60996

Closed
davidstone opened this issue Feb 25, 2023 · 45 comments · Fixed by #71031
Labels: clang:modules (C++20 modules and Clang Header Modules), slow-compile

@davidstone (Contributor) commented Feb 25, 2023:

Given the following set of translation units

// first translation unit: module a
export module a;

#define REPEAT8(x) \
	(x), \
	(x), \
	(x), \
	(x), \
	(x), \
	(x), \
	(x), \
	(x)


export unsigned a() {
	unsigned x = 0;
	REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(++x))))));
	return x;
}

// second translation unit: module b
export module b;

import a;

export unsigned b() {
	return a();
}

// third translation unit: module c
export module c;

import b;

unsigned c() {
	return b();
}

Compiling with optimizations turned on (say, -O3) takes a few seconds for a, which is reasonable (a is designed to be expensive to compile and optimize in this example). But then it takes a few seconds for b, and a few seconds for c. My expectation was that the compiler would compile and optimize the function a once, when compiling module a, and that compiling the importers of a would then be fast. The net effect of this problem is that after converting a program to use modules (and not using any module partitions), the last few modules each take about 30 minutes to compile, as each high-level module is re-optimizing almost my entire program serially.

This is effectively quadratic scaling of compile times -- a does all the work of a, b does all the work of a + b, c does all the work of a + b + c, etc.

I understand how we got to this point -- we want the generation of the .pcm file to happen as quickly as possible, because then any downstream consumer that doesn't care about optimizations can get access to that data quickly (just precompiling b took about 300 milliseconds on my machine, and precompiling c took about 15 milliseconds). However, this seems to sell out the common case of compiling your program.
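
For concreteness, the two-phase flow being described looks roughly like this (a sketch: the file names a.cppm/b.cppm/c.cppm and the exact invocations are illustrative, following Clang's documented modules workflow, rather than the precise commands used here):

# precompiling the interfaces to BMIs is fast; no optimization runs here
clang++ -std=c++20 --precompile a.cppm -o a.pcm
clang++ -std=c++20 --precompile b.cppm -fprebuilt-module-path=. -o b.pcm
clang++ -std=c++20 --precompile c.cppm -fprebuilt-module-path=. -o c.pcm

# compiling each BMI to an object with -O3 is where the quadratic behavior
# shows up: each importer re-optimizes the function bodies it imports
clang++ -std=c++20 -O3 a.pcm -c -o a.o
clang++ -std=c++20 -O3 b.pcm -fprebuilt-module-path=. -c -o b.o
clang++ -std=c++20 -O3 c.pcm -fprebuilt-module-path=. -c -o c.o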

@davidstone changed the title [Modules] Disappointing quadratic scaling of compile times with modules and optimizations → [Modules] Disappointing O(n^2) scaling of compile times with modules and optimizations Feb 25, 2023
@EugeneZelenko added the clang:modules (C++20 modules and Clang Header Modules) and slow-compile labels and removed the new issue label Feb 25, 2023
@llvmbot (Member) commented Feb 25, 2023:

@llvm/issue-subscribers-clang-modules

@ChuanqiXu9 (Member) commented:

I explained the reason in #61015. You can look at https://clang.llvm.org/docs/StandardCPlusPlusModules.html#how-modules-speed-up-compilation if you are interested.

@ChuanqiXu9 (Member) commented:

@davidstone (Contributor, Author) commented:

I don't think this is the solution I'm looking for. I expect that turning off all inlining and IPO of functions from other modules would reduce my performance too much. I just want clang to be smarter about how it does the work.

The solution I'm imagining is that when optimizations are turned on, we would optimize each module and emit an optimized pcm. Then consumers of those modules wouldn't need to run all of the optimization steps on those functions again. In my example code, a is optimized down to the equivalent of return 262144;. If module b could start with that information, then it would compile and optimize quickly, and we would do the optimization work once.

I would expect a little bit of "additional" work because the importing modules would still want to decide whether it was profitable to inline functions from other modules, and if so, run further optimization passes on the inlined function, but I would expect this to still reduce my total compile time.

@ChuanqiXu9 (Member) commented:

I get your point. It would be best to have an optimized pcm, but given the current status of modules we may have to wait much longer for that. So D144844 is not the solution, and I won't close this issue if that patch is accepted. This is likely to remain a long-standing issue.

@davidstone (Contributor, Author) commented:

As a workaround, is there a way to say "Just emit a function call and don't even try looking inside"? I've tried [[gnu::noinline]] and that didn't seem to do anything. I have several modules that are just composing the rest of my program, and these modules take several minutes each even though they are ~40 lines of code and aren't instantiating any templates. It would be nice if I could annotate the functions they call in the interim.
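
For reference, the attempted workaround looks roughly like this (a sketch; placing the attribute on the exported definition in the interface unit is an assumption about where it was applied):

// a.cppm -- sketch of the attempted [[gnu::noinline]] workaround
export module a;

// The attribute is accepted here, but as noted above it does not stop importers
// from deserializing the function body and re-running the optimizer over it.
export [[gnu::noinline]] unsigned a() {
	unsigned x = 0;
	// ... expensive REPEAT8 expansion as in the original example ...
	return x;
}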

@davidstone (Contributor, Author) commented:

This feels like something that module : private; could solve, but it looks like that's not implemented yet.

@iains (Contributor) commented Oct 3, 2023:

This feels like something that module : private; could solve, but it looks like that's not implemented yet.

I'm not quite sure what you mean by "not implemented" (I think it is, but the underlying issue [that we emit the function bodies to module interfaces] would be no different).

Also, the use of :private constrains the module to consist of only one TU - will that still work for your example?

My impression is that the underlying issue is that, because of our current pipeline, module interfaces carry the whole payload of the AST (as required to generate objects), this includes function bodies (although we only really need the declaration in the consumers).

I have draft patches to split the module output from the code-gen so that we can (for example) drop non-inline function bodies from interfaces. I suspect that would fix this specific issue (although it would return if you make a() inline).

@davidstone (Contributor, Author) commented:

Oh, I must have made a typo when I was trying out module : private; earlier today, I see it working now when I test again. I could probably also work around this problem by using module partitions for some of my more problematic modules.

My expectation for how the private module fragment would interact with my example is that module a could instead be defined as

export module a;

#define REPEAT8(x) \
	(x), \
	(x), \
	(x), \
	(x), \
	(x), \
	(x), \
	(x), \
	(x)


export unsigned a();

module : private;

unsigned a() {
	unsigned x = 0;
	REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(++x))))));
	return x;
}

And then all that importers of module a would get is the function signature, thereby preventing the optimizer from even being able to run on the contents of a(). Does this expectation line up with the intended implementation in clang?

That being said, all of this discussion about private module fragments and noinline is, in my mind, a workaround for the underlying problem that I outline in the previous parts of this bug. My ideal is that all of my functions are visible to the optimizer (either by adding inline to a bunch of functions in line with https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1498r1.html or by passing some optimization flag saying "inline across module boundaries by default"). However, I would want the work of optimization to be done once per function, when compiling the module that defines the function. Right now, the optimizer runs on the functions defined in modules I imported, even though the optimizer has already optimized those functions. The optimizer might run again because of inlining or other IPA passes, and that's probably fine, but I don't want to see the current O(n^2) scaling.

@iains (Contributor) commented Oct 3, 2023:

I do not think (from memory) that (at the optimisation stage) the compiler can see any difference between a used function body in the private module fragment and one elsewhere - so I think that the logical hiding of the entity would not prevent it from being processed by the optimiser.

So, I accept that it's only a partial solution (to omit the function bodies for non-inline functions in module interfaces) but it is one that could work for some cases. As you say, module partitions are also a way of hiding implementations.

In order to achieve the "only optimise once, but allow for LTO, and inline fns" I'd think we'd need to invent a module file format that included the optimised IR along with the module interface AST - (in a similar manner to the LTO IR + object code).

We would need to juggle the use of interfaces in the parsing / sema and pre-optimised IR in the code-gen / LTO phases - seems like something that could be plausible, but needs some careful design.

As with all things engineering - that is also in tension with the desire to make module interface files smaller (since they have to be distributed to consumers in distributed builds).

A crazier thought might be some kind of AST pre-linking.

@ChuanqiXu9 (Member) commented:

As a workaround, is there a way to say "Just emit a function call and don't even try looking inside"? I've tried [[gnu::noinline]] and that didn't seem to do anything. I have several modules that are just composing the rest of my program, and these modules take several minutes each even though they are ~40 lines of code and aren't instantiating any templates. It would be nice if I could annotate the functions they call in the interim.

It sounds like a good idea to prevent importing the function bodies of non-inline functions. I'll try to address this comment.


I agree we need a new format to solve the root issue, and it indeed needs careful design.

ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Oct 8, 2023
…ailabl externally

A workaround for llvm#60996

As the title suggests, we can avoid emitting available-externally
functions which are already marked noinline. Such functions should
contribute nothing to optimization.

The update for the docs will be sent separately if this gets approved.
@ChuanqiXu9 (Member) commented:

I sent #68501. @davidstone I meant to add you as a reviewer too, but for some reason I can't find you.

@dwblaikie (Collaborator) commented:

I don't think we should worry about this, really.

I get that people don't generally understand the difference between frontend and middle/backend work in the compiler - and I'm not saying we shouldn't consider pre-optimized IR in modules, etc. (I understand that's complex/longer-term).

(other feedback on #68501)

@ChuanqiXu9 (Member) commented:

The issue is that a lot of people imagine they can get a speedup simply by converting to modules (I know this is not unconditionally true). And it is indeed disappointing for people to find that compilation time increases arbitrarily with optimization enabled.

@iains (Contributor) commented Oct 9, 2023:

I can see several aspects to this:

  - certainly, we expect end users to treat the compiler as a black box that does what they expect
  - the industry can offer advice on good practice, if what they expect is unreasonable
  - there could be a coding model in which (with modules) there were no longer any sources that were actual or notional headers - rather, the compilation process would be expected to produce the interface and any required object file as two artefacts (in that case, users could expect reasonable performance from arbitrary input)

From my PoV, there are more pressing issues right now - i.e. making sure we are complete and correct (but performance is an expectation for modules too).

@davidstone (Contributor, Author) commented Oct 9, 2023:

My evaluation of the priority of remaining modules work, in order, is approximately:

  1. [C++20][Modules] Can't use same legacy header in different modules #60027 (I expect everyone trying modules on a medium-large project to run into this issue at some point, and the problem / fix is not obvious)
  2. [C++20] [Modules] The constructor of lambda is broken after deserialization #59513
  3. import std;
  4. Compile time performance (mostly this issue)
  5. Header units
  6. Other stuff

With the current state things are in, I'd actually put this issue above import std; in terms of its importance to me, but that's just because https://gitlab.kitware.com/cmake/cmake/-/issues/25311 and https://gitlab.kitware.com/cmake/cmake/-/issues/25312 prevent me from working around this issue in any useful way, and those are CMake issues, not clang.

@dwblaikie (Collaborator) commented:

From my PoV, there are more pressing issues right now - i.e. making sure we are complete and correct (but performance is an expectation for modules too).

Yeah, I think this is roughly where I am. We should encourage/educate people to treat module interface units as still being like headers - don't put code in them that you don't need to. It will harm build performance both in the ways described in this bug and, even if all those issues are fixed (a complicated, long project), there would still be problems with this model, as it would reduce parallelism by requiring a lot of work to parse modules before their consumers can be built.

With the current state things are in, I'd actually put this issue above import std; in terms of its importance to me, but that's just because https://gitlab.kitware.com/cmake/cmake/-/issues/25311 and https://gitlab.kitware.com/cmake/cmake/-/issues/25312 prevent me from working around this issue in any useful way, and those are CMake issues, not clang.

at a glance, it sounds like this is the "not modularizing bottom up" problem - which is really something you shouldn't do.

Basically, if you're going to depend on non-modular code from modular code, you have to wrap the non-modular code in a modular form /once and only once/ and then use it from all the modules you want. Duplicating it into multiple modules is going to be really bad.
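
To make the "keep implementation details out of the interface" advice concrete, here is one way the original example's module a could be split into an interface unit and an implementation unit (a sketch; the file names and the exact split are assumptions, not taken from the thread):

// a.cppm -- module interface unit: only the declaration is exported
export module a;
export unsigned a();

// a.cpp -- module implementation unit: the expensive body lives here, so it is
// compiled and optimized once and its body is not carried into importers' BMIs
module a;
unsigned a() {
	unsigned x = 0;
	// ... the REPEAT8 expansion from the original example ...
	return x;
}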

@davidstone (Contributor, Author) commented:

From CMake's perspective, a module implementation unit is "not a module". So if I try to create an interface + implementation pair, my compilation is significantly slower as the implementation is compiled many times.

@JohelEGP commented Oct 9, 2023:

FWIW, I deal with expensive headers by creating shim modules.
Importing a shim module is faster than preprocessing the #includes.
So whenever I change my root module that triggers recompilation of all my modules,
the shim modules don't need to be recompiled
(and thus the #includes don't need to be preprocessed again).
Might you be able to do the same by relinquishing your expensive functions back to headers?

The shim modules
module;

#include <boost/geometry.hpp>

export module jegp.boost_geometry_exports;

namespace jegp::geo::bg {

export using boost::geometry::area;
export using boost::geometry::distance;
export using boost::geometry::multiply_point;
export using boost::geometry::divide_point;

} // namespace jegp::geo::bg
module;

#include <imgui-SFML.h>
#include <imgui.h>
#include <misc/cpp/imgui_stdlib.h>

export module waarudo.imgui_exports;

namespace waarudo::ImGui {

export using ::ImGui::Checkbox;
export using ::ImGui::DragInt;
export using ::ImGui::SameLine;
export using ::ImGui::SeparatorText;
export using ::ImGui::TextUnformatted;

} // namespace waarudo::ImGui

namespace waarudo::ImGui::SFML {

export using ::ImGui::SFML::Init;
export using ::ImGui::SFML::ProcessEvent;
export using ::ImGui::SFML::Render;
export using ::ImGui::SFML::Shutdown;
export using ::ImGui::SFML::Update;

} // namespace waarudo::ImGui::SFML
module;

#include <boost/mp11/algorithm.hpp>
#include <boost/mp11/list.hpp>

export module waarudo.mp11_exports;

namespace waarudo::boost::mp11 {

export using ::boost::mp11::mp_for_each;
export using ::boost::mp11::mp_list;

} // namespace waarudo::boost::mp11
module;

#include <SFML/Graphics.hpp>

export module waarudo.sfml_exports;

namespace waarudo::sf {

export using ::sf::Color;
export using ::sf::Event;
export using ::sf::Image;
export using ::sf::PrimitiveType;
export using ::sf::Rect;
export using ::sf::RectangleShape;
export using ::sf::RenderTexture;
export using ::sf::RenderWindow;
export using ::sf::Sprite;
export using ::sf::Texture;
export using ::sf::Vector2;
export using ::sf::Vector2i;
export using ::sf::Vector2u;
export using ::sf::Vertex;
export using ::sf::VertexArray;
export using ::sf::VideoMode;
export using ::sf::View;

namespace Keyboard {
  export using ::sf::Keyboard::Key;
  export using enum Key;
  export using ::sf::Keyboard::isKeyPressed;
} // namespace Keyboard

namespace Style {
  export using ::sf::Style::Fullscreen;
} // namespace Style

} // namespace waarudo::sf

@ChuanqiXu9 (Member) commented:

and we're still going to have to teach people not to put all their stuff in module interface units because it will hurt build performance no matter even after we make changes like we've discussed to improve the situation

This is the point that I don't understand/agree with. I think it should be fine to put things in module interface units after we make the changes we discussed. It is true that putting things in the module interface has a cost, but the cost would be linear (once we implement what we want) instead of O(n^2). So it should be an acceptable abstraction.

@ChuanqiXu9 (Member) commented Oct 30, 2023:

Now I've changed my mind slightly. I feel it may be acceptable not to import non-inline function bodies even with optimizations, since I've found that more people are surprised to see compilation time go up significantly with modules + optimizations.

For people who really care about this, we could offer an option like -fimport-function-bodies-across-module-units to keep the current behavior.

This would also help produce a smaller PCM, as @iains suggested (eliminating all the non-inline function bodies in the BMI).

How do you feel about the direction? @davidstone @iains @dwblaikie @JohelEGP

(assume we can't put optimized IR in the BMI in the near future)

@dwblaikie (Collaborator) commented:

I still think we should just be encouraging people to think of the performance of module interface units as more like headers and less like separate translation units. They expose things to other translation units, etc. Keeping them small, and keeping implementation details in implementation units is important.

I think it's unfortunate if we add different flags and modes (no importing of non-inline entities, importing via IR embedded in a module file, etc).

Guess one question is what do other implementations (GCC, MSVC) do/what are they considering doing in this area?

@iains (Contributor) commented Oct 30, 2023:

I still think we should just be encouraging people to think of the performance of module interface units as more like headers and less like separate translation units. They expose things to other translation units, etc. Keeping them small, and keeping implementation details in implementation units is important.

I think it's unfortunate if we add different flags and modes (no importing of non-inline entities, importing via IR embedded in a module file, etc).

I agree that we should decide what to do and make it happen without special flags (as I said at some point in this overall discussion, it is reasonable to hang different behaviours off -On with faster compilation for O0 and better performance for O>0, with some compile-time penalty).

Guess one question is what do other implementations (GCC, MSVC) do/what are they considering doing in this area?

I'd need to double-check, but AFAIR for GCC the BMI is built from the symbol table, so that it does not contain anything that is not required - since the object-generation and the BMI are already separate paths there.

@ChuanqiXu9 (Member) commented:

I still think we should just be encouraging people to think of the performance of module interface units as more like headers and less like separate translation units. They expose things to other translation units, etc. Keeping them small, and keeping implementation details in implementation units is important.

But that is not the case: named modules do have different semantics than headers. And if we treat named modules as headers, what is the position of named modules? They would just become a standardized/explicit PCH. Maybe that is still useful, but it is no longer attractive.

it is reasonable to hang different behaviours off -On with faster compilation for O0 and better performance for O>0, with some compile-time penalty

This is the current behavior.

Guess one question is what do other implementations (GCC, MSVC) do/what are they considering doing in this area?
I'd need to double-check, but AFAIR for GCC the BMI is built from the symbol table, so that it does not contain anything that is not required - since the object-generation and the BMI are already separate paths there.

Yeah, this is important. Let's ask Cameron about the MSVC case.

@koplas commented Oct 31, 2023:

Hiding the function body could also solve #61447. Consider this module:

module;

#include <iostream>

export module a;

export void hello() {
    std::cout << "Hello\n";
}

If the function body of hello is hidden, it is possible to discard all declarations in <iostream> and make importing module a nearly a no-op. This allows merging only the declarations that are reachable from the interface. For example, if we have a void hello(std::string s), the declarations for std::string have to be included, but not the whole content of <iostream>.
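
A sketch of the distinction being drawn (illustrative only; the split across <string>/<iostream> and moving the definition elsewhere are assumptions, not part of the original comment):

module;

#include <string>   // must stay reachable: std::string appears in an exported signature
#include <iostream> // with the body hidden, nothing from here is reachable from the interface

export module a;

export void hello(std::string s); // keeps std::string declarations in the BMI

// The body is moved out of the interface (e.g. into an implementation unit),
// so the contents of <iostream> no longer need to be merged by importers.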

@ChuanqiXu9 (Member) commented:

Hiding the function body could also solve #61447. Consider this module:

module;

#include <iostream>

export module a;

export void hello() {
    std::cout << "Hello\n";
}

If the function body of hello is hidden, it is possible to discard all declarations in <iostream> and make importing module a nearly a no-op. This allows merging only the declarations that are reachable from the interface. For example, if we have a void hello(std::string s), the declarations for std::string have to be included, but not the whole content of <iostream>.

Good idea, but it probably can't work, because we can't discard things as we like. See http://eel.is/c++draft/module.global.frag#3 (not implemented yet). There is a description about expressions that appear in ....

@rnikander commented Oct 31, 2023:

I still think we should just be encouraging people to think of the performance of module interface units as more like headers [...] Keeping them small, and keeping implementation details in implementation units is important.

I wanted to do that, but #70585 prevents me from moving parts of my class into a private implementation unit.

@ChuanqiXu9 (Member) commented:

According to the discussion in SG15, we shouldn't import function bodies even with optimization enabled; it breaks ABI compatibility.

I'll try to disable importing function bodies from modules. The further-discussed reduced BMI will be addressed later.

@jyknight (Member) commented Nov 4, 2023:

I'm late to this discussion, but I strongly disagree with:

I still think we should just be encouraging people to think of the performance of module interface units as more like headers

I think a big win for developers from Modules will be to allow them to write a single file module interface+implementation, and have that be roughly-equivalent in semantics to a separate header + implementation file, AND have equal or better performance. This should be feasible to implement, and it would be a huge shame to not actually realize this gain.

We should generate binary module interface files that are the equivalent of what a user would've previously written in a header file. That should be the goal of this whole mess.

So, from there, it's clear that treating everything as if it was implicitly inline, just because it was written in a module interface, is exactly the wrong thing to do. So I definitely agree with the conclusion on that point.

I'm not sure the PR linked here is the right answer, however. It is treating one symptom, but the problem remains that we're writing way too much into the BMI -- way more than should be. That has all sorts of knock-on issues with performance and caching and such, too. Does it make sense to add hacks like that, instead of actually solving the underlying problem of splitting the BMI and object file generation?

@iains (Contributor) commented Nov 4, 2023:

I'm not sure the PR linked here is the right answer, however. It is treating one symptom, but the problem remains that we're writing way too much into the BMI -- way more than should be. That has all sorts of knock-on issues with performance and caching and such, too. Does it make sense to add hacks like that, instead of actually solving the underlying problem of splitting the BMI and object file generation?

For the record, there is a patch series under development that will allow separate paths for the code-gen and interfaces, which will open the way to implement what you describe - it's just a case of "finite engineering resources" and things take time.

@ChuanqiXu9 (Member) commented:

@jyknight sorry, I failed to get your conclusion here. In your first sentence, you said:

I'm late to this discussion, but I strongly disagree with:

I still think we should just be encouraging people to think of the performance of module interface units as more like headers

So I assume you're saying:

the performance of module interface units is not like that of headers

But I can't find the sentence where you say this. I feel like you're saying things in module interfaces should be more like headers?


it's clear that treating everything as if it was implicitly inline, just because it was written in a module interface, is exactly the wrong thing to do.

We don't treat everything in the module interface as implicitly inline. We don't change the linkage here. We just import the function bodies as available externally for optimizations. This is the problem in the issue report, although this idea has been identified as incorrect since it breaks ABI compatibility; there is a discussion on the SG15 mailing list.

I'm not sure the PR linked here is the right answer

It depends on the topic we're discussing, since there are already too many things being discussed on this page: run-time performance, ABI compatibility, BMI format .... If we're just talking about run-time performance and ABI compatibility, the linked PR should be the correct short-term solution (in 1~3 release cycles).

It is treating one symptom, but the problem remains that we're writing way too much into the BMI -- way more than should be. That has all sorts of knock-on issues with performance and caching and such, too.

Yes, and this is why I am saying my PR above should be a short-term solution.

Let's call the current BMI, which contains as much information as possible, the fat BMI, and the BMI which contains only the information necessary for the interface, the thin BMI.

And I am already spending my time teaching clang to generate the thin BMI. My goal for the first step is that the thin BMI should not change if we only touch the function bodies of non-inline functions in the module purview. For example, given

export module a;
export int a() { return 43; }

the thin BMI shouldn't change if we change the implementation to return 44;.

Things are going relatively well toward this goal. Maybe I can send a patch and open an RFC post in the next week or two.

Note that, due to the complexity of serialization, we can't reduce the thin BMI to a very thin format on the first attempt. I mean, we will need to shrink the current fat BMI over many iterations; we can't reach perfection in a single step. Let's discuss this in my post later.


I'm not sure the PR linked here is the right answer, however. It is treating one symptom, but the problem remains that we're writing way too much into the BMI -- way more than should be. That has all sorts of knock-on issues with performance and caching and such, too. Does it make sense to add hacks like that, instead of actually solving the underlying problem of splitting the BMI and object file generation?

For the record, there is a patch series under development that will allow separate paths for the code-gen and interfaces, which will open the way to implement what you describe - it's just a case of "finite engineering resources" and things take time.

@iains sadly my patch overlaps your previous one again. I think the new patches may be better since, as Mathias mentioned in the SG15 mailings, we shouldn't prevent two-phase compilation with the thin BMI. And I think the key point here is not generating a separate BMI; the key point here is the thin BMI. It is not a big deal if we generate a fat BMI separately. It's just that a separate BMI may be a necessary condition for generating a reduced BMI. Hope this is not annoying to you : )

@iains (Contributor) commented Nov 5, 2023:

For the record, there is a patch series under development that will allow separate paths for the code-gen and interfaces, which will open the way to implement what you describe - it's just a case of "finite engineering resources" and things take time.

@iains sadly my patch overlaps your previous one again. I think the new patches may be better since, as Mathias mentioned in the SG15 mailings, we shouldn't prevent two-phase compilation with the thin BMI. And I think the key point here is not generating a separate BMI; the key point here is the thin BMI. It is not a big deal if we generate a fat BMI separately. It's just that a separate BMI may be a necessary condition for generating a reduced BMI. Hope this is not annoying to you : )

It's not a question of "annoying" or otherwise; it is a question of deciding how we want it to work. The patch series I drafted does not generate two BMIs - it operates the same way as GCC, generating an object and a BMI from a single compiler invocation. How "thin" the BMI is would be independent of the object generation and a matter of design decisions.

@ChuanqiXu9 (Member) commented:

For the record, there is a patch series under development that will allow separate paths for the code-gen and interfaces, which will open the way to implement what you describe - it's just a case of "finite engineering resources" and things take time.

@iains sadly my patch overlaps your previous one again. I think the new patches may be better since, as Mathias mentioned in the SG15 mailings, we shouldn't prevent two-phase compilation with the thin BMI. And I think the key point here is not generating a separate BMI; the key point here is the thin BMI. It is not a big deal if we generate a fat BMI separately. It's just that a separate BMI may be a necessary condition for generating a reduced BMI. Hope this is not annoying to you : )

It's not a question of "annoying" or otherwise; it is a question of deciding how we want it to work. The patch series I drafted does not generate two BMIs - it operates the same way as GCC, generating an object and a BMI from a single compiler invocation. How "thin" the BMI is would be independent of the object generation and a matter of design decisions.

IIUC, my patch won't be a problem for your follow-up work. I think what you want is to generate the thin BMI and the object file in one shot. If that is the case, then there is no problem. The command-line interface would be:

clang++ -std=c++20 a.cppm -fthin-BMI-output=a.pcm -c -o a.o

And if we want two-phase compilation, we can do:

clang++ -std=c++20 a.cppm -fthin-BMI-output=a.pcm  --precompile -o a.fat.pcm
clang++ -std=c++20 a.fat.pcm -c -o a.o

There is no conflict.

@jyknight (Member) commented Nov 6, 2023:

I feel like you're saying things in module interfaces should be more like headers?

I was trying to say that, in my opinion, C++20 modules NOT requiring users to write a separate interface file will be a major developer productivity win, compared with the traditional C header+implementation scheme.

In order to realize that developer productivity win, we need to ensure that the compiler implementation doesn't pessimize the compile-time, for code written with a single file that contains both interface and implementation.

We don't treat everything in the module interface as implicitly inline. We don't change the linkage here. We just import the function bodies as available externally for optimizations.

Right. Sorry, my wording was imprecise, and thus confusing. I didn't mean to suggest that the linkage was changed. I meant "implicitly inlineable" [because it was emitted as an externally-available definition in the BMI], not "implicitly inline".

If we're just talking about the run time performance and ABI compatibility, the PR linked should be the correct solution in the short time. (in 1~3 release circles)

OK; short-term fixes are certainly fine -- as long as there's alignment on the longer-term goal.

@ChuanqiXu9 (Member) commented:

Got it. It looks like all of us are on the same page now : ) Great.

C++20 modules NOT requiring users to write a separate interface file will be a major developer productivity win

I assume you're saying a separate implementation file.

@boris-kolpackov (Contributor) commented:

The key point here is the thin BMI. It is not a big deal if we generate a fat BMI separately. It's just that a separate BMI may be a necessary condition for generating a reduced BMI. Hope this is not annoying to you : )

It will definitely be annoying for build system authors to now have to deal with two BMI files, especially seeing that other compilers (GCC, MSVC) manage to achieve the same with a single BMI file. So my vote is: please don't.

@boris-kolpackov (Contributor) commented Nov 6, 2023:

FWIW, I've tested the original example in this issue with GCC 13. On my machine, while module a takes ~850ms to compile with -O3, both b and c compile in the single-digit milliseconds range.

@ChuanqiXu9 (Member) commented:

The key point here is the thin BMI. It is not a big deal if we generate a fat BMI separately. It's just that a separate BMI may be a necessary condition for generating a reduced BMI. Hope this is not annoying to you : )

It will definitely be annoying for build system authors to now have to deal with two BMI files, especially seeing that other compilers (GCC, MSVC) manage to achieve the same with a single BMI file. So my vote is: please don't.

At the end of the day, only the thin BMI will be the BMI we're talking about. The so-called fat BMI is only used to generate the object files. And if the build system doesn't want to support the two-phase compilation model, the so-called fat BMI is not visible to the build system either.

@boris-kolpackov (Contributor) commented Nov 6, 2023:

And if the build system doesn't want to support the two-phase compilation model, the so-called fat BMI is not visible to the build system either.

As I understand it, the only advantage of two-phase compilation is being able to start compiling consumers of the module as soon as the BMI is ready. While we currently don't take advantage of this in build2, we may want to in the future. BTW, with the module mapper you can achieve the same thing but without the overheads/complications of two-phase compilation. Specifically, the compiler can notify the build system that the BMI is ready before it continues on to generate the object file. GCC does it this way.

@ChuanqiXu9 (Member) commented:

And if the build system doesn't want to support the two-phase compilation model, the so-called fat BMI is not visible to the build system either.

As I understand it, the only advantage of two-phase compilation is being able to start compiling consumers of the module as soon as the BMI is ready. While we currently don't take advantage of this in build2, we may want to in the future. BTW, with the module mapper you can achieve the same thing but without the overheads/complications of two-phase compilation. Specifically, the compiler can notify the build system that the BMI is ready before it continues on to generate the object file. GCC does it this way.

Understood. But there are more build systems that prefer scanner-based solutions, so it seems better to leave room for them to implement the two-phase compilation model.

ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Nov 7, 2023
…s even with optimizations

Close llvm#60996.

Previously, clang would try to import function bodies from other module
units to get as many optimization opportunities as possible. That
motivation became the direct cause of the above issue.

However, according to the discussion in SG15, importing function bodies
from other module units breaks ABI compatibility. It is unwanted, so the
original behavior of clang is incorrect. This patch chooses not to import
function bodies from other module units in any case, to follow that
expectation.

Note that the desired optimized-BMI idea is discarded too, since it would
still break ABI compatibility after we import function bodies separately.

The release note will be added separately.

There is a similar issue for variable definitions. I'll try to handle
that in a different commit.
ChuanqiXu9 added a commit that referenced this issue Nov 7, 2023
…s even with optimizations (#71031)

Close #60996.

@ChuanqiXu9 (Member) commented:

While the original issue got closed, many comments here involve BMI formatting, so https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755 may be interesting to you.

qihangkong pushed a commit to rvgpu/llvm that referenced this issue Apr 18, 2024
…s even with optimizations (#71031)

Close llvm/llvm-project#60996.
