Tracking issue for refactoring the way we represent function call ABIs #119183

RalfJung · 2023-12-21T11:26:10Z

RalfJung · 2024-05-29T14:44:26Z

Looks like fixing #117480 may be blocked on this.

workingjubilee · 2024-10-16T17:38:42Z

I've started work on cleaning up ABI code. The first few PRs are just going to be moving stuff around so that we can actually co-locate as much of the ABI code as possible:

I'll try to implement the MCP proper as I go along.

RalfJung · 2024-10-16T18:16:25Z

Awesome. :-)

If you want feedback on some sketches of what the ABI might look like, feel free to post them here.

Rollup merge of rust-lang#132385 - workingjubilee:move-abi-to-rustc-abi, r=jieyouxu,compiler-errors

compiler: Move `rustc_target::spec::abi::Abi` to `rustc_abi::ExternAbi`

Lift `enum Abi` from its rather odd place in the middle of rustc_target, and make it available again from rustc_abi. You know, the crate where you would expect the enum that describes all the ABIs to be? The platform-neutral ones, at least. This will help further refactoring of how we handle ABIs in the near future[^0].

Rename `Abi` to `ExternAbi` because quite a lot of the compiler overloads the concept of "ABI" enough that the existing name is imprecise and it is often renamed _anyway_. Often this was to avoid conflicts with the *other* type formerly known as `Abi` (now named BackendRepr[^1]), but sometimes it is just for clarity, and this name seems more self-explanatory. It does get reexported, though, using its old name, to reduce the odds of merge-conflicting over the entire tree.

All of `ExternAbi`'s friends come along for the ride, which costs adding some optional dependencies to the rustc_abi crate. However, all of this also allows simply moving three crates entirely off rustc_target:

- rustc_hir_pretty
- rustc_lint_defs
- rustc_mir_build

This odd selection is mostly to demonstrate a secondary motivation: The majority of the front-end of the compiler should be as target-agnostic as possible, and it is easier to assure this if they simply don't depend on the crate that describes targets. Note that I didn't migrate crates that don't benefit from it in this way yet, and I didn't survey every last crate.

[^0]: This is being undertaken as part of rust-lang#119183
[^1]: rust-lang#132246

RalfJung · 2024-11-19T09:43:27Z

Looks like there is also some interest on the LLVM side to improve their ABI handling. Would be nice if we could benefit from that -- though we also have other backends, so maybe there's no way we can avoid having our own implementation of the C ABI... OTOH, our current PassMode is very LLVM-specific, so I don't know how much other backends can even use it today.

bjorn3 · 2024-11-19T09:59:28Z

> our current PassMode is very LLVM-specific, so I don't know how much other backends can even use it today.

cg_clif makes full use of it. In fact, PassMode is at pretty much exactly the level of abstraction that Cranelift needs. Cranelift doesn't support high-level types: it requires you to decompose everything into primitive values, to pass struct arguments as a pointer argument with ArgumentPurpose::StructArg, and to pass a return area pointer for complex return types, all of which PassMode makes pretty easy to do. The only problem I have with it is that LLVM silently accepts things that IMO shouldn't be accepted, like PassMode::Direct for a return value that doesn't fit in registers (PassMode::Indirect { on_stack: false } should be used for that instead). We also now have at least some sanity checks for nonsensical things like PassMode::Direct for a complex struct.
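To make the kind of sanity check mentioned above concrete, here is a minimal sketch. The `PassMode` and `Shape` enums below are simplified, hypothetical stand-ins (the real rustc `PassMode` carries attributes and cast information that are omitted here), and `check_pass_mode` is an illustrative name, not a rustc function.

```rust
/// Simplified, hypothetical stand-in for rustc's `PassMode`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum PassMode {
    Ignore,
    Direct,
    Pair,
    Cast,
    Indirect { on_stack: bool },
}

/// Hypothetical summary of what kind of value is being passed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Shape {
    Scalar,
    ScalarPair,
    Aggregate { size_bytes: u64 },
}

/// Reject combinations that a backend might otherwise silently accept.
fn check_pass_mode(shape: &Shape, mode: &PassMode) -> Result<(), String> {
    match (shape, mode) {
        // An aggregate must be cast to registers or passed indirectly.
        (Shape::Aggregate { size_bytes }, PassMode::Direct) => Err(format!(
            "PassMode::Direct is not valid for a {}-byte aggregate",
            size_bytes
        )),
        // `Pair` only makes sense for a value made of two scalars.
        (Shape::Scalar, PassMode::Pair) => {
            Err("PassMode::Pair requires a scalar pair".to_string())
        }
        _ => Ok(()),
    }
}
```

Under these assumptions, a 64-byte aggregate with `PassMode::Direct` is rejected, while the same aggregate with `PassMode::Indirect { on_stack: false }` passes the check.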

workingjubilee · 2024-11-22T08:25:42Z

Yeah, IMO our main ask from LLVM should be "more hard errors instead of silently making up some weird shit, please", with probably more IR parameter attributes to enable "please make up some weird shit on purpose".

RalfJung · 2024-11-22T08:59:04Z

Unfortunately LLVM is moving in the opposite direction, see e.g. the discussion in <llvm/llvm-project#111334>

workingjubilee · 2025-02-11T03:21:27Z

When these two land I will have more or less passed the "mere cleanup" phase:

There will always be more improvements (I have another set of diffs already, actually...) but now I can actually turn to fixing the real problems.

Some scattered thoughts:

There are higher-level and lower-level ways to represent the ABI. Devolving to register and stack passing, for instance, versus high-level abstractions like "pass this like it would be passed via the C ABI".

The problem with using the C ABI as referent is the C ABI often has arbitrary limitations: For instance, multi-register returns are functionally inconceivable in many C ABIs, but are a normal idea for Rust. And because some C ABIs do implement them, LLVM often does not have an actual problem with doing them, they're just a bit weird to express using its C-like syntax.

Going down to individual registers for all arguments might still be too low-level? Yet I definitely do not believe we should be reifying aggregates more in our ABI handling: thinking primarily in terms of them is almost inherently problematic. And we do think about registers a lot in our current handling.

If we did think about this in the more lower-level form, we would need to account for how the translation of a set of arguments to an ABI handling would be inherently ordered: at some point you run out of registers and start putting things on the stack. I don't think that's completely unacceptable, but it does point to rethinking what we're doing fairly extensively.
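As a rough illustration of why such a lower-level form is inherently ordered, consider this toy sketch. It assumes every argument occupies exactly one slot and that a fixed number of argument registers exists; `Location` and `assign_locations` are hypothetical names, and no real calling convention is this simple.

```rust
/// Where a toy calling convention places one argument.
#[derive(Debug, PartialEq)]
enum Location {
    Register(usize),
    Stack { offset: u64 },
}

/// Assign arguments in order: the first `num_regs` go to registers,
/// and every later argument spills to the next stack slot.
fn assign_locations(arg_count: usize, num_regs: usize, slot_bytes: u64) -> Vec<Location> {
    let mut stack_offset = 0u64;
    (0..arg_count)
        .map(|index| {
            if index < num_regs {
                Location::Register(index)
            } else {
                let location = Location::Stack { offset: stack_offset };
                stack_offset += slot_bytes;
                location
            }
        })
        .collect()
}
```

The location of the third argument depends on how many arguments came before it, which is exactly the ordering dependence described above.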

We already have, or at least should have, a vague idea of what belongs in these two sets:

- what may go in registers
- what must go on the stack

That's kind of what all the linting about target features points to, anyways.

workingjubilee · 2025-02-11T03:42:30Z

Some constraints we know about:

- For any extern "{CC}" fn(A) -> R, the combination of CC, A, and R must completely determine the ABI for the call, or else function pointers stop working. This might seem obvious, but I felt I should state it explicitly because it is sometimes tempting to say the ABI should vary based on something that isn't distinguishable as part of the function pointer type, like target features.
- For any closure impl FnOnce(A) -> R, we can see these also get erased to dyn FnOnce(A) -> R, which seemingly points in the same direction as with function pointers.
- We cannot raise our model to a significantly higher level of abstraction, as otherwise lower-level codegen backends like Cranelift would never fully work with our lowering to their IR.
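The first two constraints can be seen in ordinary Rust code. In this small sketch (the function names are made up for illustration), the caller only knows the function pointer type or the `dyn FnOnce` type, so everything needed to make the call has to be recoverable from that type alone.

```rust
extern "C" fn add_one(x: u32) -> u32 {
    x + 1
}

extern "C" fn double(x: u32) -> u32 {
    x * 2
}

/// The callee is erased: only `extern "C" fn(u32) -> u32` is known here,
/// so the calling convention, argument type, and return type must fully
/// determine how the call is made.
fn call_erased(f: extern "C" fn(u32) -> u32, arg: u32) -> u32 {
    f(arg)
}

/// The same erasure happens for closures behind `dyn FnOnce`.
fn call_boxed(f: Box<dyn FnOnce(u32) -> u32>, arg: u32) -> u32 {
    f(arg)
}
```

Both `add_one` and `double` can be passed to `call_erased`, which cannot know which one it received.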

Lokathor · 2025-02-11T03:46:52Z

Doesn't that already sometimes happen, like soft float vs hard float?

workingjubilee · 2025-02-11T03:49:15Z

Nah, that doesn't happen because people compile code for targets and targets are not allowed to vary whether they are hard float or soft float. 😌

Mark-Simulacrum · 2025-02-11T04:03:42Z

> must completely determine the ABI for the call, or else function pointers stop working

I think this is only somewhat true - there must be a "function pointer" ABI, but that ABI may want to differ from the ABI we choose to use for regular (non function pointer) calls, by (for example) inserting a shim when casting a function to a pointer. Effectively, we could model this as every function having a generic of whether it ever gets erased or is always called with knowledge of the specific function.

That might be useful for example to leverage PGO or other information to optimize Result into passing the more common variant for a particular function through registers vs. not.

That's commonly achieved through inlining, but I think it would be nice to avoid excluding it from being done without inlining too.
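A source-level analogy of the shim idea, as a sketch only: Rust has no way to spell a custom per-function calling convention, so the "specialized" convention is imitated here with an out-parameter. All names are hypothetical, and this is not how rustc implements anything today.

```rust
/// Hypothetical "specialized" convention: the common `Ok` payload is
/// returned directly, and the rare error goes through an out-parameter.
fn parse_digit_specialized(c: char, err: &mut Option<String>) -> u32 {
    match c.to_digit(10) {
        Some(digit) => digit,
        None => {
            *err = Some(format!("not a digit: {:?}", c));
            0
        }
    }
}

/// Shim with the canonical signature. This is what would be used when the
/// function is cast to a function pointer.
fn parse_digit_shim(c: char) -> Result<u32, String> {
    let mut err = None;
    let value = parse_digit_specialized(c, &mut err);
    match err {
        Some(e) => Err(e),
        None => Ok(value),
    }
}

/// Erased callers only ever see the canonical function pointer type.
fn call_erased(f: fn(char) -> Result<u32, String>, c: char) -> Result<u32, String> {
    f(c)
}
```

Direct callers that know the concrete function could use the specialized form, while `call_erased` keeps working through the shim.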

workingjubilee · 2025-02-11T05:53:49Z

I am aware of that but it seems functionally identical to generating a new extern "{CC}" fn(A) -> R, with CC, A, and R based on but non-identical to the original, and rewriting the call site to call this new function?

A related concern that has been pointed out to me in the past is that sometimes, even with inlining, the codegen backend can have problems erasing the prologue and epilogue. Ideally, we would find a way to make "handling the ABI of this call" semantically separable from the actual motion of "making this call", so that we can delay addressing the need for a prologue and epilogue as long as possible. Notionally the function call ABI, after all, should be a non-event, as otherwise these functions aren't very functional.

RalfJung · 2025-02-11T07:30:07Z

We currently talk a lot about registers in our ABI handling, but that's kind of a lie. All it means is "represent this to LLVM as a scalar / array of scalars" and then LLVM decides whether to put that into registers or on the stack depending on the target's conventions. (Well, there's a flag for forcing things to be in a register -- mixed up with other flags that are irrelevant for the ABI. I don't know the exact semantics of that, it's probably target-specific.)

I don't think we should go any more low-level than this on the Rust side. I'd rather not have us be in the business of counting how many arguments go in registers vs on the stack. That's a lot of work, very hard to test, and AFAIK none of our codegen backends actually give us that level of control anyway.

The main goal of the original MCP was to make writing our target-specific foreign ABI adjustments easier and less fragile. For that code, the C ABI is exactly the right abstraction, as that's the level the ABI is specified at: we need to tell that code what the given type "looks like as a C type", and then the code needs to compute a corresponding PassMode. So I think that kind of machinery is definitely on the requirements list here. The MCP talks mostly about how to represent the input to this step; you seem to be thinking mostly about the output? To be honest I have not thought about this much; what we have here currently seems to work well enough for both LLVM and cranelift, so it's not entirely terrible.

> A related concern that has been pointed out to me in the past is that sometimes, even with inlining, the codegen backend can have problems erasing the prologue and epilogue. Ideally, we would find a way to make "handling the ABI of this call" semantically separable from the actual motion of "making this call", so that we can delay addressing the need for a prologue and epilogue as long as possible. Notionally the function call ABI, after all, should be a non-event, as otherwise these functions aren't very functional.

That kind of work needs to start in the backends though; LLVM / cranelift would require a higher-level way to express the ABI so that they don't require a bunch of instructions that must later be optimized away again (e.g. when inlining). This also points towards a rather higher-level than lower-level abstraction, i.e., not "register vs stack".

workingjubilee · 2025-02-11T07:42:13Z

LLVM does not implement the C ABI for us.

RalfJung · 2025-02-11T07:47:46Z

Correct. We need to carry a function for each supported architecture (sometimes more than one per arch) that is given "the C view of the type" and computes the PassMode. For this, obviously we should have a way to represent "the C view of the type" so we can use it as input for that function -- right?

Regarding the point about multi-register returns, I don't see why we couldn't extend PassMode to support that. But as noted above that is an entirely orthogonal question to how we represent the input to the ABI adjustment functions, which is what the MCP is primarily about.

RalfJung · 2025-02-11T08:07:03Z

I think you are talking about a completely different layer than what I am talking about in the MCP and the issue description. We need two "ABI representations":

1. One representation sits between the notion of a Rust type and the target-specific ABI logic. This is a function of type fn(TyAndLayout) -> AbiRepr, and crucially it must have the property that ABI-compatible Rust types are mapped to the same AbiRepr. This function can and should be target-independent. AbiRepr is a new concept that does not exist in rustc yet, but I think having it would greatly improve the correctness of our foreign ABI adjustments.
2. Since codegen backends do not implement the entire C ABI for us (they implement part of it, like knowing how many arguments to put into registers vs the stack, but many other parts have to be handled by the frontend), we also need to have another layer below this, which is target-and-ABI-specific. This layer consists of a function fn(AbiRepr) -> PassMode. I am sure PassMode can be improved, and in fact PassMode::Cast will have to be extended as currently it cannot represent everything LLVM can represent and that leads to ABI bugs on some targets. But I don't see which problem would be solved by making PassMode so low-level that we have to know how many arguments to put in registers; I think (and @bjorn3 has confirmed in the past) that it sits at a pretty good level of abstraction.

Currently we effectively have a function fn(TyAndLayout) -> PassMode for each foreign ABI (except we express this more like fn(TyAndLayout, &mut PassMode), which is even worse). That's bad as it means ABI adjustments can easily accidentally map two ABI-compatible types to different PassModes; I fixed a bunch of those bugs last year. This proposal is about factoring that into two functions with an intermediate abstraction, AbiRepr. I think the vocabulary of C types is not a bad template for this. In particular, using C types here does in no way preclude using multiple return registers; that can still be realized by implementing the "Rust" ABI adjustment to e.g. map a 2-element AbiRepr::Struct to a PassMode that represents a multi-register return.
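A minimal sketch of this two-stage factoring might look as follows. Since AbiRepr does not exist in rustc, everything here is hypothetical: `RustTy` is a toy stand-in for `TyAndLayout`, the `AbiRepr` and `PassMode` enums are heavily simplified, and `toy_target_adjust` does not model any real target.

```rust
/// Toy stand-in for a Rust type with its layout.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum RustTy {
    U32,
    F64,
    /// A `#[repr(transparent)]` wrapper around another type.
    Transparent(Box<RustTy>),
    /// A `#[repr(C)]` struct with the given fields.
    Struct(Vec<RustTy>),
}

/// Hypothetical normalized, target-independent ABI description.
#[derive(Debug, Clone, PartialEq)]
enum AbiRepr {
    Int { bits: u32 },
    Float { bits: u32 },
    Struct(Vec<AbiRepr>),
}

/// Simplified stand-in for `PassMode`.
#[derive(Debug, Clone, PartialEq)]
enum PassMode {
    Direct,
    Indirect { on_stack: bool },
}

/// Stage 1 (target-independent): ABI-compatible types map to the same `AbiRepr`.
fn abi_repr(ty: &RustTy) -> AbiRepr {
    match ty {
        RustTy::U32 => AbiRepr::Int { bits: 32 },
        RustTy::F64 => AbiRepr::Float { bits: 64 },
        // The wrapper disappears here, once, instead of in every target.
        RustTy::Transparent(inner) => abi_repr(inner),
        RustTy::Struct(fields) => AbiRepr::Struct(fields.iter().map(abi_repr).collect()),
    }
}

/// Stage 2 (target- and ABI-specific): it only ever sees `AbiRepr`.
fn toy_target_adjust(repr: &AbiRepr) -> PassMode {
    match repr {
        AbiRepr::Int { .. } | AbiRepr::Float { .. } => PassMode::Direct,
        AbiRepr::Struct(_) => PassMode::Indirect { on_stack: false },
    }
}
```

Because the target-specific stage never sees the wrapper, it cannot accidentally treat a transparent wrapper around `u32` differently from `u32` itself.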

I don't know where you went on a different track than my line of thoughts here but clearly we're not talking about the same thing, maybe this helps. :)

workingjubilee · 2025-02-11T08:11:10Z

It simply does not make any sense to me to talk about things in terms of C types if they do not have to follow the C ABI, so I think about these elements in a more decomposed way?

workingjubilee · 2025-02-11T08:16:34Z

I think I will simply talk less about these things, because it is not very useful for me to try to record my thoughts in forms that are likely to be misunderstood. I understand that your primary hope was to make the input forms more sensible, and I do intend to address that. I just am thinking about this from the origin of the demands... "we want programs that do FFI to successfully execute... which means certain things have to go into certain registers... which means that they have to..." etc.

RalfJung · 2025-02-11T09:16:21Z

> It simply does not make any sense to me to talk about things in terms of C types if they do not have to follow the C ABI, so I think about these elements in a more decomposed way?

We need some sort of high-level language of types (well, higher-level than PassMode, but lower-level than Rust types) that can serve as input for the target-specific ABI adjustments. I don't care what you call this. I called it AbiRepr above. We don't have to call it "C types" if you don't like that, we can just call it "structs and unions and leaf types" if you prefer 🤷 .

> I just am thinking about this from the origin of the demands... "we want programs that do FFI to successfully execute... which means certain things have to go into certain registers... which means that they have to..." etc.

That would make sense if we were emitting the assembly ourselves, but we are not. We are targeting the language of our backends in terms of how they represent ABI. PassMode is designed around that. It works well enough; the ABI issues we have seen have (almost all) not been caused by PassMode being inadequate. They have been caused by the fact that the ABI adjustments take as input a Rust type and that makes it extremely easy to map different-but-defined-to-be-ABI-compatible Rust types to different PassMode. We need a normalization pass that runs before invoking the target-specific logic, so that we only have to worry once about mapping different-but-defined-to-be-ABI-compatible Rust types to identical outputs, rather than having to do that for every single target. That is what this issue attempts to fix. I'm not sure what you are trying to fix but maybe it is something else. :)

RalfJung · 2025-02-11T09:29:37Z

Maybe I once again fell in the trap of "call ABI" meaning like a dozen different things to different people. For the purposes of this discussion, by "call ABI" I mean "the target-independent information that is necessary and sufficient to compute how arguments and return values are passed between caller and callee". The perfect end state (that we may or may not ever reach) for this would be to say that two types are ABI compatible if and only if the computed call ABI information for them is the same -- that would be very nice for the spec and for MiniRust, anyway.

The process of mapping that information to the concrete things you talk about (registers vs stack etc) is obviously highly target- and ABI-specific. But the input to that process seems to me to be expressible in a nice, reasonably high-level, target-independent way.

bjorn3 · 2025-02-11T09:56:33Z

@RalfJung

> Regarding the point about multi-register returns, I don't see why we couldn't extend PassMode to support that.

In fact, we can already represent 4-register returns on 64-bit and 8-register returns on 32-bit through PassMode::Pair(i128, i128), and I think even more using PassMode::Cast. LLVM will make up an ABI when you do, however, and in many cases this made-up ABI returns the value through a return area pointer rather than in registers.
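The register counts above follow from simple arithmetic: a pair of i128 values is 2 × 128 = 256 bits of payload. A tiny sketch (with a hypothetical helper name) of that calculation:

```rust
/// Payload size of a pair of `i128` values, in bits.
const PAIR_OF_I128_BITS: u32 = 2 * 128;

/// Number of registers of width `reg_bits` needed to hold `total_bits`,
/// rounding up.
fn registers_needed(total_bits: u32, reg_bits: u32) -> u32 {
    (total_bits + reg_bits - 1) / reg_bits
}
```

This only counts how much data there is; as noted above, whether the backend actually returns that data in registers is a separate matter.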

@workingjubilee

> It simply does not make any sense to me to talk about things in terms of C types if they do not have to follow the C ABI, so I think about these elements in a more decomposed way?

For lowering the C ABI we very much have to first map Rust types into C types as the C ABI is specified as a mapping from C types to registers and stack locations. Some of the ABI issues we have are precisely because the ABI implementation for each architecture does an implicit ad-hoc mapping from Rust to C types rather than sharing this such that we only have to get it right once. The Rust ABI follows a different code path and as such doesn't necessarily have to use this lowering to C types. It could keep directly operating on Rust types if we want.

workingjubilee · 2025-02-11T10:42:18Z

And we also have targets with two C-like ABIs.

RalfJung · 2025-02-11T11:33:55Z

> The Rust ABI follows a different code path and as such doesn't necessarily have to use this lowering to C types. It could keep directly operating on Rust types if we want.

Then we still have to get the ABI compat right twice, not the best plan IMO.

RalfJung · 2025-02-11T16:18:59Z

> And we also have targets with two C-like ABIs.

That's just two different functions from AbiRepr to PassMode.
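As a sketch of what "two different functions" could mean, here are two toy adjustment functions over one shared input. The `AbiRepr` and `PassMode` enums are simplified and hypothetical, the 16-byte threshold is an arbitrary illustrative choice, and neither function models a real target ABI.

```rust
/// Hypothetical normalized ABI description shared by both ABIs.
#[derive(Debug, Clone, PartialEq)]
enum AbiRepr {
    Scalar { bytes: u64 },
    Struct(Vec<AbiRepr>),
}

impl AbiRepr {
    /// Total payload size, ignoring padding (a deliberate simplification).
    fn size(&self) -> u64 {
        match self {
            AbiRepr::Scalar { bytes } => *bytes,
            AbiRepr::Struct(fields) => fields.iter().map(AbiRepr::size).sum(),
        }
    }
}

/// Simplified stand-in for `PassMode`.
#[derive(Debug, Clone, PartialEq)]
enum PassMode {
    Direct,
    Cast,
    Indirect { on_stack: bool },
}

/// First toy C-like ABI: small structs are cast so they can use registers.
fn adjust_toy_abi_one(repr: &AbiRepr) -> PassMode {
    match repr {
        AbiRepr::Scalar { .. } => PassMode::Direct,
        AbiRepr::Struct(_) if repr.size() <= 16 => PassMode::Cast,
        AbiRepr::Struct(_) => PassMode::Indirect { on_stack: false },
    }
}

/// Second toy C-like ABI on the same target: every struct goes through memory.
fn adjust_toy_abi_two(repr: &AbiRepr) -> PassMode {
    match repr {
        AbiRepr::Scalar { .. } => PassMode::Direct,
        AbiRepr::Struct(_) => PassMode::Indirect { on_stack: false },
    }
}
```

Both functions consume the same normalized input, so the question of which Rust types are ABI-compatible is settled before either of them runs.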

rustbot added the needs-triage label Dec 21, 2023

RalfJung mentioned this issue Dec 21, 2023

Add infrastructure to "compute the ABI of a Rust type, described as a C type" rust-lang/compiler-team#672

Closed

3 tasks

RalfJung added the C-tracking-issue label Dec 21, 2023

hanna-kruppe mentioned this issue Apr 5, 2024

Clang vs wasm32-{emscripten,wasi} rustc C ABI mismatch w.r.t. "singleton" unions #121408

Open

RalfJung mentioned this issue May 22, 2024

interpret: make overflowing binops just normal binops #125359

Merged

RalfJung mentioned this issue Aug 24, 2024

Wasm ABI special cases scalar pairs (against tool conventions) and is not documented #129486

Open

fmease added the T-compiler label Oct 16, 2024

workingjubilee self-assigned this Oct 17, 2024

workingjubilee mentioned this issue Oct 17, 2024

compiler: Error on layout of enums with invalid reprs #131843

Merged

workingjubilee mentioned this issue Oct 28, 2024

Rename rustc_abi::Abi to BackendRepr #132246

Merged
