Tracking issue for refactoring the way we represent function call ABIs #119183

Open
1 of 19 tasks
RalfJung opened this issue Dec 21, 2023 · 27 comments
Assignees
Labels
A-ABI Area: Concerning the application binary interface (ABI) C-cleanup Category: PRs that clean code up or issues documenting cleanup. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RalfJung
Member

RalfJung commented Dec 21, 2023

This is the issue tracking implementation of rust-lang/compiler-team#672. Note that we do not have a final design yet; the best way to represent call ABI, and to disentangle it from the "storage kind" of a type (which is what the Abi type currently largely represents) is yet to be determined.

Note that for the purposes of this, by "call ABI" I mean "the target-independent information that is necessary and sufficient to compute how arguments and return values are passed between caller and callee". The perfect end state (that we may or may not ever reach) for this would be to say that two types are ABI compatible if and only if the computed call ABI information for them is the same -- that would be very nice for the spec and for MiniRust, anyway.

I do not mean "the target-specific information saying which arguments are passed in which register / on the stack, which are copied and which are passed indirectly". This already exists to some extent as a concept in rustc, called PassMode. It may need some reforming, but that would be a separate discussion.

  • The core need is that a Rust type must have its ABI computed in a manner that is not reimplemented for every target and every codegen backend. For example, the traversal over types should skip repr(transparent) wrappers in essentially every case. Then every target architecture must be ported to it:
    • aarch64
    • arm
    • csky
    • loongarch
    • loongarch64
    • mips
    • mips64
    • powerpc
    • powerpc64
    • riscv32
    • riscv64
    • sparc
    • sparc64
    • x86
    • x86_64
  • layout.abi is not about actual ABI, but only about IR codegen: handling the types as SSA values versus handling them as memory locations.
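As a rough illustration of the first point, the sketch below shows what a shared, target-independent traversal that skips repr(transparent) could look like. The `Ty` enum and `peel_transparent` are hypothetical toy names invented for this example; they are not rustc types or functions.

```rust
/// Toy model of a type, standing in for rustc's real type and layout data.
/// Every name in this block is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int { bits: u32, signed: bool },
    Float { bits: u32 },
    /// A `#[repr(transparent)]` wrapper around its single relevant field.
    Transparent(Box<Ty>),
    /// An ordinary aggregate with fields in declaration order.
    Struct(Vec<Ty>),
}

/// Strip every `repr(transparent)` layer, at any nesting depth, so that a
/// wrapper and its contents look identical to whatever per-target code
/// runs afterwards. This is written once, not once per target.
fn peel_transparent(ty: &Ty) -> Ty {
    match ty {
        Ty::Transparent(inner) => peel_transparent(inner),
        Ty::Struct(fields) => Ty::Struct(fields.iter().map(peel_transparent).collect()),
        other => other.clone(),
    }
}
```

With this in place, a doubly wrapped `f32` and a bare `f32` normalize to the same value, so no target-specific code ever has to remember to look through the wrappers.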

See in particular this comment.

Implementation history

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Dec 21, 2023
@bjorn3 bjorn3 added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC A-ABI Area: Concerning the application binary interface (ABI) C-cleanup Category: PRs that clean code up or issues documenting cleanup. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Dec 21, 2023
@RalfJung
Member Author

Looks like fixing #117480 may be blocked on this.

@fmease fmease added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 16, 2024
@workingjubilee
Member

workingjubilee commented Oct 16, 2024

I've started work on cleaning up ABI code. The first few PRs are just going to be moving stuff around so that we can actually co-locate as much of the ABI code as possible:

I'll try to implement the MCP proper as I go along.

@RalfJung
Member Author

Awesome. :-)

If you want feedback on some sketches of what the ABI might look like, feel free to post them here.

@workingjubilee workingjubilee self-assigned this Oct 17, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 31, 2024
…bi, r=jieyouxu,compiler-errors

compiler: Move `rustc_target::spec::abi::Abi` to `rustc_abi::ExternAbi`

Lift `enum Abi` from its rather odd place in the middle of rustc_target, and make it available again from rustc_abi. You know, the crate where you would expect the enum that describes all the ABIs to be? The platform-neutral ones, at least. This will help further refactoring of how we handle ABIs in the near future[^0].

Rename `Abi` to `ExternAbi` because quite a lot of the compiler overloads the concept of "ABI" enough that the existing name is imprecise and it is often renamed _anyway_. Often this was to avoid conflicts with the *other* type formerly known as `Abi` (now named BackendRepr[^1]), but sometimes it is just for clarity, and this name seems more self-explanatory. It does get reexported, though, using its old name, to reduce the odds of merge-conflicting over the entire tree.

All of `ExternAbi`'s friends come along for the ride, which costs adding some optional dependencies to the rustc_abi crate. However, all of this also allows simply moving three crates entirely off rustc_target:
- rustc_hir_pretty
- rustc_lint_defs
- rustc_mir_build

This odd selection is mostly to demonstrate a secondary motivation: The majority of the front-end of the compiler should be as target-agnostic as possible, and it is easier to assure this if they simply don't depend on the crate that describes targets. Note that I didn't migrate crates that don't benefit from it in this way yet, and I didn't survey every last crate.

[^0]: This is being undertaken as part of rust-lang#119183
[^1]: rust-lang#132246
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Nov 1, 2024
…bi, r=jieyouxu,compiler-errors

rust-timer added a commit to rust-lang-ci/rust that referenced this issue Nov 1, 2024
Rollup merge of rust-lang#132385 - workingjubilee:move-abi-to-rustc-abi, r=jieyouxu,compiler-errors

@RalfJung
Member Author

Looks like there is also some interest on the LLVM side to improve their ABI handling. Would be nice if we could benefit from that -- though we also have other backends, so maybe there's no way we can avoid having our own implementation of the C ABI... OTOH, our current PassMode is very LLVM-specific, so I don't know how much other backends can even use it today.

@bjorn3
Member

bjorn3 commented Nov 19, 2024

> our current PassMode is very LLVM-specific, so I don't know how much other backends can even use it today.

cg_clif makes full use of it. In fact, PassMode is at pretty much exactly the level of abstraction that Cranelift needs. Cranelift doesn't support high-level types: it requires you to decompose everything into primitive values, to pass struct arguments as a pointer argument with ArgumentPurpose::StructArg, and to pass a return area pointer for complex return types, all of which PassMode makes pretty easy to do. The only problem I have with it is that LLVM silently accepts things that IMO shouldn't be accepted, like PassMode::Direct for a return value that doesn't fit in registers. (PassMode::Indirect { on_stack: false } should be used for that instead.) We also now have at least some sanity checks for nonsensical things like PassMode::Direct for a complex struct.
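To make the kind of sanity check described here concrete, the following is a minimal sketch over a heavily simplified model. The `PassMode` here has only two variants and no attributes, and `Shape`, `ReturnValue`, `check_return`, and the `max_direct_bytes` budget are all hypothetical; rustc's real checks are structured differently.

```rust
/// Simplified, hypothetical stand-in for rustc's `PassMode`; the real enum
/// carries attributes and has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PassMode {
    Direct,
    Indirect { on_stack: bool },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Shape {
    Scalar,
    Aggregate,
}

/// Hypothetical description of a return value and the mode chosen for it.
#[derive(Debug, Clone, Copy)]
struct ReturnValue {
    shape: Shape,
    size_bytes: u64,
    mode: PassMode,
}

/// Reject combinations that a backend might otherwise silently "fix up".
/// `max_direct_bytes` is an assumed per-target register budget for returns.
fn check_return(ret: ReturnValue, max_direct_bytes: u64) -> Result<(), String> {
    match ret.mode {
        PassMode::Direct if ret.shape == Shape::Aggregate => {
            Err("PassMode::Direct used for an aggregate".to_string())
        }
        PassMode::Direct if ret.size_bytes > max_direct_bytes => Err(format!(
            "{}-byte return does not fit in {} bytes of registers; \
             use PassMode::Indirect {{ on_stack: false }}",
            ret.size_bytes, max_direct_bytes
        )),
        _ => Ok(()),
    }
}
```

The point of failing early like this is that the mistake surfaces in the frontend, rather than as a backend-invented convention that the caller and callee may not agree on.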

@workingjubilee
Member

Yeah, IMO our main ask from LLVM should be "more hard errors instead of silently making up some weird shit, please", with probably more IR parameter attributes to enable "please make up some weird shit on purpose".

@RalfJung
Member Author

RalfJung commented Nov 22, 2024 via email

@workingjubilee
Member

workingjubilee commented Feb 11, 2025

When these two land I will have more or less passed the "mere cleanup" phase:

There will always be more improvements (I have another set of diffs already, actually...) but now I can actually turn to fixing the real problems.

Some scattered thoughts:

There are higher-level and lower-level ways to represent the ABI. Devolving to register and stack passing, for instance, versus high-level abstractions like "pass this like it would be passed via the C ABI".

The problem with using the C ABI as the referent is that the C ABI often has arbitrary limitations. For instance, multi-register returns are functionally inconceivable in many C ABIs, but are a normal idea for Rust. And because some C ABIs do implement them, LLVM often does not have an actual problem with doing them; they're just a bit weird to express using its C-like syntax.

Going down to individual registers for all arguments might still be too low-level? Yet I definitely do not believe we should be reifying aggregates more in our ABI handling: thinking primarily in terms of them is almost inherently problematic. And we do think about registers a lot in our current handling.

If we did think about this in the more lower-level form, we would need to account for how the translation of a set of arguments to an ABI handling would be inherently ordered: at some point you run out of registers and start putting things on the stack. I don't think that's completely unacceptable, but it does point to rethinking what we're doing fairly extensively.
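The ordering concern can be seen in a toy allocator. This is only a sketch under simplifying assumptions: one class of registers, one register per small argument, and word-sized stack slots. `Location` and `assign_locations` are hypothetical names, and no real calling convention is this simple.

```rust
/// Where a single argument ends up in this toy model.
#[derive(Debug, PartialEq)]
enum Location {
    Register(usize),
    Stack { offset: usize },
}

/// Walk the arguments in order. Small arguments take the next free register
/// until none are left; everything else is placed on the stack.
/// `reg_size` is in bytes and must be non-zero.
fn assign_locations(arg_sizes: &[usize], num_regs: usize, reg_size: usize) -> Vec<Location> {
    let mut next_reg = 0;
    let mut stack_offset = 0;
    let mut out = Vec::with_capacity(arg_sizes.len());
    for &size in arg_sizes {
        if size <= reg_size && next_reg < num_regs {
            out.push(Location::Register(next_reg));
            next_reg += 1;
        } else {
            out.push(Location::Stack { offset: stack_offset });
            // Round each stack slot up to a whole number of register-sized words.
            stack_offset += size.div_ceil(reg_size).max(1) * reg_size;
        }
    }
    out
}
```

With two 8-byte registers, three 8-byte arguments land in register 0, register 1, and stack offset 0. The location of each argument depends on everything that came before it, which is the inherent ordering mentioned above.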

We already have, or at least should have, a vague idea of what belongs in these two sets:

  • what may go in registers
  • what must go on the stack

That's kind of what all the linting about target features points to, anyways.

@workingjubilee
Member

workingjubilee commented Feb 11, 2025

Some constraints we know about:

  • For any extern "{CC}" fn(A) -> R, the triple CC, A, and R must completely determine the ABI for the call, or else function pointers stop working. This might seem obvious, but I felt I should state it explicitly because it is sometimes tempting to say the ABI should vary based on something that isn't distinguishable as part of the function pointer type, like target features.
  • For any closure impl FnOnce(A) -> R, we can see these also get erased to dyn FnOnce(A) -> R, which seemingly points in the same direction as with function pointers.
  • We cannot raise our model to a significantly higher level of abstraction, as otherwise lower-level codegen backends like Cranelift would never fully work with our lowering to their IR.
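The first two constraints can be observed in ordinary Rust. In the sketch below (function names are made up for illustration), the callers know nothing about their callee beyond the erased type, so the type alone has to pin down how arguments and results are passed.

```rust
extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}

extern "C" fn mul(a: i32, b: i32) -> i32 {
    a * b
}

/// The caller only sees `extern "C" fn(i32, i32) -> i32`. Whatever is needed
/// to perform the call must follow from CC = "C", A = (i32, i32), R = i32.
fn call_erased(f: extern "C" fn(i32, i32) -> i32, a: i32, b: i32) -> i32 {
    f(a, b)
}

/// Closures erased to `dyn FnOnce` are in the same position: the concrete
/// closure type is gone by the time the call happens.
fn call_boxed(f: Box<dyn FnOnce(i32) -> i32>, x: i32) -> i32 {
    f(x)
}
```

If the ABI of `add` depended on something invisible in its pointer type, such as a target feature enabled only on that function, `call_erased` would have no way to know which convention to use.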

@Lokathor
Contributor

Doesn't that already sometimes happen, like soft float vs hard float?

@workingjubilee
Member

Nah, that doesn't happen because people compile code for targets and targets are not allowed to vary whether they are hard float or soft float. 😌

@Mark-Simulacrum
Member

> must completely determine the ABI for the call, or else function pointers stop working

I think this is only somewhat true - there must be a "function pointer" ABI, but that ABI may want to differ from the ABI we choose to use for regular (non function pointer) calls, by (for example) inserting a shim when casting a function to a pointer. Effectively, we could model this as every function having a generic of whether it ever gets erased or is always called with knowledge of the specific function.

That might be useful for example to leverage PGO or other information to optimize Result into passing the more common variant for a particular function through registers vs. not.

That's commonly achieved through inlining, but I think it would be nice to avoid excluding it from being done without inlining too.

@workingjubilee
Member

I am aware of that but it seems functionally identical to generating a new extern "{CC}" fn(A) -> R, with CC, A, and R based on but non-identical to the original, and rewriting the call site to call this new function?

A related concern that has been pointed out to me in the past is that sometimes, even with inlining, the codegen backend can have problems erasing the prologue and epilogue. Ideally, we would find a way to make "handling the ABI of this call" semantically separable from the actual motion of "making this call", so that we can delay addressing the need for a prologue and epilogue as long as possible. Notionally the function call ABI, after all, should be a non-event, as otherwise these functions aren't very functional.

@RalfJung
Member Author

We currently talk a lot about registers in our ABI handling, but that's kind of a lie. All it means is "represent this to LLVM as a scalar / array of scalars" and then LLVM decides whether to put that into registers or on the stack depending on the target's conventions. (Well, there's a flag for forcing things to be in a register -- mixed up with other flags that are irrelevant for the ABI. I don't know the exact semantics of that, it's probably target-specific.)

I don't think we should go any more low-level than this on the Rust side. I'd rather not have us be in the business of counting how many arguments go in registers vs on the stack. That's a lot of work, very hard to test, and AFAIK none of our codegen backends actually give us that level of control anyway.

The main goal of the original MCP was to make writing our target-specific foreign ABI adjustments easier and less fragile. For that code, the C ABI is exactly the right abstraction, as that's the level the ABI is specified at: we need to tell that code what the given type "looks like as a C type", and then the code needs to compute a corresponding PassMode. So I think that kind of machinery is definitely on the requirements list here. The MCP talks mostly about how to represent the input to this step; you seem to be thinking mostly about the output? To be honest I have not thought about this much; what we have here currently seems to work well enough for both LLVM and cranelift, so it's not entirely terrible.

> A related concern that has been pointed out to me in the past is that sometimes, even with inlining, the codegen backend can have problems erasing the prologue and epilogue. Ideally, we would find a way to make "handling the ABI of this call" semantically separable from the actual motion of "making this call", so that we can delay addressing the need for a prologue and epilogue as long as possible. Notionally the function call ABI, after all, should be a non-event, as otherwise these functions aren't very functional.

That kind of work needs to start in the backends though; LLVM / cranelift would require a higher-level way to express the ABI so that they don't require a bunch of instructions that must later be optimized away again (e.g. when inlining). This also points towards a rather higher-level than lower-level abstraction, i.e., not "register vs stack".

@workingjubilee
Member

LLVM does not implement the C ABI for us.

@RalfJung
Member Author

RalfJung commented Feb 11, 2025

Correct. We need to carry a function for each supported architecture (sometimes more than one per arch) that is given "the C view of the type" and computes the PassMode. For this, obviously we should have a way to represent "the C view of the type" so we can use it as input for that function -- right?

Regarding the point about multi-register returns, I don't see why we couldn't extend PassMode to support that. But as noted above that is an entirely orthogonal question to how we represent the input to the ABI adjustment functions, which is what the MCP is primarily about.

@RalfJung
Member Author

RalfJung commented Feb 11, 2025

I think you are talking about a completely different layer than what I am talking about in the MCP and the issue description. We need two "ABI representations":

  • One representation sits between the notion of a Rust type, and the target-specific ABI logic. This is a function of type fn(TyAndLayout) -> AbiRepr, and crucially it must have the property that ABI-compatible Rust types are mapped to the same AbiRepr. This function can and should be target-independent. AbiRepr is a new concept that does not exist in rustc yet but I think having it would greatly improve the correctness of our foreign ABI adjustments.
  • Since codegen backends do not implement the entire C ABI for us (they implement part of it, like knowing how many arguments to put into registers vs the stack, but many other parts have to be handled by the frontend), we also need to have another layer below this, which is target-and-ABI-specific. This layer consists of a function fn(AbiRepr) -> PassMode. I am sure PassMode can be improved, and in fact PassMode::Cast will have to be extended, as it currently cannot represent everything LLVM can represent, and that leads to ABI bugs on some targets. But I don't see which problem would be solved by making PassMode so low-level that we have to know how many arguments to put in registers; I think (and @bjorn3 has confirmed in the past) that it sits at a pretty good level of abstraction.

Currently we effectively have a function fn(TyAndLayout) -> PassMode for each foreign ABI (except we express this more like fn(TyAndLayout, &mut PassMode), which is even worse). That's bad, as it means ABI adjustments can easily map two ABI-compatible types to different PassMode by accident; I fixed a bunch of those bugs last year. This proposal is about factoring that into two functions with an intermediate abstraction, AbiRepr. I think the vocabulary of C types is not a bad template for this. In particular, using C types here in no way precludes using multiple return registers; that can still be realized by implementing the "Rust" ABI adjustment to e.g. map a 2-element AbiRepr::Struct to a PassMode that represents a multi-register return.
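A minimal sketch of that two-function factoring follows. Since AbiRepr does not exist in rustc, everything here is hypothetical: `RustTy` is a toy stand-in for TyAndLayout, the `AbiRepr` and `PassMode` enums are drastically simplified, and `rust_abi_adjust` is an invented example of one per-ABI adjustment. The sketch leans on the documented guarantees that char is ABI-compatible with u32 and that thin references and raw pointers are ABI-compatible with each other.

```rust
/// Toy stand-in for `TyAndLayout`.
#[derive(Debug, Clone, PartialEq)]
enum RustTy {
    U32,
    Char,
    F64,
    Ref,    // a thin `&T`
    RawPtr, // a thin `*const T`
    Struct(Vec<RustTy>),
}

/// Hypothetical intermediate representation: leaf types and aggregates only.
#[derive(Debug, Clone, PartialEq)]
enum AbiRepr {
    Int { bits: u32 },
    Float { bits: u32 },
    Pointer,
    Struct(Vec<AbiRepr>),
}

/// Drastically simplified stand-in for rustc's `PassMode`.
#[derive(Debug, Clone, PartialEq)]
enum PassMode {
    Direct,
    Pair,
    Indirect { on_stack: bool },
}

/// Stage 1, target-independent: ABI-compatible types collapse to one `AbiRepr`.
fn abi_repr(ty: &RustTy) -> AbiRepr {
    match ty {
        RustTy::U32 | RustTy::Char => AbiRepr::Int { bits: 32 },
        RustTy::F64 => AbiRepr::Float { bits: 64 },
        RustTy::Ref | RustTy::RawPtr => AbiRepr::Pointer,
        RustTy::Struct(fields) => AbiRepr::Struct(fields.iter().map(abi_repr).collect()),
    }
}

fn is_leaf(repr: &AbiRepr) -> bool {
    !matches!(repr, AbiRepr::Struct(_))
}

/// Stage 2, one such function per target and ABI. This invented "Rust" ABI
/// adjustment maps a 2-element struct of leaves to a two-register `Pair`.
fn rust_abi_adjust(repr: &AbiRepr) -> PassMode {
    match repr {
        AbiRepr::Struct(fields) if fields.len() == 2 && fields.iter().all(is_leaf) => {
            PassMode::Pair
        }
        AbiRepr::Struct(_) => PassMode::Indirect { on_stack: false },
        _ => PassMode::Direct,
    }
}

/// The composition replaces today's single `fn(TyAndLayout) -> PassMode`.
fn pass_mode(ty: &RustTy) -> PassMode {
    rust_abi_adjust(&abi_repr(ty))
}
```

Because stage 2 never sees a `RustTy`, it cannot treat `char` differently from `u32` even by accident; any such distinction would have to be introduced, visibly, in the single shared stage 1.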

I don't know where you went on a different track than my line of thoughts here but clearly we're not talking about the same thing, maybe this helps. :)

@workingjubilee
Member

It simply does not make any sense to me to talk about things in terms of C types if they do not have to follow the C ABI, so I think about these elements in a more decomposed way?

@workingjubilee
Member

workingjubilee commented Feb 11, 2025

I think I will simply talk less about these things, because it is not very useful for me to try to record my thoughts in forms that are likely to be misunderstood. I understand that your primary hope was to make the input forms more sensible, and I do intend to address that; I just am thinking about this from the origin of the demands... "we want programs that do FFI to successfully execute... which means certain things have to go into certain registers... which means that they have to..." etc.

@RalfJung
Member Author

RalfJung commented Feb 11, 2025

> It simply does not make any sense to me to talk about things in terms of C types if they do not have to follow the C ABI, so I think about these elements in a more decomposed way?

We need some sort of high-level language of types (well, higher-level than PassMode, but lower-level than Rust types) that can serve as input for the target-specific ABI adjustments. I don't care what you call this. I called it AbiRepr above. We don't have to call it "C types" if you don't like that, we can just call it "structs and unions and leaf types" if you prefer 🤷 .

> I just am thinking about this from the origin of the demands... "we want programs that do FFI to successfully execute... which means certain things have to go into certain registers... which means that they have to..." etc.

That would make sense if we were emitting the assembly ourselves, but we are not. We are targeting the language of our backends in terms of how they represent ABI. PassMode is designed around that. It works well enough; the ABI issues we have seen have (almost all) not been caused by PassMode being inadequate. They have been caused by the fact that the ABI adjustments take as input a Rust type and that makes it extremely easy to map different-but-defined-to-be-ABI-compatible Rust types to different PassMode. We need a normalization pass that runs before invoking the target-specific logic, so that we only have to worry once about mapping different-but-defined-to-be-ABI-compatible Rust types to identical outputs, rather than having to do that for every single target. That is what this issue attempts to fix. I'm not sure what you are trying to fix but maybe it is something else. :)

@RalfJung
Member Author

RalfJung commented Feb 11, 2025

Maybe I once again fell in the trap of "call ABI" meaning like a dozen different things to different people. For the purposes of this discussion, by "call ABI" I mean "the target-independent information that is necessary and sufficient to compute how arguments and return values are passed between caller and callee". The perfect end state (that we may or may not ever reach) for this would be to say that two types are ABI compatible if and only if the computed call ABI information for them is the same -- that would be very nice for the spec and for MiniRust, anyway.

The process of mapping that information to the concrete things you talk about (registers vs stack etc) is obviously highly target- and ABI-specific. But the input to that process seems to me to be expressible in a nice, reasonably high-level, target-independent way.

@bjorn3
Member

bjorn3 commented Feb 11, 2025

@RalfJung

> Regarding the point about multi-register returns, I don't see why we couldn't extend PassMode to support that.

In fact, we can already represent 4-register returns on 64-bit targets and 8-register returns on 32-bit targets through PassMode::Pair(i128, i128), and I think even more using PassMode::Cast. When you do, however, LLVM will make up an ABI, and in many cases this made-up ABI returns the value through a return area pointer rather than in registers.
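The register counts quoted here follow from simple division: a pair of i128 values is 256 bits, which fills four 64-bit or eight 32-bit registers. The helper below is a hypothetical name used only to spell out that arithmetic; it says nothing about whether a given backend will actually keep the value in registers.

```rust
/// Number of general-purpose registers needed to hold `value_bits` bits,
/// assuming each register holds `register_bits` bits (must be non-zero).
fn registers_needed(value_bits: u32, register_bits: u32) -> u32 {
    value_bits.div_ceil(register_bits)
}

/// Total width of the two halves of a hypothetical `Pair(i128, i128)`.
const PAIR_OF_I128_BITS: u32 = 2 * 128;
```

Here `registers_needed(PAIR_OF_I128_BITS, 64)` is 4 and `registers_needed(PAIR_OF_I128_BITS, 32)` is 8.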

@workingjubilee

> It simply does not make any sense to me to talk about things in terms of C types if they do not have to follow the C ABI, so I think about these elements in a more decomposed way?

For lowering the C ABI we very much have to first map Rust types into C types as the C ABI is specified as a mapping from C types to registers and stack locations. Some of the ABI issues we have are precisely because the ABI implementation for each architecture does an implicit ad-hoc mapping from Rust to C types rather than sharing this such that we only have to get it right once. The Rust ABI follows a different code path and as such doesn't necessarily have to use this lowering to C types. It could keep directly operating on Rust types if we want.

@workingjubilee
Member

And we also have targets with two C-like ABIs.

@RalfJung
Member Author

RalfJung commented Feb 11, 2025 via email

@RalfJung
Member Author

RalfJung commented Feb 11, 2025 via email


7 participants