Experimental feature gate proposal crabi #105586

Open

Member

@joshtriplett joshtriplett commented Dec 12, 2022

Summary

This experimental feature gate proposal covers developing a new ABI,
extern "crabi", and a new in-memory representation, repr(crabi), for
interoperability across high-level programming languages that have safe data
types.

This will use the feature gate crabi, which will be marked as experimental
until a subsequent RFC provides a precise definition of crABI.

This work was previously discussed under the names "safe ABI" and "interop
ABI", but was renamed to "crabi" to avoid misleadingly broad implications of
"safe" or "interop".

Motivation

Today, developers building projects incorporating multiple languages, or
calling a library written in one language from another, often have to use the C
ABI as a lowest-common-denominator for cross-language function calls. As a
result, such cross-language calls use unsafe C representations, even for types
that both languages understand. For instance, passing a string from Rust to
another high-level language will typically use an unsafe C char *, even if
both languages have a safe type for counted UTF-8 strings.

For popular pairs of languages, developers sometimes create higher-level
binding layers for combining those languages. However, the creation of such
binding layers requires one-off effort between every pair of programming
languages. Such binding layers also add work and overhead to the project for
each pair of languages, and may not play well together when using more than one
in the same project.

Furthermore, higher-level data types such as Option and Result currently
require translation into C-ABI-compatible types, which discourages the use of
such types in cross-language interfaces, and encourages the use of more complex
and less safe encodings (e.g. manually encoding Option via an invalid value
of a parameter).
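
As an illustration of the difference, the sketch below contrasts a sentinel encoding with one possible explicit-discriminant lowering of `Option<u64>`. The names `find_index_sentinel`, `find_index_tagged`, and `OptionU64` are hypothetical, and `extern "C"` / `repr(C)` stand in for crABI, whose actual layout this proposal does not define. Export attributes are omitted for brevity.

```rust
use std::slice;

/// Sentinel encoding often seen in C APIs: -1 means "not found".
/// Nothing in the signature forces the caller to check for it.
///
/// # Safety
/// `haystack` must point to `len` readable bytes.
pub unsafe extern "C" fn find_index_sentinel(haystack: *const u8, len: usize, needle: u8) -> i64 {
    let bytes = unsafe { slice::from_raw_parts(haystack, len) };
    match bytes.iter().position(|&b| b == needle) {
        Some(i) => i as i64,
        None => -1,
    }
}

/// Hypothetical C-compatible lowering of `Option<u64>` with a separate
/// discriminant. An illustration only, not the crABI layout.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct OptionU64 {
    pub is_some: bool,
    pub value: u64, // only meaningful when `is_some` is true
}

impl From<Option<u64>> for OptionU64 {
    fn from(opt: Option<u64>) -> Self {
        match opt {
            Some(value) => OptionU64 { is_some: true, value },
            None => OptionU64 { is_some: false, value: 0 },
        }
    }
}

impl From<OptionU64> for Option<u64> {
    fn from(lowered: OptionU64) -> Self {
        if lowered.is_some { Some(lowered.value) } else { None }
    }
}

/// Same search, but the "not found" case is part of the returned type.
///
/// # Safety
/// `haystack` must point to `len` readable bytes.
pub unsafe extern "C" fn find_index_tagged(haystack: *const u8, len: usize, needle: u8) -> OptionU64 {
    let bytes = unsafe { slice::from_raw_parts(haystack, len) };
    bytes.iter().position(|&b| b == needle).map(|i| i as u64).into()
}
```

With native crABI support, a Rust caller would presumably see `Option<u64>` directly and never touch the intermediate struct; a language with only C FFI would read the two fields.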

Finally, system libraries and other shared libraries typically use the C ABI
as well. Software making a Linux .so, Windows DLL, or macOS dylib, will
typically expose a C-compatible ABI, and cannot easily provide a higher-level
safe ABI without shipping language-specific high-level bindings.

crABI will define a standard way to make calls across high-level languages,
passing high-level data types, without dropping to the lowest common
denominator of C. crABI will work with any language providing a C-compatible
FFI (including C itself), and languages can also add specific higher-level
native support for crABI.

crABI aims to be a reasonable default for compiled libraries in both static and
dynamic form, including system libraries.

Requirements

The crABI experiment will include a new ABI, extern "crabi", and a new
in-memory representation, repr(crabi).

The crABI support for Rust will be a strict superset of the C ABI support for
Rust. This ensures that, for functionality not yet supported by crABI, users
still have the option of using their own translations to the raw C ABI, while
still using crABI for what it does support.

crABI will be defined via "lowering" to the C ABI: crABI will define how to
pass or return types not supported by C, by defining how to translate them to
types and structures supported by C. This allows any language with C FFI
support to also call functions using crABI, without requiring special language
support. However, languages may still wish to add higher-level support for
crABI, to avoid having to write a translation layer for their own native types.
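
A minimal sketch of what "lowering" could look like for a borrowed string, assuming a pointer-plus-length pair. The type `StrRef` and the function `count_chars` are hypothetical names, and the field layout is an assumption rather than anything this proposal specifies.

```rust
use std::{slice, str};

/// Hypothetical C-compatible lowering of `&str`: a pointer and a byte count,
/// with no NUL terminator required.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrRef {
    pub ptr: *const u8,
    pub len: usize,
}

impl StrRef {
    /// Lower a Rust string slice; the result borrows from `s`.
    pub fn new(s: &str) -> Self {
        StrRef { ptr: s.as_ptr(), len: s.len() }
    }

    /// # Safety
    /// `ptr` must point to `len` bytes of valid UTF-8 that stay alive for `'a`.
    pub unsafe fn as_str<'a>(self) -> &'a str {
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}

/// Roughly what a function declared with a `&str` parameter under a
/// hypothetical `extern "crabi"` might lower to under the C ABI.
///
/// # Safety
/// `s` must satisfy the requirements of `StrRef::as_str`.
pub unsafe extern "C" fn count_chars(s: StrRef) -> usize {
    let text = unsafe { s.as_str() };
    text.chars().count()
}
```

A language with only C FFI would declare the same two-field struct and call `count_chars` directly; a language with native support could hide the struct behind its own string type.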

To the extent crABI supports passing ownership (e.g. strings), it must also
specify how to reclaim the associated memory. (However, future support for
objects or traits may require invoking a destructor instead.)

crABI could define a symbol naming scheme, to allow identifying symbols that
use crABI. However, crABI must be compatible with languages that only support C
FFI and do not have native crABI support, and which must thus reference the
symbol via its name; therefore, crABI should not have a complex or non-obvious
mangling scheme.

crABI should include a versioning scheme, to allow for future compatible
extensibility. crABI version 1 will handle many simple cases of widespread
interest. More complex cases, such as trait objects, or arbitrary objects with
methods, will get deferred to future versions. The versioning scheme will allow
for both compatible and incompatible changes; changes to crABI will strive to
remain compatible with previous versions when not using functionality
unsupported by those previous versions.

Rust will support defining functions using crABI, and calling
crABI functions defined elsewhere. Rust will support compiling both
static and dynamic libraries that export crABI symbols.

Rust should also support passing around function pointers to functions that use
crABI.
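
For reference, this is how ABI-qualified function pointers already work in Rust today, using `extern "C"` as a stand-in; a crABI function pointer type would presumably follow the same shape. The names `Callback`, `double`, and `apply_twice` are made up for this sketch.

```rust
/// A function-pointer type that carries its ABI in the type.
pub type Callback = extern "C" fn(u32) -> u32;

/// A function with the C ABI that can be passed as a `Callback`.
pub extern "C" fn double(x: u32) -> u32 {
    x.wrapping_mul(2)
}

/// Accepts a function pointer and calls it through the declared ABI.
pub extern "C" fn apply_twice(f: Callback, x: u32) -> u32 {
    f(f(x))
}
```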

Non-requirements

crABI does not aim to support the full richness of Rust's type system, or that
of other languages. It aims to support common cases more safely and simply.

In particular, while crABI will over time support an increasing subset of Rust
features, and specific types from the standard library will become available as
the necessary features to support them do, crABI does not aim to support the
entire Rust standard library.

crABI will not aim to support complex lifetime handling, or to fully solve
problems related to describing pointer lifetimes across different languages.
crABI may provide limited support for some subsets of this, such as "this
pointer is only valid for the duration of this call and must not be retained",
or "this pointer transfers ownership to the callee, and the caller must not
retain it".

crABI (at least in the first version) will not provide an interface description
language (IDL), in either source or compiled form; function symbols using crABI
will not provide function signature information in compiled objects. A future
version of crABI may generate and provide machine-readable interface
descriptions.

crABI does not aim to provide "translations" between the most native
representations of different languages. For instance, though different
languages may store strings in different fashions, crABI string types will have
a specific representation in memory and a specific lowering to C function
parameters/results. Languages whose native string representation does not match
crABI string representation may need to translate, or may need to treat the
crABI string object as a distinct data type and provide distinct mechanisms for
working with it. (By contrast, WebAssembly Interface Types (WIT) aims to
provide such translations in an efficient fashion, by generating translation
code as needed between formats.)

crABI cannot support arbitrary compile-time generic functions; generics will
require the use of opaque objects, trait objects, or similar. A future version
could support exporting specific instantiations of generics. (However, crABI
will support enough of generics to allow types like Option<u64> or
Result<u64, ConcreteError> or [u8; 16] or [u8] to work, such as by
supporting their use with concrete types as long as no generic parameters
remain unbound in the final function signature.)
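
The distinction can be sketched with today's C ABI: a generic function has no single symbol to export, but a wrapper that binds every type parameter does. The names `checked_sum` and `checked_sum_u32` are hypothetical, and the out-parameter style is just one way to return an optional value over `extern "C"`.

```rust
use std::slice;

/// An ordinary generic Rust function. It is monomorphized per caller, so
/// there is no single ABI-level symbol for it.
pub fn checked_sum<T: Copy + Into<u64>>(items: &[T]) -> Option<u64> {
    items.iter().try_fold(0u64, |acc, &x| acc.checked_add(x.into()))
}

/// A concrete instantiation with no unbound generic parameters, which can be
/// exported. Returns `false` on overflow and leaves `*out` untouched.
///
/// # Safety
/// `ptr` must point to `len` readable `u32` values and `out` must be writable.
pub unsafe extern "C" fn checked_sum_u32(ptr: *const u32, len: usize, out: *mut u64) -> bool {
    let items = unsafe { slice::from_raw_parts(ptr, len) };
    match checked_sum(items) {
        Some(total) => {
            unsafe { *out = total };
            true
        }
        None => false,
    }
}
```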

crABI cannot prevent callers from passing parameters that violate the
specification, and does not claim to. More generally, crABI does not provide
sandboxing or similar functionality that would be required to interoperate with
untrusted code.

The initial version of crABI will likely not attempt to standardize destructors
or memory reclamation, though future versions may. Users of crABI will still
need to provide and use xyz_free functions to delegate object destruction and
reclamation back to the code that provided the object.
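
A minimal sketch of the `xyz_free` pattern described above, written against today's C ABI. `Counter` and the `counter_*` functions are hypothetical names; the point is only that the side that allocated the object is also the side that destroys it.

```rust
/// An object whose internals stay private to the providing library.
pub struct Counter {
    count: u64,
}

/// Allocate a counter and hand ownership to the caller as a raw pointer.
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { count: 0 }))
}

/// Increment the counter and return the new value.
///
/// # Safety
/// `c` must come from `counter_new` and must not have been freed.
pub unsafe extern "C" fn counter_increment(c: *mut Counter) -> u64 {
    let counter = unsafe { &mut *c };
    counter.count += 1;
    counter.count
}

/// Give the object back to the library that created it for destruction.
/// Passing a null pointer is a no-op.
///
/// # Safety
/// `c` must be null or come from `counter_new`, and must not be used afterwards.
pub unsafe extern "C" fn counter_free(c: *mut Counter) {
    if !c.is_null() {
        drop(unsafe { Box::from_raw(c) });
    }
}
```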

Potential functionality

This section includes some potential examples of types crABI could support.
Some of these will appear in the first version of crABI; many will get deferred
to a future version.

  • Tuples, of arbitrary size.
  • The "unit" type ().
  • enums, including enum variants containing fields.
    • More specifically, Option and Result.
  • Counted UTF-8 strings (with no guarantee of a NUL terminator).
  • A Unicode scalar value (Rust char).
  • Filesystem paths, or other operating-system strings.
  • Arrays, with a compile-time-known size.
  • Counted slices.
  • Ranges.
  • Owned pointers to any supported type (e.g. Box), as well as owned pointers
    to types that can't be passed by value.
  • References, with a limited degree of lifetime support.
    • &str
  • Closures, with a limited degree of lifetime support.
  • Futures, with a limited degree of lifetime support. This would in particular
    support extern "crabi" async fn.
  • "noreturn" functions, as expressed in Rust via -> !.
  • Opaque objects with crABI methods, without exposing representation. (This
    would allow passing objects like Vec or HashMap or HashSet, without
    constraining the internals. This would also allow interoperating across
    versions of Rust.)
    • An opaque error container, for use with Result.
  • Trait objects with crABI methods. (This may use the same mechanism as
    objects.)

Open questions

  • Niches: should we support cases like Option<bool> without a separate
    discriminant, or should we (for simplicity) always pass a separate
    discriminant? Likely the latter. However, what about things like Option<&T>
    and Option<NonZeroU32>, for which Rust guarantees the representation of
    None? Those work with the C ABI, and they have to work with crABI, but can
    we make them work with crABI using the same encoding of None?
  • What subset of lifetimes can, and should, we support? We can't enforce them
    cross-language, but they may be useful as an advisory/documentation
    mechanism. Or we could leave them out entirely.
  • To what extent should crABI make any attempt to specify things that can't
    be enforced, rather than ignoring semantics entirely and only specifying
    how types get passed?
  • How can we make it easy to support data structures without having to do
    translation from repr(Rust) to repr(crabi) and have parallel structures?
    Can we make that less painful to express, and ideally mostly free at runtime?
    • Related: how can we handle tuples? Do we need a way to express
      repr(crabi) tuples? How can we do that conveniently?
  • Should we provide support for extensible enums, such that we don't assume the
    discriminant matches one of the known variants? Would doing so make using
    enums less ergonomic? Could we address that with language changes?
  • For handling objects, could we avoid having to pass in-memory function
    pointers via a vtable, and instead reference specific symbols? This wouldn't
    work for generics, though. Can we do any better than a vtable?
  • For ranges, should we provide a concrete range type or types, or should we
    defer that and handle ranges as opaque objects or traits?
  • Do we get any value out of supporting (), other than completeness? Passing
    () by value should just be ignored as if it weren't specified. Do we want
    people using pointers to (), and do those have any advantage over pointers
    to void?
  • Should we do anything special about i128 and u128, or should we just push
    for getting those supported correctly in extern "C"?
  • For generics, such as Option<u64> or Result<u32, ConcreteError> or
    [u8; 16], does the rule "all generic parameters must be bound to concrete
    types in the function signature" suffice, or do we need a more complex rule
    than that?
  • Unwinding: The default extern "crabi" should not support unwind, and most
    languages don't tend to have support for unwinding through C-ABI functions,
    but should we have a crabi-unwind variant? Would doing so provide value?
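
To make the niche question concrete, the following sketch shows the existing guarantees for `Option<&T>` and `Option<NonZeroU32>` under the C ABI: both have the same size as their non-optional counterparts, so no separate discriminant is passed. The function names are hypothetical.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// `Option<&u32>` crosses the C ABI as a single, possibly null, pointer.
pub extern "C" fn first_or_zero(value: Option<&u32>) -> u32 {
    value.copied().unwrap_or(0)
}

/// `Option<NonZeroU32>` crosses the C ABI as a plain 32-bit integer.
pub extern "C" fn id_or_zero(id: Option<NonZeroU32>) -> u32 {
    id.map_or(0, NonZeroU32::get)
}

/// Returns true when both niche-optimized types are the same size as the
/// underlying pointer or integer.
pub fn niches_are_free() -> bool {
    size_of::<Option<&u32>>() == size_of::<*const u32>()
        && size_of::<Option<NonZeroU32>>() == size_of::<u32>()
}
```

A type like `Option<bool>` has no such documented guarantee, which is why the first question above leans toward an explicit discriminant for the general case.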

Prior art

Some potential sources of inspiration:

  • WebAssembly Interface Types
  • The abi_stable crate (which aims for Rust-to-Rust stability, not
    cross-language interoperation, but it still serves as a useful reference)
  • stabby
  • UniFFI
  • Diplomat
  • Swift's stable ABI
  • C++'s various ABIs (and the history of its ABI changes). crABI should not
    strive to be a superset of any C++ ABI, though.
  • Many, many interface description languages (IDLs).
  • The x86-64 psABI. While we're not specifying the lowering all the way to
    specific architectures, we can still learn from how it handles various types.

Rationale and alternatives

Rather than being defined via lowering to the C ABI, crABI could directly
define how to pass parameters on underlying architectures, such as which
registers to use for which parameters and how to pass or return specific types.
This would have the advantage of allowing improvements over the C ABI. However,
this would have multiple substantial disadvantages, such as requiring dedicated
support in every programming language (rather than leveraging C FFI support),
and requiring definition for every target architecture. Instead, this proposal
suggests making such improvements at the C ABI level, such as by defining
extensions for passing or returning specific types in a more efficient fashion.

crABI could exclude portions of the C ABI considered unsafe, such as raw
pointers. This would make crABI not a strict superset of the C ABI. This
would make it difficult to handle functionality that crABI does not yet
support, while simultaneously using crABI for functionality it does support.
For instance, a program may wish to pass both an enum parameter and a raw
pointer parameter. Leaving out this functionality might encourage people to
avoid crABI or to define some functions via crABI and some via C ABI.

"crABI" serves as a neutral name identifying this ABI and its functionality.
(Thanks to @m-ou-se for the name "crABI".)
This work previously went under the name "safe ABI", but given that the ABI
does not exclude portions of the C ABI considered unsafe, a name like "safe"
would be a misnomer. This work also previously went under the names "interop"
and "interoperable ABI"; however, the names "interop" and "interoperable ABI"
are not particularly identifying, unambiguous, or easy to talk about, and lack
other properties of a good name. In addition, "interop"/"interoperable" can imply a
greater breadth than the initial version of crABI aspires to, such as including
an IDL.

crABI does not officially stand for anything. Insert your favorite backronym.

Future work

  • Debug/trace tools, such as debugger support or ltrace support, to decode
    crABI structures and types.
  • Adding native crABI support to various languages.
  • Shipping C header files defining structures for crABI.

@joshtriplett joshtriplett added the T-lang label Dec 12, 2022
@rustbot
Collaborator

rustbot commented Dec 12, 2022

r? @petrochenkov

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added the S-waiting-on-review label Dec 12, 2022
@joshtriplett joshtriplett added the S-waiting-on-team and I-lang-nominated labels and removed the S-waiting-on-review label Dec 12, 2022
@joshtriplett
Member Author

joshtriplett commented Dec 12, 2022

I'm working on this with @m-ou-se, @tmandry, and @Amanieu. If approved, I'd also seek to create a Zulip channel for ongoing collaboration on this.

I've previously brought this up with the lang team in nascent form; this is now a full proposal for an experimental feature gate.

Planning to discuss this at the next lang meeting, including who should be the lang liaison for this.

The current PR has None for the tracking issue; will wait to file a tracking issue until after review and approval.

@programmerjake
Member

an issue i spotted: passing &(A, Tuple, AtomicU8) is problematic because there's no way to create a non-repr(Rust) tuple. passing tuples by value works because the contents can be copied into whatever shape the ABI dictates; passing by reference doesn't work because you can't copy the AtomicU8 to some non-repr(Rust) location and have it still work, since another thread using the original location would no longer be able to communicate with the copied location.

@joshtriplett
Member Author

joshtriplett commented Dec 12, 2022

@programmerjake Good catch; I've added that issue to the proposal, that in order to interoperate we'd need a way to have repr(interop) tuples or similar, ideally with non-onerous syntax.

@Mark-Simulacrum
Member

I'm wondering to what extent we expect to block moving out of experimental status on having support (in some fashion) in other languages. Obviously, the proposal indicates that support should be possible via just C FFI (and users doing work to manually align their signatures with the Rust-proposed interop), but it seems plausible that 'more' could also be possible (e.g., ABI or bindings akin to C FFI bindings in those languages).

A secondary question is whether we expect Rust itself to ship C headers which describe the lowering (e.g., I could imagine a struct interop_str { ptr, len }). I think this is similar to "other language's supporting" but sufficiently distinct to poke at.

I raise these mostly so we can have some early discussion on what is expected as part of the experimentation phase; I myself don't think either of these should be blocking concerns to experimentation or stabilization (particularly given versioning!) but if someone else does we should find out early.

One other meta-thought: I think an explicit note around which open questions you expect to answer before moving to non-experimental status might be helpful. Some of them seem rather large and/or like they could be left to after an initial stabilization (and obviously moving out of experimental), but having some sense there would help gauge where it makes sense to have extensive thought around potential concerns (that should be addressed through experimentation) vs. where that isn't yet needed, because the thorough design work is far out.

@joshtriplett
Member Author

I'm wondering to what extent we expect to block moving out of experimental status on having support (in some fashion) in other languages.

Not at all. We'll want to demonstrate how it can work with other languages, but we shouldn't block on any form of support that isn't already there.

A secondary question is whether we expect Rust itself to ship C headers which describe the lowering (e.g., I could imagine a struct interop_str { ptr, len }). I think this is similar to "other language's supporting" but sufficiently distinct to poke at.

Interesting idea. Added to future work. But no, I don't think that should be a blocker either.

One other meta-thought: I think an explicit note around which open questions you expect to answer before moving to non-experimental status might be helpful. Some of them seem rather large and/or like they could be left to after an initial stabilization (and obviously moving out of experimental), but having some sense there would help gauge where it makes sense to have extensive thought around potential concerns (that should be addressed through experimentation) vs. where that isn't yet needed, because the thorough design work is far out.

Fair point. I think all of them need to be addressed before stabilization (even if they're addressed by way of "we've thought about this and decided not to"), but not all of them would be blockers for stabilization. I'd expect the RFC (required for making it non-experimental) to consider all of these, but potentially defer some of them explicitly.

@m-ou-se
Member

m-ou-se commented Dec 13, 2022

I made a quick sketch of the current state and potential future of "Rust ABI" related stuff:

diagram

This PR and what Josh wrote above covers A (calling convention and data layout) and C (trait objects/vtables and typeid). In parallel (independent from Josh's proposal) we can also start working on B: the mechanism for exporting/importing/naming symbols and working with dynamic libraries. D and E are less clear for now, but also things we can consider working on in the future.

@programmerjake
Member

other stuff I think we should do (maybe for interoperable ABI v2.0) is:

  • fix C ABIs to keep more arguments and return values in registers (most C ABIs only let you use at most 2 registers for return values, even though you can use waay more to pass parameters). this can't be expressed in C without extensions.
  • maybe do something like Swift which iirc on x86 uses the ZF flag to tell the caller if the result is an error or normal return value -- Rust could use a flag to tell the caller if the returned value is Ok or Err, avoiding needing to pack the discriminant into the returned struct allowing more efficient register use. this can't be expressed in C without extensions.
  • change ABI inside repr(interop) to have better alignment for types, e.g. f64 should always have align 8. this can be expressed in C/C++ with alignas/_Alignas.

@joshtriplett
Member Author

@programmerjake While I'd love to have such optimizations, "strictly lowerable to the C ABI" is a key property that makes it possible to interoperate with all languages, so I don't think we'd want to adopt either of the first two suggestions.

The third could be done, as you said, but passing f64 in a manner incompatible with C double would be a large inconvenience, so if we were to consider that we'll have to take into account whether it'd be worth the cost imposed on every language that understands C doubles.

@bjorn3
Member

bjorn3 commented Dec 16, 2022

One item on the wish list from me is to have the lowering to the C abi not use struct arguments, but only primitive types like integers (up to the register size, i128 has inconsistent abi across compilers), pointers and floats. Struct arguments are non-trivial to implement compared to primitive type arguments. The way to do this would be similar to the current implementation of the rust abi I think: Split into multiple registers or pass as a pointer rather than a struct argument depending on the size.

@programmerjake
Member

The third could be done, as you said, but passing f64 in a manner incompatible with C double would be a large inconvenience, so if we were to consider that we'll have to take into account whether it'd be worth the cost imposed on every language that understands C doubles.

f64 would always be passed identically to C double (except on arches where a double isn't a f64, iirc some embedded arches have it be f32 or some non-ieee type), the only difference is alignas is required in repr(interoperable) structs that contain under-aligned types (of which f64 on some arches is an example) to ensure scalar fields (and arrays) are aligned to the size of their scalar type.
e.g. extern "interoperable" fn f(a: f64) -> f64 has identical ABI to extern "C" fn f(a: f64) -> f64 (assuming my first suggestion of adjusting argument registers isn't implemented).
the only difference is:

#[repr(interoperable)]
struct F64Inside {
    a: u8,
    b: f64, // forced to have size == align == 8
    c: [f64; 3], // forced to have size == 24, align == 8
}
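
For comparison, a rough sketch of how the same layout can be approximated in today's Rust, assuming a hypothetical wrapper type `AlignedF64` that forces 8-byte alignment (the Rust counterpart of `alignas` in C). `repr(interoperable)` does not exist; `repr(C)` stands in for it here.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical wrapper that forces size == align == 8 on every target,
/// including those where a bare f64 field is only 4-byte aligned.
#[repr(C, align(8))]
#[derive(Clone, Copy)]
pub struct AlignedF64(pub f64);

/// Same field order as the struct above, with the alignment spelled out.
#[repr(C)]
pub struct F64Inside {
    pub a: u8,
    pub b: AlignedF64,      // offset 8 after 7 bytes of padding
    pub c: [AlignedF64; 3], // 24 bytes starting at offset 16
}

/// Returns (size, alignment) of the struct in bytes.
pub fn f64_inside_layout() -> (usize, usize) {
    (size_of::<F64Inside>(), align_of::<F64Inside>())
}
```

Under these assumptions the struct occupies 40 bytes with 8-byte alignment regardless of the target's default alignment for `f64`.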

@joshtriplett
Member Author

We discussed this in today's @rust-lang/lang meeting, and agreed to merge this and start experimentation.

@ckaran

ckaran commented Dec 20, 2022

@joshtriplett I think that what @programmerjake was saying does make sense in the long run (as does the idea of eventually getting away from 'unsafe' ABI behavior).

So, here's a question: can we require all conforming implementations to have just enough introspection that all tools know which version of the Interop ABI is in use? That way we can do the following:

  • Interop 1.x - It's a strict superset of the C ABI, with new facilities being added to it that are safer than the C versions.
  • Interop 2.x - Interop now has all the facilities in place that you no longer need to use the C versions. All conforming tools now start to issue warnings if the unsafe portions of the C ABI are used. Kind of like doing a crater run, this lets us find the corner cases that we missed in creating Interop, and informs the maintainers of every old library, package, etc., etc., etc. that they need to issue updated versions of their code (or possibly replace them entirely).
  • Interop 3.x - The rate of bug reports has fallen to the point that we can turn the warnings into errors, but the tools don't remove the unsafe portions of the C ABI. All conforming tools have some way of setting a switch that lets you downgrade the error to a warning (or even turn it off entirely). That ensures nothing gets completely left behind.

@joshtriplett, I know you have very, very strong views on not getting rid of any part of the C ABI, but could you support a plan like the above? Maybe in a 100 years all code out there will be using only the safe portions of Interop and at that point someone can start truly removing support for the unsafe portions of the C ABI, but that will be for our successors to figure out.

@bstrie
Contributor

bstrie commented Dec 27, 2022

I'm excited to see this, but this is a potentially huge topic (both in scope and in ramifications), and I'd expect to see something like a pre-RFC with more details somewhere. Does such a document exist?

@joshtriplett
Member Author

@bstrie The original post in this issue is meant to be that document.

@matklad matklad requested review from jackh726 and removed request for jackh726 December 27, 2022 00:55
@SUPERCILEX
Contributor

Stable layout guarantees of common standard library types

How much will this limit optimizations for non-FFI users? It'd be a bummer to regress the majority.

@EdorianDark
Contributor

There is a new RFC for crABI v1: rust-lang/rfcs#3470

@gogo2464

I planned to do it 2 days ago. idk how I could help swig/swig#2734

@shadow-absorber

This comment was marked as off-topic.

@Rudxain
Contributor

Rudxain commented May 29, 2024

It seems nobody has mentioned how crABI will make TCO/TCE easier to implement:

  • Tail calls also "play badly" with assumptions in C tools, including
    platform ABIs and dynamic linking.
  • Tail calls require a calling convention that is a performance hit
    relative to the C convention.

@ShadowJonathan

Adds self to list of updates from messages here

@shadow-absorber you can do this without putting a message here by reacting to the top-level comment with any emoji (i often choose 👀), and then setting notifications to "all", then updates will appear under "participating" in github's notifications list.

@kkysen

kkysen commented May 29, 2024

Adds self to list of updates from messages here

@shadow-absorber you can do this without putting a message here by reacting to the top-level comment with any emoji (i often choose 👀), and then setting notifications to "all", then updates will appear under "participating" in github's notifications list.

You can also just click the "Subscribe" button under "Notifications" on the right side bar (desktop) or at the bottom of the page (mobile).

@bjorn3
Member

bjorn3 commented May 29, 2024

It seems nobody has mentioned how crABI will make TCO/TCE easier to implement:

I don't think it will make it easier. Having crABI lower to anything but the core of the C ABI (by that I mean the part that handles things like the register assignment of primitive values and the stack layout, but not how structs are lowered to primitive values) would be very hard without LLVM support and basically require us writing inline asm for each function, destroying optimizations. And getting LLVM to support an entirely new ABI will take years and still not help with other backends like GCC. The tail call issues are with the core C ABI, not the surface layer that crABI would replace.

@Sk44rt

Sk44rt commented Sep 27, 2024

c'mon rust exists for about 9 years and still no stable ABI?
maybe we just need to rewrite the compiler from scratch but using latest version of rustc ? :D

@NobodyXu
Contributor

c'mon rust exists for about 9 years and still no stable ABI?

Well no stable ABI is actually an advantage regarding performance.

A stable ABI means the struct/enum layout is fixed, so the compiler can't change it to use a more optimised layout, e.g. using more niches to reduce the size of an enum, making a struct more compact, or optimising cache behaviour based on the access pattern of the struct/enum.

@bjorn3
Member

bjorn3 commented Sep 27, 2024

And C++ has made several mistakes that can't be fixed due to the fact that they have a de-facto stable ABI. For example passing unique_ptr as argument will pass a pointer to the unique_ptr instead. So if you want to get the value inside the unique_ptr, the compiler will need to dereference twice in a row. And the associative containers and regex support in the C++ standard library are a lot slower than they need to be. See also https://cor3ntin.github.io/posts/abi/

@NobodyXu
Contributor

@Sk44rt The same is also true for the calling conventions.

Currently Rust doesn't have guaranteed copy elision on return or when passing parameters; to support that, the ABI would have to be changed in an incompatible way.

I believe guaranteed copy elision would also have some overlap with returning unsized objects like slices and trait objects, since at the ABI level the caller has to provide a way for the callee to write the unsized object into the place it wants.

@NobodyXu
Contributor

And there's also the vtable layout.

cc @Sk44rt based on my knowledge, the vtable layout isn't completely stable in C++, it is still a bit compiler-dependent, there are still some subtle differences between clang and gcc based on this SSO answer.

And I know for sure that msvc and gcc/clang use different ABIs, so when compiling on Windows the compiler does matter when it comes to ABI, as different compilers might give you different ABIs.

Also, from this SSO:

  • GCC/C++ keeps its ABI stable since 3.4 release and it is about 7 years (since 2004) while MSVC breaks its ABI every major release: MSVC8 (2005), MSVC9 (2008), MSVC10 (2010) are not compatible with each other.
  • Some frequently flags used with MSVC can break ABI as well (like Exceptions model)

So stabilising an ABI isn't simple at all; it's incredibly hard. Even C++, which has been around for much longer, has problems with it.

@eli-schwartz

Well no stable ABI is actually an advantage regarding performance.

The original description of this PR and the associated RFC lay out good reasons to care about a stable ABI that don't really concern themselves with performance, and also don't force people to use the stable ABI.

But if you're really concerned with performance, I challenge you to write a project which provides a number of mid-sized executables, each of which depends on the same 100mb framework. Decide:

  • do you want to ship 13 x 105mb executables?
  • do you want to ship 13 x 5 mb executables + 1 x 100mb shared crABI library?

tl;dr rust will never have a gtk or a Qt6 without providing a way to export a carefully defined stable ABI. People will just link to Qt6 bindings. :)

...

Frankly I don't get this sub-conversation to begin with. Someone posted an obvious troll comment about solving the problem of being able to optionally choose to export a stable ABI by... rewriting the compiler from scratch using the latest version of rustc? This is not only not a serious proposal to solve ABI (given it doesn't define even 0.0000000001% of the problem domain with actually, you know, specifying what the ABI would be), it seems a bit redundant with... current state of the art when it comes to rustc?

Your response is to argue that static linking can do dead code elimination. No kidding. :)

In a serious conversation about serious issues relating to compiler design, someone posted the technical equivalent of a non-sequitur "hey look at me I'm so random", and someone else responded "actually did you know Shakespeare's father made gloves for a living".

This is not actually what happened, but it's what it feels like happened. Please... some people are following this PR because they find the compiler design interesting and would like to use rust more often.

@bjorn3 yes that goes for you too. :( Your reply has nothing to do with this proposal, which recommends the concept of "versions" that the C++ committee doesn't grok. It's just an offtopic dig at C++ rather than an interesting insight into crABI and what rust might be able to offer.

@Sk44rt

Sk44rt commented Sep 27, 2024

But if you're really concerned with performance, I challenge you to write a project which provides a number of mid-sized executables, each of which depends on the same 100mb framework. Decide:

* do you want to ship 13 x 105mb executables?

* do you want to ship 13 x 5 mb executables + 1 x 100mb shared crABI library?

Yeah, that's what I DON'T like about statically compiled binaries.
For example, in cosmic, the file manager (cosmic-files) weighs 32mb (and that's a lot),
and this is only one binary from the cosmic project.
A small example from iced-rs in release mode + strip weighs 13mb.

Dynamic linking would be a nice option for this. Rust can already do it, but then the compiler also moves the standard library out of the binaries.

@NobodyXu
Contributor

@eli-schwartz I agree that having an opt-in stable ABI is incredibly useful and helps with dynamic linking; I have never opposed that idea.

I was merely arguing that stabilising the entire Rust ABI right now would be bad and would make further development harder.

I get your point about frameworks like gtk/Qt6; having an opt-in stable ABI would definitely help there.

@eli-schwartz

@eli-schwartz I agree that having an opt-in stable ABI is incredibly useful and helps with dynamic linking; I have never opposed that idea.

I was merely arguing that stabilising the entire Rust ABI right now would be bad and would make further development harder.

You are de facto arguing that having an opt-in stable ABI is bad and saying you oppose having an opt-in stable ABI.

Because no one has suggested having the entire rust ABI stabilized, so the only thing you could possibly be arguing against is this PR itself, which proposes an opt-in stable ABI. (Note that your initial response made zero attempt to draw any distinction between opt-in vs the entire ABI. What do you expect readers to think?)

I am only 15% joking. And the 15% of me that is joking still wishes you had said nothing other than "please ignore the troll". The PR already makes the case for it being an optional subset, rather better than you did, so I simply don't see what you were trying to add beyond "debate the troll for fun and profit".

It would be nice if technical PRs were a reliable source of notifications for news about the proposal or implementation discussion. I will not be replying further, in the hope of encouraging that to be the case. :) We have reddit and hackernews for the other kind of discussion.

@NobodyXu
Contributor

NobodyXu commented Sep 27, 2024

@eli-schwartz for the framework issue you've mentioned, one alternative yet effective approach is to ship a multi-call binary for the project.

Busybox is an excellent example of this: by using the multi-call binary approach, it's much smaller than coreutils.

IIRC the Rust rewrite of coreutils also supports multi-call, and it reduces the size very well.

A multi-call binary not only reduces size but also provides the best perf, as you can perform LTO on the final binary, which is impossible with dynamic linking.

It's true that this approach has limitations, since it would only be usable within one project, but it's simple, as you neither have to opt in to a stable ABI nor lose any optimisation.

And you'd retain all the advantages of static linking: it can run without having to install dependencies.

That is not to say an opt-in stable ABI is not useful; it's just that there are other ways to reduce binary size, if that's all you want.
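
For readers unfamiliar with the multi-call approach, a minimal sketch of the dispatch step could look like the following. The applet names and the `select_applet` helper are hypothetical and chosen only for illustration; this is not how Busybox or the Rust coreutils rewrite is actually structured.

```rust
use std::path::Path;

/// Applets bundled into one hypothetical multi-call binary.
#[derive(Debug, PartialEq, Eq)]
pub enum Applet {
    Cat,
    Ls,
    Echo,
}

/// Pick the applet from the name the binary was invoked under (argv[0]).
/// Symlinks such as `/usr/bin/cat -> multicall` make this work in practice.
pub fn select_applet(argv0: &str) -> Option<Applet> {
    // Strip any directory components so `/usr/bin/cat` becomes `cat`.
    let name = Path::new(argv0).file_name()?.to_str()?;
    match name {
        "cat" => Some(Applet::Cat),
        "ls" => Some(Applet::Ls),
        "echo" => Some(Applet::Echo),
        _ => None,
    }
}
```

A real `main` would pass `std::env::args().next()` to such a function and then call the matching applet's entry point; since every applet lives in the same statically linked binary, shared code is stored once and LTO can see all of it.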

@NobodyXu
Contributor

You are de facto arguing that having an opt-in stable ABI is bad and saying you oppose having an opt-in stable ABI.

@eli-schwartz you are now misinterpreting what I have written. There is no de facto arguing; I have said that I fully support an opt-in stable ABI.

Because no one has suggested having the entire rust ABI stabilized

That was what I thought of when reading @Sk44rt's comment.

I agree that my attention was drawn by a troll comment and that it would probably have been better for me to ignore it.

@slanterns
Contributor

slanterns commented Sep 27, 2024

Yeah don't let trolls attracted by some articles derail the discussion. They don't even realize crABI is something different from "a stable Rust ABI" and spam the comments in the wrong place.

(btw, I also want to see the crABI experiment continue.)

@Stargateur
Contributor

Stargateur commented Sep 30, 2024

And C++ has made several mistakes that can't be fixed due to the fact that they have a de-facto stable ABI. For example passing unique_ptr as argument will pass a pointer to the unique_ptr instead. So if you want to get the value inside the unique_ptr, the compiler will need to dereference twice in a row. And the associative containers and regex support in the C++ standard library are a lot slower than they need to be. See also cor3ntin.github.io/posts/abi

C++ is like C, and actually like Rust: officially, they don't have an ABI. In theory any C or C++ compiler is free to do whatever it wants. It's more that they are constrained not to change it because users expect it not to change. The same will happen to Rust (thus Rust uses C to try to avoid this).

@alex-semenyuk alex-semenyuk added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 14, 2024
@kanashimia

Should we provide support for extensible enums, such that we don't assume the
discriminant matches one of the known variants?

I would really like to see this feature, but this point seems like it needs its own proposal, as it would be useful with other reprs. Something like repr(u32, non_exhaustive) would be very useful for FFI. It also doesn't make sense to have this in crABI but not in repr(C). It has already been discussed a number of times before, but nobody wrote an RFC. Well, maybe crABI could have enums be extensible by default, but that is a different question.
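
Until something like that exists, a common workaround is a transparent newtype over the integer with associated constants. The sketch below uses a made-up `Color` type; it is not the proposed `repr(u32, non_exhaustive)` feature (which does not exist today), only an approximation of what "don't assume the discriminant matches a known variant" means in practice.

```rust
/// Hypothetical extensible "enum": any u32 is a valid value, and the known
/// variants are just named constants. Same ABI as a plain u32.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Color(pub u32);

impl Color {
    pub const RED: Color = Color(0);
    pub const GREEN: Color = Color(1);

    /// Callers are forced to handle values added by a newer library version.
    pub fn name(self) -> &'static str {
        match self {
            Color::RED => "red",
            Color::GREEN => "green",
            // Unknown discriminant: not undefined behaviour, just unrecognized.
            _ => "unknown",
        }
    }
}
```

With a real `enum`, receiving an unknown discriminant over FFI is undefined behaviour, which is why the newtype form is used here.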

@safinaskar
Contributor

I have multiple objections.


crABI does not aim to provide "translations" between the most native
representations of different languages. For instance, though different
languages may store strings in different fashions, crABI string types will have
a specific representation in memory and a specific lowering to C function
parameters/results. Languages whose native string representation does not match
crABI string representation may need to translate, or may need to treat the
crABI string object as a distinct data type and provide distinct mechanisms for
working with it. (By contrast, WebAssembly Interface Types (WIT) aims to
provide such translations in an efficient fashion, by generating translation
code as needed between formats.)

What?! So, with crABI you will have 2 translations in worst case (to crABI and from crABI), and with webasm you will have 1 translation in worst case (assuming language pair itself is well supported), because every well-supported language pair will have highly efficient translations between formats. So, crABI will be worse than webasm. Why do we need crABI at all then?!

Please, just use webasm. It aims to solve all the problems outlined in the first post. Here is the original webasm vision: https://hacks.mozilla.org/2019/08/webassembly-interface-types/ . (Notice that this document mentions Rust 12 times, so, yes, they certainly think about Rust a lot.) As far as I understand, not everything from the original vision is implemented yet. But I think we should concentrate resources on webasm and not duplicate effort.

At this moment you may say: "But webasm uses its own binary format, and cannot access raw OS APIs!" Yes, this is true. So, yes, crABI with support for native binary formats (such as ELF) still makes sense. But, please, borrow as much as possible from webasm, because I do not want duplication of effort! For example, borrow wasm interface types in full. And wasm's IDL. You will get "webasm, but for native binaries".

At this moment you may say: "Stop this! crABI is defined as lowering to C ABI!" Okay then. Create a way to lower wasm interface types to the C ABI. This will allow us to reuse the existing type model and its IDL and meet crABI requirements.

At this moment you may cry: "Why do we need an IDL? A Rust struct definition is an IDL". Okay. But, as far as I understand, the ability for two languages, neither of which is Rust, to call each other is a requirement. So a Rust struct definition will not work here. We need an IDL.


crABI still makes sense if it provides something really important. Something webasm lacks. For example, safety (in the Rust sense). I want crABI to be able to connect two languages, both of which have a concept of safety, in such a way that safety is preserved. This would be truly good, and it would justify crABI's existence. At this point you may say: "But full safety will require full-blown lifetime support! We don't want this!" Well, not necessarily. We may just forbid normal references and require using Rcs (or similar) instead.


  • Owned pointers to any supported type (e.g. Box), as well as owned pointers
    to types that can't be passed by value.
  • References, with a limited degree of lifetime support.

Okay, so you distinguish between owned pointers and references. So, it follows that languages with crABI support are not garbage collected, and they have owned and non-owned pointers as separate concepts. This leaves out Java, C#, Haskell, Ocaml, etc, etc. At the same time you say in the beginning: "high-level programming languages that have safe data types". Okay, so we need safety. This leaves out C, C++, Zig, etc. Okay, so what is left? I don't know any other language, except for Rust itself, that meets these criteria. Why didn't you provide any example of a language that could support crABI? If we drop the requirement for owned and non-owned pointers and just require that the language is not garbage collected, then I know some languages that meet these criteria: Swift, Lean 4 and Koka. As far as I understand, all these languages are safe, are not garbage collected and are reference counted. But they do not support owned and non-owned pointers.

@fogti
Contributor

fogti commented Dec 5, 2024

good interop with https://www.circle-lang.org/site/index.html might be interesting (as a possible other language with comparatively "safe" memory management), I suppose.

@bjorn3
Member

bjorn3 commented Dec 5, 2024

What?! So, with crABI you will have 2 translations in worst case (to crABI and from crABI)

The compiler will optimize out those translations if nothing needs to be copied.

and with webasm you will have 1 translation in worst case (assuming language pair itself is well supported)

Passing around strings and lists with the wasm component model will always perform a full copy of the string or list. With crABI it would only result in a copy if you decide to convert between the crABI type and the type that is native to your system and there is no zero-copy conversion possible. If you decide to operate directly on the crABI type (or pass it around rather than touching it) there would be no copy at all. If you are using wasm without the component model, then you still need to define your own ABI (which could e.g. be crABI, or the C ABI). Additionally, wasm has a non-trivial overhead over native code, and there is nothing stopping you from using the component model with native code. And finally, the component model is designed around the limitations of both sides of the ABI boundary using completely separate address spaces, which means that, for example, pointer passing is not possible and borrowing a value of the caller is not possible, which severely restricts the kind of APIs you can create using the component model and how efficient you can make those APIs.
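
As a rough illustration of the zero-copy case, consider a borrowed string passed as a pointer plus a length. The `StrView` type below and its layout are assumptions invented for this sketch; crABI has not specified its string representation, so treat this as one possible shape rather than the actual design.

```rust
use std::{slice, str};

/// Hypothetical ABI-level borrowed string: pointer plus byte length, UTF-8.
/// The real crABI layout is not defined yet; this is only an illustration.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrView {
    ptr: *const u8,
    len: usize,
}

impl StrView {
    /// Zero-copy: only the pointer and length are recorded.
    pub fn borrow_from(s: &str) -> StrView {
        StrView { ptr: s.as_ptr(), len: s.len() }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Zero-copy view back into the original bytes.
    ///
    /// # Safety
    /// The bytes must still be alive, unmodified, and valid UTF-8 for `'a`.
    pub unsafe fn as_str<'a>(&self) -> &'a str {
        // SAFETY: guaranteed by the caller as documented above.
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}
```

A language whose native string is, say, UTF-16 would have to convert (and therefore copy) if it wants its native type, or it could operate on the view directly and avoid the copy.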

@tmccombs
Contributor

tmccombs commented Dec 5, 2024

So, it follows that languages with crABI support are not garbage collected

Not at all. If a GC language receives an owned pointer, it can wrap it in a garbage-collectable object that calls the destructor when it is collected. And when calling a foreign function, it can know whether it needs to make a copy of an object to get a pointer to an owned object, or can just pass a pointer to an existing object.
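
A minimal sketch of that wrapper idea, written in Rust with `Drop` standing in for a GC finalizer. All names here (`ForeignOwned`, `make_counter`, `destroy_counter`, `DROPS`) are hypothetical and exist only for this example; a real GC runtime would register a finalizer instead of relying on scope exit.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts destructor calls so the example can be checked.
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a foreign function returning an owned pointer.
pub extern "C" fn make_counter() -> *mut c_void {
    Box::into_raw(Box::new(0u64)) as *mut c_void
}

/// Stand-in for the matching foreign destructor.
///
/// # Safety
/// `p` must come from `make_counter` and must not be used afterwards.
pub unsafe extern "C" fn destroy_counter(p: *mut c_void) {
    // SAFETY: guaranteed by the caller as documented above.
    drop(unsafe { Box::from_raw(p as *mut u64) });
    DROPS.fetch_add(1, Ordering::SeqCst);
}

/// Wrapper a host runtime could hand to its collector: when the wrapper is
/// reclaimed, the foreign destructor runs exactly once.
pub struct ForeignOwned {
    ptr: *mut c_void,
    destroy: unsafe extern "C" fn(*mut c_void),
}

impl ForeignOwned {
    /// # Safety
    /// `ptr` must be uniquely owned and `destroy` must be its destructor.
    pub unsafe fn new(ptr: *mut c_void, destroy: unsafe extern "C" fn(*mut c_void)) -> Self {
        ForeignOwned { ptr, destroy }
    }
}

impl Drop for ForeignOwned {
    fn drop(&mut self) {
        // SAFETY: `new` requires that `destroy` is valid for `ptr`.
        unsafe { (self.destroy)(self.ptr) }
    }
}
```

The ownership information in the signature is what tells the host runtime that it needs such a wrapper at all, as opposed to a borrowed pointer that it must not free.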

Okay, so we need safety. This leaves out C, C++, Zig, etc.

These languages would be able to use crABI; there just might not be as much help from the compiler in keeping the contract, which is no different from current FFI, except that the signature provides built-in documentation about safety considerations.

Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.