-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Existential types with external definition #2492
RFC: Existential types with external definition #2492
Conversation
@crlf0710 Thanks, copied to the OP disassociated from the specific commit. |
So if I understand this proposal correctly, it avoids that problem by essentially not generating any code until the instantiation of the existential appears in the crate tree? Every function becomes implicitly generic over whatever instance is picked for whatever existentials are still "open"? That seems like a huge change in the way code is generated, which is why I am surprised to not see it discussed further. If that's not what happens, the I do not understand how you plan to actually implement this proposal. |
@RalfJung Yes exactly. I didn't discuss it further because since that's what generics do so I figured it wasn't very novel, but I'm to add more text. [FWIW, eventually we might be able to do a bound like |
@RalfJung "how code is generated" is the simplest part, the type-checking side of things is way more involved. It's very similar to a parametrized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have doubts about two of the motivating examples
|
||
- [`core::alloc::GlobalAlloc`](https://doc.rust-lang.org/nightly/core/alloc/trait.GlobalAlloc.html), chosen with [`#[global_allocator]`](https://doc.rust-lang.org/1.23.0/unstable-book/language-features/global-allocator.html) | ||
- `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md) | ||
- The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OOM hook can be changed repeatedly times at run time. I don't know where (if anywhere) this ability is used, but at least it's not obvious that we even can replace the OOM hook with a static singleton. Deciding that requires wading into the details of OOM handling which is probably out of scope for this RFC.
(There is of course the option of building a singleton with mutable state that provides exactly the current API, but if that would be used widely, many of the purported benefits evaporate.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should link the alloc
lang item that exists right now. All this is just machinary over oom_impl
, exposed in alloc::alloc::handle_alloc_error
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't care much about existing implementation details here, but about the public (if unstable) API provided. If the public API we end up providing in alloc
is a runtime-settable hook instead of something statically dispatched for whatever reasons, then it simply isn't very relevant to this RFC (though this RFC, if accepted, would be one way to implement that hook). I am sympathetic to wanting static dispatch by default and letting those who need runtime-varying behavior implement a hook themselves, but again, the details of how OOM handling ought to be done are controversial and out of scope for this RFC, so IMO the RFC text is being over-eager by saying this feature would obsolete the hook in its current form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N.B. per rust-lang/rust#51607 (comment) we might be changing to a static hook anyways.
- `panic_fmt` chosen with [`#[panic_implementation]`](https://github.com/rust-lang/rfcs/blob/master/text/2070-panic-implementation.md) | ||
- The OOM hook, modified with [`std::alloc::{set,take}_alloc_error_hook`](https://doc.rust-lang.org/nightly/std/alloc/fn.set_alloc_error_hook.html) | ||
- [`std::collections::hash_map::RandomState`](https://doc.rust-lang.org/std/collections/hash_map/struct.RandomState.html), if https://github.com/rust-lang/rust/pull/51846 is merged, the `hashmap_random_keys` lang item | ||
- [`log::Log`](https://docs.rs/log/0.4.3/log/trait.Log.html) set with [`log::set_logger`](https://docs.rs/log/0.4.3/log/fn.set_logger.html) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging can require resources such as files, network connections, run-time configuration data, etc. so it's difficult to make some logger implementations into static
s. In theory everything could be initalized lazily on first logging call (though then you haven't eliminated the overhead of dynamic dispatch!), but this mostly just shifting the problem (and run-time costs) around and in more complex scenarios -- e.g. when you want to read a configuration file to determine what kind of logging to do -- this can require making much more state global than currently necessary. It also affects all other control flow surrounding logger initialization, e.g. error reporting from being unable to open a file for logging now has to be moved into the lazy-initialization code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every case needs initialization. And example is serial port logging for embedded.
For the logging case there's these options:
-
Just as local allocation exists, there should be
*_in
variants of the macros that allow passing around a local logger. One should never be forced into using a singleton. -
The case where one really wants a single xxx, such that a static is nice because any passed pointer / value is just overhead, but also wants to be sure the xxx is initialized first, is heavily explored by @japaric and the rest of the working group. In general, something still needs to be passed around, but it can just be a ZST "token" indicating initialization is complete. So this is just a riff off the above. My stuff still helps if you want to be monophonic over the token type.
-
Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like
lazy_static
+ my stuff can make this more ergonomic. -
Lazy init as you mention. Actually
lazy_static
crate + my stuff makes this decently ergonomic.
So overall I think mine is still value for not needing to cfg around the no-std case my offering a better separation of concerns between functionality and its "singletonness", while inciting one to make the singleton aspect optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every case needs initialization. And example is serial port logging for embedded.
Yes, obviously.
Just as local allocation exists, there should be *_in variants of the macros that allow passing around a local logger. One should never be forced into using a singleton.
While in theory I commend getting rid of global state, the fact is that such an API for logging is entirely hypothetical while the "global logger" API is pervasive in the ecosystem. If it's a huge pain to replace the current global singleton with a different mechanism for installing the same global singleton, everyone who (for whatever reasons) uses the global singleton will rightly prefer the current way of installing it over what this RFC presents.
Besides, this argument cuts both ways: all the code that avoids global singletons doesn't need this RFC either.
Recreate today with a mutable singleton and manually initialize it. Unlike today this is opt-in. Something like lazy_static + my stuff can make this more ergonomic.
As with the OOM hook, the question is how commonly this would be done. If it's quite common, the benefit of external existential traits for this use case is diminished.
Lazy init as you mention. Actually lazy_static crate + my stuff makes this decently ergonomic.
lazy_init is nice but it doesn't help you get the data needed for initialization (e.g., user configuration) into the block doing the initialization, as it can only reference other globals, not e.g. a local binding created in main
. That's what I meant by requiring more state to be made global.
None of this is a fatal objection to using external existential types for logging. I am just saying that for some quite common use cases, it won't have all the claimed benefits and indeed some downsides as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What downsides? The worst case of manual initialization forced by effects and user config works exactly like today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside is that any logger impementations that needs it has to separately go through the work of (and accept the risk of bugs in) emulating today's behavior. (Plus the churn of changing APIs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkruppe liblog can come with a ManuallyInitializedLogger<L>
that does all that for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, true. It's still not clear to me that a significant portion of liblog users would do anything different, but let's leave that discussion at #2492 (comment)
I agree with @eddyb that the codegen aspect is relatively simple conceptually and in implementation. However, applying this feature to pervasive concerns such as panicking and memory allocations will make a staggering amount of currently monomorphic (or generic but monomorphized in libraries rather than leaf crates) code generic and only monomorphized in leaf crates, which has practically the same impact as MIR-only rlibs: virtually no LLVM IR or machine code is generated while compiling libraries, most of it will be monomorphized and codegen'd only when reaching the leaf crates. While MIR-only libs are very desirable for a number of reasons (see the linked issue), there are also good reasons why we still don't have that feature: it significantly regresses wall-clock build times. Further experiments in this direction are considered blocked by parallelizing rustc, which to my knowledge is being pushed forward but probably still a far cry from being turned on by default, let alone being effective enough to offset the downsides of MIR-only rlibs. This should be taken into account when estimating how quickly this feature can be landed and applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me.
NB. existential type
is defined in RFC 2071.
text/0000-extern-existential-type.md
Outdated
Only one crate in the build plan can define the `pub extern existential type`. | ||
Unlike the trait system, there are no orphan restrictions that ensure crates can always be composed: | ||
any crate is free to define the `pub extern existential type`, as long is it isn't used with another that also does, in which case the violation will only be caught when building a crate that depends on both (or if one of the crates depends on the other). | ||
This is not very nice, but exactly like "lang items" and the annotations that exist for this purpose today, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence and the rest of this paragraph shouldn't be in the "guide-level explanation"
text/0000-extern-existential-type.md
Outdated
As mentioned in the introduction, code gen can be reasoned about by comparing with generic and inlining). | ||
We cannot generate for code for generic items until they are instantiated. | ||
Likewise, imagine that everything that uses an `pub extern existential type` gets an extra parameter, | ||
and then when the `impl pub extern existential type` is defined, we go back and eliminate that parameter by substituting the actual definition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is impl pub extern existential type
?
@rkruppe It need not be pushed all the way to the leaf crates. We can have a |
@jethrogb is right in rust-lang#2492 (comment), this doesn't belong in the guide-level explanation.
First off, you've not laid out any monomorphization strategy, but this crucially depends on how and when we monomorphize. If I had to guess, I'd say you are suggesting something like:
There are some problems with that:
"Eventually", hopefully this problem disappears entirely as MIR-only rlibs become feasible in general. The issue is whether we're willing to eat massive regressions in the mean time. |
- Of course, we can always do nothing and just keep the grab bag of ad-hoc solutions we have today, and leave log with just a imperative dynamic solution. | ||
|
||
- We could continue special-casing and white-listing in the compiler the use-cases I give in the motivation, but at least use the same sort of annotation for all of them for consistency. | ||
But that still requires leaving out `log`, or special casing it for the first time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'd want to use this for log at all. Runtime configuration of loggers is pervasive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who does that run-time configuration? The libraries logging or the end application?
text/0000-extern-existential-type.md
Outdated
} | ||
} | ||
|
||
extern existential type alloc::Heap = JemallocHeap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This declaration wouldn't live in the jemalloc crate, but rather the downstream consumer. We want to allow people to pull in jemalloc without using it as the global allocator. (Imagine you want to wrap it in a layer of tracking or whatever)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good point thanks @sfackler. I'll amend that example to show the 3rd crate. I suppose that is like today, too.
I have done some experimenting with the extern-existential macro I posted above. I think it works really well. The hard part--that the RFC also doesn't address--is setting defaults. Here's a branch with some of my experiments. I think we should accept this, with zero-sized types only, as an eRFC so we can play with this some more, and can flesh out the defaulting system. |
The final comment period, with a disposition to postpone, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC is now postponed. |
# Motivation | ||
[motivation]: #motivation | ||
|
||
We have a number of situations where one crate defines an interface, and a different crate implements that interface with a canonical singleton: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another motivating example https://internals.rust-lang.org/t/global-executors/11295
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for letting me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was mentioned back then here (#2492 (comment)) and here (#2492 (comment)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnzlbg Well predicted :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cramertj is the one that realized that this could be useful for futures
If this feature would be slightly extended by allowing the crate to define a default type, this would effectively give us type-checked language-level weak linkage. This would help the embedded ecosystem tremendously, since there are a lot of things that could be expressed in terms of this feature (eg. bare-metal chip initialization code that could be customized downstream to support "weird" chips, exception handlers with reasonable zero-cost defaults, etc.). |
Hi! As a user of Rust, trying to implement cryptographic protocols for embedded devices, I wanted to weigh in on what my use-case is, the approaches I have considered, and ask if there's anything I can do to help this along. If there's a more appropriate place to post this, please let me know. Use CaseThe specific use case here is that the cryptographic protocols in question make use of a lot of common cryptographic primitives, e.g. the BLAKE2b hash function. The embedded devices likely have an existing implementation (which we don't want to duplicate because there are strict firmware size constraints), or an optimized implementations (e.g. if the device includes a cryptographic accelerator, we want to take advantage of it rather than using a slow software implementation). Therefore, we desire the following:
ApproachesExtern existential typesThe real world example is a main This RFC solves the problem since we can have our pub trait Blake512 {
fn new() -> Self;
// ...
}
pub extern existential type Blake2b512: Blake512; The impl library::Blake512 for Blake2b512 {
// ...
}
extern existential type library::Blake2b512 = Blake2b512; I presume it will also be possible to run tests on the Cargo dependency patchingAnother solution is to swap out a crate using the [patch.crates-io]
library_crypto = { package = "firmware_library_crypto" } Since we've broken out the default cryptographic implementations into a new crate ( Apart from the horrific dependency tree (and the foot-guns around trying to emulate the I'm fine with the friction introduced by the various
|
Couldn't each library crate have feature flags for each firmware-specific override? It would mean you'd have some firmware-specific code in each library crate, but at least all functionally-similar code would be grouped together, rather than ending up with a firmware crate with a mess of different functionalities? I imagine you could even automatically enable the correct features across all library crates by using custom targets (one per firmware) and then |
@Diggsey cargo features are not a good match for this because cargo features are conceptually things that are "on" and they are always composable. Moreover they are "unified" by cargo, meaning anything that depends on your crate gets to turn features on and the total of all features is what is built. There is no possibility to have features that are mutually exclusive at the level of cargo. You could do something like, have the library fail the build if two particular features are on. But this leaves you mostly helpless to actually debug the problem, because the library doesn't know which crates turned on those features. Global feature coordination problems are very hard to debug in cargo when you have a large project, and while there are tools like "cargo-feature-analyst" they often don't play well with any other cargo options, and don't actually solve the problem you care about, so they leave a lot to be desired. The problem is even worse when Extern existential types seem like a simple way to say "exactly one implementation should be provided across the entire build" without requiring proliferation of configuration info in Cargo tomls. And if two are provided then cargo can tell you what two crates provided it, then you can debug from there. We need a better way to express the idea of configuring one of several "incompatible modes" of a library. Cargo features is a poor substitute for this, or should be extended to support this better. |
The cryptographic primitives are inherently related to the firmware. In some cases, they're FFI bindings to the chip vendor's optimized code. In others, it's Rust implementations that are known to be fast and constant-time on that hardware configuration. And these cryptographic primitives are used for other things in the firmware (which is why the implementations exist already), they're not only there for the library's use. This library is intended to be used in at least three very distinct firmwares, it would not be practical to move firmware-specific code into the library. For example, the FFI bindings (which are incredibly firmware-specific, obviously) need to be kept up-to-date. We'd have to have every downstream user fork and maintain the library.
Sorry, if it wasn't clear, the The dependency graph is kind of like this (where each of |
As an additional argument, with extern existential types (or additional generic parameters, the “traits bound” solution in @saleemrashid 's message) |
The log library's statics and mutation have caused issues for embedded platforms using the very restricted and jank |
For those who are interested, I drafted an alternative proposal which solves the same problem: https://internals.rust-lang.org/t/idea-facade-crates/18051 |
Rendered
An extension of #2071's
existential type
where the definition can live in a different crate than the declaration, rather than the same module. This is a crucial tool untangling for untangling dependencies withinstd
and other libraries at the root of the ecosystem concerning global resources.In particular, I hope if we do this before #2480, we can stabilize an
alloc
crate whose notions of various global resources to not require copious amounts special attention from the compiler or any run-time cost. I've purposefully picked a more light-weight solution to achieve that goal while minimally delayingalloc
's stabilization and deviating from existing/planned Rust features.