RFC: Generic integers #2581
I like the idea, and I would really love to have efficient and ergonomic strongly-typed bitfields. However, this proposal feels too magical for my taste; there is too much stuff built into the compiler. I would rather expose a single simple primitive that allows implementing arbitrarily sized ints efficiently. Just a half-baked idea: we only build

```rust
#[repr(packed)]
struct int<const Width: usize> {
    bits: [Bit; Width],
}

#[repr(packed)]
struct uint<const Width: usize> {
    bits: [Bit; Width],
}
```

with the appropriate operations implemented via efficient bit-twiddling methods or compiler intrinsics for performance.
(independently from the remark above)
This RFC spends quite some time explaining the memory layout of `int<N>`
and `uint<N>` types. Is user code allowed to rely on this memory layout,
or is it not?
Intuitively it'd be better if the layout was not actually defined, to
allow for further optimizations, but the current text makes me believe
it is defined. The one case where layout would be mandatory would be for
a not-yet-written `#[repr(bitpacked)]` RFC, in my opinion.
Oh, and

> At the time of writing, no known programming language offers this
> level of integer generalisation, and if this RFC were to be accepted,
> Rust would be the first.

Programming languages with dependent types (F*, etc.) do offer this, and
actually much more. :)
text/0000-generic-int.md

> ## Primitive behaviour
>
> The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,
Nit: these are two new built-in integer type families or type constructors. You really get 256 new types, not two.
I agree with you, but I also wonder if this is the language most people would prefer to use. For example, would you consider `Vec<T>` to also be a family of types, or just a generic type?
Well, the type constructor is really `Vec` and not `Vec<T>`; but "two new built-in generic integer types" is, I think, clear enough.
text/0000-generic-int.md

> The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,
> where `const N: usize`. These will alias to existing `uN` and `iN` types if `N`
> is a power of two and no greater than 128. `usize` and `isize` remain separate
> types due to coherence issues, and `bool` remains separate from `uint<1>` as it
Elaborate on what those coherence issues are for unfamiliar readers?
Traits can be implemented separately and differently for `u32`, `u64`, `usize`, etc., so unifying `usize` with `uN` for the appropriate `N` would cause overlapping impls.
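A small illustration of the overlap, using a made-up `Describe` trait: these two impls are accepted today only because `u64` and `usize` are distinct types. If `usize` were instead an alias for `uint<64>` (that is, `u64` on a 64-bit target), both impls would name the same type and be rejected as conflicting.

```rust
/// Illustrative trait only; not part of std or of the RFC.
trait Describe {
    fn describe(&self) -> &'static str;
}

// These two impls coexist only because `u64` and `usize` are distinct types.
impl Describe for u64 {
    fn describe(&self) -> &'static str {
        "u64"
    }
}

impl Describe for usize {
    fn describe(&self) -> &'static str {
        "usize"
    }
}
```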
I may not be able to get to this by the weekend but I will try to remember to elaborate more on this.
text/0000-generic-int.md

> For example, this means that `uint<48>` will take up 8 bytes and have an
> alignment of 8, even though it only has 6 bytes of data.
>
> `int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and
store/stores, pick one :)
Store was a typo. :p
> In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
> `as`, whereas most integer types can only be cast from `bool`.
>
> For the moment, a monomorphisation error will occur if `N > 128`, to minimise
This is my biggest concern; I don't think monomorphization errors of this magnitude belong in the language, and I would like to see this changed before stabilization.
Would you be willing to elaborate more on this? Why are these errors a problem?
The nature of Rust's bounded polymorphism is that type checking should be modular: you can check a generic function separately from its instantiation, and then instantiating it with any parameters that satisfy the bounds should not result in errors.
This makes for more local error messages (you won't have post-monomorphization errors in some location far removed from where instantiation happened...), fewer surprises (because this is how polymorphism works everywhere else in the type system) and possibly also better performance.
Another benefit of avoiding post-monomorphization errors is that the need to monomorphize as an implementation strategy is lessened. That said, there are instances where the compiler will cause post-monomorphization errors, but those are extremely unlikely to occur in actual code. In the case of `N > 128`, it is quite likely.
The general principle is that you declare the requirements up front (with bounds, etc.) to use / call an object, and then you don't impose new and hidden requirements for certain values.
If you want to rule out `N > 128`, then that should be explicitly required in the "signatures", e.g. you should state `struct uint<const N: usize> where N <= 128 { .. }` (and on the impls...). Otherwise, it should work for all `N: usize` evaluable at compile time.
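For comparison, the nearest thing expressible on stable Rust today is an associated-const assertion, and it is exactly the post-monomorphization style of check being objected to: nothing in the signature mentions the limit, and `UInt::<200>::new()` is only rejected once that instantiation is actually compiled. `UInt` here is a hypothetical stand-in for `uint<N>`.

```rust
/// Hypothetical stand-in for `uint<N>`, carrying only its width.
struct UInt<const N: usize>;

impl<const N: usize> UInt<N> {
    // Evaluated separately for each instantiation: a post-monomorphization
    // check, not a bound that callers can see in the signature.
    const WIDTH_OK: () = assert!(N <= 128, "uint<N> requires N <= 128");

    fn new() -> Self {
        // Referencing the constant forces its evaluation for this `N`.
        let () = Self::WIDTH_OK;
        UInt
    }

    fn bits(&self) -> usize {
        N
    }
}
```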
> e.g. you should state `struct uint<const N: usize> where N <= 128 { .. }` (and on the impls...)

Is this possible with any currently planned or near future const generics? I don't recall seeing any RFCs that would support const operations in where clauses (although I would love to have them available). This specific case might be possible with some horrible hack like

```rust
trait LessThan128 {}
struct IsLessThan128<const N: usize>;
impl LessThan128 for IsLessThan128<0> {}
// ⋮
impl LessThan128 for IsLessThan128<127> {}
struct uint<const N: usize> where IsLessThan128<N>: LessThan128 { .. }
```

but 🤢
@Nemo157 not possible with RFC 2000 but might be with future extensions.
This is a different formulation which might work (if you can use a const expression in associated types):

```rust
struct Bool<const B: bool>;

trait LessThan128<const N: usize> {
    type IsIt;
}
impl<const N: usize> LessThan128<N> for () {
    type IsIt = Bool<{ N < 128 }>;
}

struct uint<const N: usize>
where
    (): LessThan128<N, IsIt = Bool<true>>,
{ .. }
```

Edit:
If you want to include an error message in the type error, you can use this to print an error message in the type itself.

```rust
struct Str<const S: &'static str>;
struct Bool<const B: bool>;
struct Usize<const N: usize>;

trait Assert<const COND: bool, Msg> {}
impl<const COND: bool, Msg> Assert<COND, Msg> for ()
where
    (): AssertHelper<COND, Msg, Output = Bool<true>>,
{}

trait AssertHelper<const COND: bool, Msg> {
    type Output;
}
impl<Msg> AssertHelper<true, Msg> for () {
    type Output = Bool<true>;
}
impl<Msg> AssertHelper<false, Msg> for () {
    // On failure, the mismatch against `Bool<true>` surfaces `Msg` in the error.
    type Output = Msg;
}

trait AssertLessThan128<const N: usize> {}
impl<const N: usize, const IS_IT: bool> AssertLessThan128<N> for ()
where
    (): LessThan<N, 128, IsIt = Bool<IS_IT>>
        + Assert<IS_IT, (
            Str<"uint cannot be constructed with a size larger than 128, the passed size is:">,
            Usize<N>,
        )>,
{}

trait LessThan<const L: usize, const R: usize> {
    type IsIt;
}
impl<const L: usize, const R: usize> LessThan<L, R> for () {
    type IsIt = Bool<{ L < R }>;
}

struct uint<const N: usize>
where
    (): AssertLessThan128<N>,
{ .. }
```
AFAIK stable Rust does not currently have any monomorphization time errors - if you find any, it's a bug - so this RFC "as is" would be introducing them into the language =/
```rust
fn poly<T>() {
    let _n: [T; 10000000000000];
}

fn monomorphization_error() {
    poly::<u8>(); // OK!
    poly::<String>(); // BOOM!
}

fn main() {
    monomorphization_error();
}
```
text/0000-generic-int.md

> ## Standard library
>
> Existing implementations for integer types should be annotated with
> `default impl` as necessary, and most operations should defer to the
I'm not sure why `default impl` would be used here... elaborate?
Essentially, your `default impl`s are your base cases and the other cases will recursively rely upon them. For example, `<uint<48>>::count_zeros` would ultimately expand to `u64::count_zeros` minus 16 (the 64 - 48 padding bits, which are always zero). I'll try to elaborate more in the RFC text itself later.
You mention that it should be `default fn` in the text but don't put it in the `count_zeros` example; including it in the example and showing one of the specialized implementations for a power of two could clarify this somewhat.
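To make the base-case arithmetic concrete: if an `N`-bit value is stored zero-extended in a `u64`, the wider `count_zeros` also counts the `64 - N` padding bits, so for `uint<48>` the correction is to subtract 16. A minimal sketch on stable Rust, assuming a hypothetical `UInt<N>` wrapper backed by `u64` (no specialization is involved, so this is not the `default fn` structure under discussion):

```rust
/// Hypothetical stand-in for `uint<N>` (1 <= N <= 64), stored zero-extended in a `u64`.
#[derive(Clone, Copy)]
struct UInt<const N: u32>(u64);

impl<const N: u32> UInt<N> {
    fn new(value: u64) -> Self {
        assert!(N >= 1 && N <= 64, "this sketch only supports 1 <= N <= 64");
        // Keep only the low N bits so the padding bits are always zero.
        let mask = if N == 64 { u64::MAX } else { (1u64 << N) - 1 };
        UInt(value & mask)
    }

    fn count_zeros(self) -> u32 {
        // `u64::count_zeros` also counts the 64 - N zero padding bits.
        self.0.count_zeros() - (64 - N)
    }
}
```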
text/0000-generic-int.md

> Once const generics and specialisation are implemented and stable, almost all of
> this could be offered as a crate which offers `uint<N>` and `int<N>` types. I
> won't elaborate much on this because I feel that there are many other
You should :) Do we know that it is actually implementable as a library?
I will try to get to this this weekend.
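As a partial data point, the const-generic half is already expressible on stable Rust. The sketch below assumes a hypothetical `UInt<N>` stored in a `u128` and kept in range by masking; it uses no specialization, so it cannot delegate to the native `u8`..`u128` implementations, and a library type gets neither integer literals nor `as` casts.

```rust
/// Hypothetical library-level `uint<N>` for 1 <= N <= 128, stored in a `u128`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct UInt<const N: u32>(u128);

impl<const N: u32> UInt<N> {
    /// All-ones mask covering the low N bits (also the maximum value).
    const MASK: u128 = if N >= 128 { u128::MAX } else { (1u128 << N) - 1 };

    /// Returns `None` if `value` does not fit in N bits.
    fn new(value: u128) -> Option<Self> {
        assert!(N >= 1 && N <= 128, "this sketch only supports 1 <= N <= 128");
        if value <= Self::MASK { Some(UInt(value)) } else { None }
    }

    /// Overflow-checked add; for N = 128 the `u128` add itself may overflow.
    fn checked_add(self, rhs: Self) -> Option<Self> {
        self.0.checked_add(rhs.0).and_then(Self::new)
    }

    /// Add modulo 2^N.
    fn wrapping_add(self, rhs: Self) -> Self {
        UInt(self.0.wrapping_add(rhs.0) & Self::MASK)
    }
}
```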
> Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)
Well, you can represent a [`Fin : Nat -> Type`](https://github.com/idris-lang/Idris-dev/blob/master/libs/base/Data/Fin.idr) type constructor in dependent typing pretty easily, but those are quite inefficient...
Low* (a part of F* that deals with machine types) can represent those and still compile to C, with for instance the type
```fstar
x:int_32{x >= -128l /\ x < 128l}
```
This type will be compiled (to C) as an `int32_t` type, after F* has proven that the value can never fall outside the `[-128, 128[` bounds.
More details about machine integers in F* can be found on [the F* wiki](https://github.com/FStarLang/FStar/wiki/Machine-integers).
text/0000-generic-int.md

> from `uint<N>` to `uint<M>` or `int<M + 1>`, where `M >= N`.
>
> In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
> `as`, whereas most integer types can only be cast from `bool`.
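The lossless-conversion rule mirrors what the existing fixed-width types already do: an unsigned value needs one extra bit to fit in a signed type. A small sketch with `u8` standing in for `uint<8>` (the helper names are illustrative only):

```rust
use std::convert::TryFrom;

/// Every `u8` fits losslessly in a wider unsigned type and in a signed type
/// with at least one extra bit, so `From` exists for both.
fn widen(x: u8) -> (u16, i16) {
    (u16::from(x), i16::from(x))
}

/// A signed type of the same width cannot hold the top half of the range,
/// so only a fallible conversion exists.
fn fits_in_i8(x: u8) -> bool {
    i8::try_from(x).is_ok()
}
```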
So, this treats 1 and -1 as `true` depending on the signedness? I rather like the route of identifying `true` with -1 instead of 1, but it's not the route Rust has chosen, so it might be a bit controversial. From another angle, while it's consistent with `true` being represented as a single 1 bit, it's also inconsistent with the fact that `bool as iN` (for currently existing `iN`) turns `true` into 1.
Is there motivation for providing these cases instead of just making people write `x != 0`, other than "we can"?
I completely forgot about sign extension when doing this, and now that you mention it, it makes sense that if this were offered, then only `uint<1>` should cast to `bool`. Unless anyone has any objections, when I next get around to revising the text I'll remove both casts to `bool`.
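For reference, the current behaviour the comment relies on: `bool as iN` yields 1 for `true` regardless of signedness, and the `x != 0` alternative needs no new cast at all. A sketch using `u8`/`i8` in place of the proposed 1-bit types:

```rust
/// Turning a bit back into a `bool` without a dedicated cast.
fn bit_to_bool(bit: u8) -> bool {
    bit != 0
}

/// Existing `bool` casts: `true` becomes 1 for both unsigned and signed targets.
fn bool_casts(b: bool) -> (u8, i8) {
    (b as u8, b as i8)
}
```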
text/0000-generic-int.md

> `int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and
> `uint<N>` stores values between 0 and 2<sup>N</sup>-1. One unexpected case of
> this is that `i1` represents zero or *negative* one, even though LLVM and other
> places use `i1` to refer to `u1`. This case is left as-is because generic code
Nit: integer types in LLVM don't have inherent signedness; they are bags of bits that are interpreted as signed or unsigned by individual operations, and `i1 true` is treated as -1 by signed operations (e.g., `icmp slt i1 true, i1 false` is true, `slt` being signed less-than).
I didn't actually know this; I'll be sure to update the text to be accurate there.
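The same bit pattern reads differently depending on whether it is sign- or zero-extended, which is why a 1-bit value of `1` compares as -1 under `slt`. A sketch with two hypothetical helpers that interpret the low `width` bits of a `u64`:

```rust
/// Interpret the low `width` bits (1 <= width <= 64) as two's-complement,
/// i.e. sign-extend them to 64 bits.
fn sign_extend(bits: u64, width: u32) -> i64 {
    assert!(width >= 1 && width <= 64);
    let shift = 64 - width;
    // Move the field's sign bit up to bit 63, then arithmetic-shift back down.
    ((bits << shift) as i64) >> shift
}

/// Interpret the low `width` bits as unsigned, i.e. zero-extend them.
fn zero_extend(bits: u64, width: u32) -> u64 {
    assert!(width >= 1 && width <= 64);
    if width == 64 { bits } else { bits & ((1u64 << width) - 1) }
}
```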
text/0000-generic-int.md

> Because sign extension will always be applied, it's safe for the compiler to
> internally treat `uint<N>` as `uint<N.next_power_of_two()>` when doing all
> computations. As a concrete example, this means that adding two `uint<48>`
> values will work exactly like adding two `u64` values, generating exactly the
This needs more thorough discussion. To take this example, an u48 add can overflow the 48 bits, setting some high bits in the 64 bit register. If we take that as given, u48 comparisons (to give just one example) need to first zero-extend the operands to guarantee the comparison works correctly. Conversely, we could zero-extend after arithmetic operations to get the invariant that the high 16 bits are always zero, and then use that knowledge to implement 48 bit comparisons as a plain 64 bit comparisons. Likewise for i48: you'll need sign extensions. In some code sequences compiler optimizations can prove the zero/sign extension redundant, but generally there is no free lunch here -- you need some extending even in code that never changes bit widths. Eliminating sign extensions is in fact the main reason why C compilers care about signed integer addition being UB rather than wrapping.
I hadn't actually added this section in the original RFC when I requested feedback, and rushed it in because I felt it was necessary. You are right that in most cases, this wouldn't be a no-op, although I'm curious if optimisations could make them so.
This is certainly a case for adding `add_unchecked` and co., as suggested by… some other RFC issue I don't have the time to look up right now.
Either way, I'll definitely take some time this weekend to revise this section.
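To illustrate the concern about extensions: if a 48-bit add is carried out in a 64-bit register and never re-normalized, a carry spills into the padding bits, and a plain 64-bit comparison then disagrees with wrapping `u48` semantics. A sketch with hypothetical helper names:

```rust
/// Low 48 bits set: the value range of a hypothetical `uint<48>`.
const MASK48: u64 = (1 << 48) - 1;

/// Adds two 48-bit values held in 64-bit registers without re-normalizing;
/// a carry out of bit 47 lands in the high padding bits.
fn add_u48_raw(a: u64, b: u64) -> u64 {
    a + b
}

/// Wrapping 48-bit add that restores the "high 16 bits are zero" invariant,
/// so later 64-bit comparisons are correct.
fn add_u48_wrapping(a: u64, b: u64) -> u64 {
    (a + b) & MASK48
}
```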
> Primitive operations on `int<N>` and `uint<N>` should work exactly like they do
> on `iN` and `uN`: overflows should panic when debug assertions are
> enabled, but ignored when they are not. In general, `uint<N>` will be
> zero-extended to the next power of two, and `int<N>` will be sign-extended to
Is this intended to be a user-facing guarantee? e.g. given `x: &uint<48>`, are the following two guaranteed to give the same result:

```rust
unsafe { *(x as *const uint<48> as *const u64) }
*x as u64
```
I would not guarantee this, considering how `x as *const uN as *const uM` only holds on little-endian systems. Casting after dereferencing should work like casting values as usual, though.
I'll try and remember to clarify this in the text.
Would you guarantee it on little-endian systems?
I'm adding that as an unresolved question.
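Why a pointer reinterpretation is endian-dependent: for a 48-bit value held in a `u64`, the six meaningful bytes come first in memory only on little-endian targets. The standard `to_le_bytes`/`to_be_bytes` methods show both layouts regardless of the host:

```rust
/// A value that fits in 48 bits, stored in a `u64` (the hypothetical `uint<48>` layout).
const X: u64 = 0x0000_1234_5678_9ABC;

/// Byte order as it would appear in memory on a little-endian target.
fn le_bytes() -> [u8; 8] {
    X.to_le_bytes()
}

/// Byte order as it would appear in memory on a big-endian target.
fn be_bytes() -> [u8; 8] {
    X.to_be_bytes()
}
```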
I didn't see any discussion within the RFC about not using It seems like it should at least be mentioned in the alternatives section...
@mark-i-m's suggestion strikes me as deeply complementary to @clarcharr's proposal. The leading motivation of this RFC is supporting bitfields, but using a number (albeit one guaranteed to be the right number of bits) to represent a bitfield is often a logical type-mismatch. To use the example from the RFC:

```rust
#[repr(bitfields)]
struct MipsInstruction {
    opcode: u6,
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: u6,
}
```

How often does it really make sense to add or subtract from an With @mark-i-m's suggestion, we can have a more well-typed version of this struct:

```rust
#[repr(packed)]
struct MipsInstruction {
    opcode: [Bit; 6],
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: [Bit; 6],
}
```

It never makes sense to represent (Disclaimer: I'm not a MIPS expert; I'm just going off the wikibook. I can't tell whether As for the second half of @mark-i-m's suggestion: I can't tell whether it's merely an implementation detail or if it has semantic differences from @clarcharr's proposal. Regardless, it would be a good candidate for discussion in the Alternatives section.
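Whatever field types the struct ends up using, today this layout is typically handled with shifts and masks over a `u32`. A sketch, assuming the usual MIPS R-type bit positions (opcode in bits 31..26 down to the function code in bits 5..0); `RType`, `field` and `decode` are illustrative names:

```rust
/// Fields of a MIPS R-type instruction, unpacked into ordinary integers.
#[derive(Debug, PartialEq, Eq)]
struct RType {
    opcode: u8,   // bits 31..26 (6 bits)
    rs: u8,       // bits 25..21 (5 bits)
    rt: u8,       // bits 20..16 (5 bits)
    rd: u8,       // bits 15..11 (5 bits)
    shift: u8,    // bits 10..6  (5 bits)
    function: u8, // bits 5..0   (6 bits)
}

/// Extract `width` bits starting at bit `lo` (bit 0 = least significant).
fn field(word: u32, lo: u32, width: u32) -> u8 {
    ((word >> lo) & ((1u32 << width) - 1)) as u8
}

fn decode(word: u32) -> RType {
    RType {
        opcode: field(word, 26, 6),
        rs: field(word, 21, 5),
        rt: field(word, 16, 5),
        rd: field(word, 11, 5),
        shift: field(word, 6, 5),
        function: field(word, 0, 6),
    }
}
```

For example, `add $t0, $t1, $t2` encodes as `0x012A_4020` and decodes to `rs = 9`, `rt = 10`, `rd = 8`, `function = 0x20`.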
Arrays aren't (and can't be changed to be)
Finally getting around to a few comments:
I feel that defining memory layout is important because there doesn't seem to be a compelling argument otherwise. Rust very much avoids "undefined"ness whenever possible. People will want to know how these types operate in a
I'll take a look later and add these to the prior art section.
I'll add it to alternatives, although the main reason against this would be that it doesn't allow generic impls even though it is generic. Unless
In terms of offering
Regarding
While
My thinking was that However, now that you mention it, IIUC, the size and alignment of types is tracked in bytes in the compiler. So some work would need to be put into making the compiler track bits, but I suspect similar work would need to be put into the current RFC proposal anyway to make bitfields work. Also one other minor thing that I didn't see mentioned in the RFC: does
@rkruppe Sorry, I just saw your comment
I was curious why. Does this break some other stability guarantee we have?
@mark-i-m the sizes of
Hmm... so using
@mark-i-m In this case, no.
Arrays (and slices, which have the same layout as arrays of the same length) guarantee that each element is separately addressable -- that makes the Index/IndexMut impls and
Couldn't we signal that individual

```rust
auto trait Addressable {}
// Individual bits are unaddressable.
impl !Addressable for Bit {}
```
@jswrenn There was a big discussion of doing this for
@jswrenn Auto traits are not assumed to be implemented by default (e.g.
> # Summary
> [summary]: #summary
>
> Adds the builtin types `uint<N>` and `int<N>`, allowing integers with an
Perhaps the types should be called `UInt<N>` and `Int<N>` since that is more conventional these days; however, all the primitive types are lowercase, so perhaps not... I'm torn.
The weird thing is that they are generic primitives, which I've never seen in a language before...
Perhaps we should do something suitably weird and have new notation for the type?
There are plenty of primitive type constructors -- references, raw pointers, tuples, arrays, slices -- they just all have special syntax as well instead of using angle brackets. I don't think special syntax for these new primitives is worth it.
> to be solved during the development of this feature, rather than in this RFC.
> However, here are just a few:
>
> * Should `uN` and `iN` suffixes for integer literals, for arbitrary `N`, be
An alternative to this would be to simply have `u<7>` and `i<42>`, which would be almost as short...
Perhaps that's too short to be understandable? Chances are, though, that given the fundamental nature of the types, people would remember it...
Actually, I thought about this too. `uint` and `int` seem inconsistent with the other integer types somehow, so maybe `u` and `i` are the right choice?
Also, either way, would we need to make a breaking change to make these identifiers reserved?
I don't think we need to make breaking changes; you can always shadow the type with something else afaik.
I'm adding this to the alternatives section but arguing against it because `i` is such a common variable name.
Well, it's a different namespace so there isn't a conflict. The following "works":

```rust
struct i<i> { i: i };
let i: i<i32> = i { i: 4 };
```

(Said without actually taking a position on whether I think `i` would be a good name for the type constructor in question.)
text/0000-generic-int.md

> signed simply depends on whether its lower bound is negative.
>
> The primary reason for leaving this out is… well, it's a lot harder to
> implement, and could be added in the future as an extension. Longer-term, we
Perhaps... though you should elaborate on the implementation difficulty as you see it.
...but the ranges also feel much more useful generally, for code that isn't interested in space optimizations and such things but rather wants to enforce domain logic in a stricter and more type-safe fashion. For example, you might want to represent the rank of a card in a deck as `type Rank = uint<1..=13>;`. Then you know by construction that once you have your `the_rank: Rank`, it is correct and you won't have to recheck things. Of course, the other, more elaborate type-safe way is to use an `enum`, but it might also be less convenient to set up than a simple range.
I think this alternative should be seriously entertained as the way to go; then you can use type aliases / newtypes to map to the range-based types, e.g. `type uint<const N: usize> = urange<{0..=pow(2, N)}>;`.
You mean, `urange<{0..=pow(2, N) - 1}>`. :p
But actually, you're right. I should seriously clarify that and write it down in the alternatives.
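The range-based idea can be approximated with today's const generics, though only as a checked newtype (no niche optimization, no range-aware arithmetic). A minimal sketch; `RangedU8` is a hypothetical name, not a proposed API:

```rust
/// Hypothetical stand-in for a range-constrained integer like `uint<1..=13>`:
/// the only way to obtain one is through `new`, which checks the range.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RangedU8<const MIN: u8, const MAX: u8>(u8);

impl<const MIN: u8, const MAX: u8> RangedU8<MIN, MAX> {
    fn new(value: u8) -> Option<Self> {
        if (MIN..=MAX).contains(&value) {
            Some(RangedU8(value))
        } else {
            None
        }
    }

    fn get(self) -> u8 {
        self.0
    }
}

/// The card-rank example: valid by construction once created.
type Rank = RangedU8<1, 13>;
```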
text/0000-generic-int.md

> ```rust
> impl<const N: usize> uint<N> {
>     fn count_zeros(self) -> u32 {
>         let M = N.next_power_of_two();
> ```
I would assume this has to be `const M = N.next_power_of_two();`, since you can't use a non-const value in the type parameter on the next line. This appears to not be allowed by RFC 2000, though.
It seems that this could be written

```rust
impl<const N: usize> uint<N> {
    fn count_zeros(self) -> u32 {
        // Zero-extending to the next power of two adds `N.next_power_of_two() - N`
        // zero bits, so subtract them from the wider type's count.
        let zeros = (self as uint<{ N.next_power_of_two() }>).count_zeros();
        zeros - (N.next_power_of_two() - N) as u32
    }
}
```

but I'm not certain if the const expression there would be accepted by the current const generics implementation.
You're right, and I replaced `let M` with `const M: usize` for now. I think that's valid.
> `(bit_size_of::<T>() + 7) / 8 == size_of::<T>()`. All types would have a bit
> size, allowing for a future `repr(bitpacked)` extension which packs all values
> in a struct or enum variant into the smallest number of bytes possible, given
> their bit sizes. Doing so would prevent referencing the fields of the struct,
Since you are currently limiting the size to 128 bits (but maybe more in the future), wouldn't it be possible to define a reference to a bitpacked generic integer as a `(ref, (u16, u16))`, where the second pair is a (start, length) pair within?
Would you mind clarifying here? Not quite sure what you mean.
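One possible reading of the `(ref, (u16, u16))` suggestion is a fat pointer to a bit range. A sketch under stated assumptions: bits are numbered least-significant-first within each byte, fields are at most 128 bits long, and `BitRef` is a hypothetical type:

```rust
/// Hypothetical "reference" to a bit field: the backing bytes plus a bit
/// offset and a bit length, roughly the (ref, (start, length)) shape.
struct BitRef<'a> {
    bytes: &'a [u8],
    start: u16,
    len: u16,
}

impl BitRef<'_> {
    /// Read the referenced bits (bit 0 = LSB of byte 0) into a `u128`.
    fn get(&self) -> u128 {
        assert!(self.len <= 128, "this sketch covers at most 128 bits");
        let mut value = 0u128;
        for i in 0..self.len as usize {
            let bit = self.start as usize + i;
            let b = (self.bytes[bit / 8] >> (bit % 8)) & 1;
            value |= (b as u128) << i;
        }
        value
    }
}
```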
Neat! I'd still have to use my own implementation for
Honestly not a bad idea to add support for that behind a flag ¯\_(ツ)_/¯
C is getting generic integers:
I might try and update this RFC with some prior art for Zig, C, and LLVM at some point, maybe this weekend. Not sure if it'd be worth resubmitting though, since I have no idea where this'd be in terms of priorities. We could probably get away with a very minimal MVP without any additions to the current const generics on stable, although it wouldn't add all the stuff we want. In particular, I guess that the biggest concern is the ability to implement And then there's still the
This RFC was postponed on const generics (although I don't see that as blocking this as I prefer the u48 form over uint<48>). Now that a minimal level of that has hit stable, how does this get reopened? Prior art: I believe that Ada (general purpose language) and VHDL and SystemC (hardware description languages) also allow arbitrary integer sizes.
The main reason for offering a generic form That said, there's nothing stopping us from offering At this point, I would really love to restart this RFC with the things we've learned since, but I simply have too much going on with work + other stuff at the moment. As I said, if anyone else wanted to take over the role of doing that, I'd be happy to help however I can.
Is there somewhere that the "things learned" are documented? In addition, I don't like coupling "Generic Integers" to "Bitfields" at all. "Bitfields" have a lot of edge cases, while a Rust struct that is "packed" is still useful even if you don't necessarily know exactly how it is packed (i.e. RGBA 10/10/10/2 can be packed into 4 bytes even if you don't know the exact order, while "unpacked" would cost you at least 7 bytes, almost double the size, with corresponding problems with cache lines). While I do want bitfields, I suspect that having generic integers in the language would help people implementing bitfields to explore the space more effectively before converging. I guess one obvious question would be "What does `uint<N>` do to the language grammar?"
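To put numbers on the RGBA example: 10 + 10 + 10 + 2 = 32 bits, so all four channels fit in one `u32`. A sketch with an assumed channel order (the comment's point is that users need not know this order to benefit from the packing):

```rust
/// Pack 10-bit R, G, B and a 2-bit alpha into one `u32`.
/// Assumed order: R in bits 0..10, G in 10..20, B in 20..30, A in 30..32.
/// Out-of-range inputs are masked to their field width.
fn pack_rgba1010102(r: u16, g: u16, b: u16, a: u8) -> u32 {
    let r = u32::from(r) & 0x3FF;
    let g = u32::from(g) & 0x3FF;
    let b = u32::from(b) & 0x3FF;
    let a = u32::from(a) & 0x3;
    r | (g << 10) | (b << 20) | (a << 30)
}

/// Inverse of `pack_rgba1010102` under the same assumed order.
fn unpack_rgba1010102(p: u32) -> (u16, u16, u16, u8) {
    (
        (p & 0x3FF) as u16,
        ((p >> 10) & 0x3FF) as u16,
        ((p >> 20) & 0x3FF) as u16,
        (p >> 30) as u8,
    )
}
```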
I agree that generic integer lengths should not be seen "as-if bitfields", as the rules for C bitfields are subtle and highly implementation dependent. However, Rust users will likely use them as an implementation convenience for Rust types that work "like a bitfield" if introduced, so we should consider their semantics in that regard. Of course, people already use u8, u16, etc. for that same reason, so that's nothing new or shocking.
I mean, there's precedent from Zig for using generic integer types for bitfields, as their
Sure. As far as the length-oriented And I actually think it should be confined to a bound equivalent to
However, eventually that should rise at least to 128 for the obvious reasons. It would also be nice to be able to go up to 256. This would help simplify implementations of "mask vectors" for AVX512 (currently only requires 16 and 64, but...) and eventually SVE2 (can go up to 256) and RISC-V, etc. It might even simplify working with oddly-shaped data types like Intel's 80-bit floats.
I think that the long-term goal should be to make
C23 and C++ are getting generic integer types
LLVM getting proper support should make supporting this a lot easier.
IIRC, LLVM already has proper support, at least for and, or, xor, shifts, comparison, add, sub, and multiply. Division needs runtime library helpers (I don't remember if just the existing 128-bit helpers are sufficient).
LLVM RFC is up for adding library functions (div/rem) for >128-bit integers:
I would like to note that C23 is standardizing "N2709 - Adding a Fundamental Type for N-bit Integers" https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf. At this point, Rust will actually be lagging C without support for generic integers.
I think it definitely weighs in strongly on the utility of the type, but we also actually trail behind C and C++ in many other things as well. Adding one more won't hurt. One thing that came up in idle conversation when discussing the extension to the C Standard with a friend is that in Rust we don't necessarily have a good story for what operations with these types look like... C has the advantage (yes, it is an advantage here!) of accepting certain kinds of numeric promotion, which means that having what are effectively 4096 different integer types is not a problem because they interoperate with each other for simple operations. Are we sure we want that to be the same for Rust? However, that's actually not something that requires deciding with the implementation; it's just an unresolved question that came up.
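For contrast with C's promotions: in current Rust, integers of different widths do not mix implicitly (`200u8 + 1000u16` is a type error), so code widens explicitly. A sketch of that status quo with a hypothetical `add_widened` helper; how `uint<N>` ought to behave is the open question raised above.

```rust
/// Mixed-width addition with explicit, lossless widening of both operands
/// to `u32` via `From`/`Into`; nothing is promoted implicitly.
fn add_widened<A: Into<u32>, B: Into<u32>>(a: A, b: B) -> u32 {
    let a: u32 = a.into();
    let b: u32 = b.into();
    a + b
}
```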
As evident by the fact that this was postponed instead of closed, there is a pretty strong desire to add these to Rust at some point; it just wasn't on the road map at the time this RFC was done since const generics weren't even close to being done yet, and there's still the unanswered question of numerically constraining If people have answers to these questions and the time to write an RFC, I would say go for it. But otherwise, there's not the biggest benefit to posting here since this is still a closed RFC. I wonder, maybe it would be worth opening up a discussion somewhere in the const generics WG?
After trying to work with several bitfield crates, I stumbled upon this one: https://github.com/danlehmann/bitfield

```rust
#[bitfield(u4)]
struct TestChild {
    field1: u2,
    field2: u2,
}

#[bitfield(u8)]
struct TestParent {
    field1: TestChild,
    field2: u4,
}
```

Of course, this does not work, as proc macros can't "access" types, so My current idea is to add a const field To me it feels like we're reimplementing stuff the compiler does, only that the compiler likes to work with bytes and we reimplement things for working with bits, e.g. What would a first helpful step be? Maybe allowing the compiler to recognize bit-sized types?
Adding the notion of a "bit-sized" type (as opposed to a
imho bit-sized types would always be padded out to an integer number of bytes, except for inside bit-packed structs/enums, which are like
Has enough time passed to revisit this yet? :P
🖼️ Rendered
📝 Summary
Adds the builtin types `uint<N>` and `int<N>`, allowing integers with an arbitrary size in bits. For now, restricts N ≤ 128.
💖 Thanks
To everyone who helped on the internals thread to review this RFC, particularly @Centril, @rkruppe, @scottmcm, @comex, and @ibkevg.