RFC: Generic integers #2581

Closed
wants to merge 3 commits into from

clarfonthey
Contributor

🖼️ Rendered

📝 Summary

Adds the builtin types uint<N> and int<N>, allowing integers with an arbitrary size in bits. For now, restricts N ≤ 128.

💖 Thanks

To everyone who helped on the internals thread to review this RFC, particularly @Centril, @rkruppe, @scottmcm, @comex, and @ibkevg.

@clarfonthey changed the title from "Generic integers RFC" to "RFC: Generic integers" on Oct 28, 2018
@mark-i-m
Member

I like the idea, and I would really love to have efficient and ergonomic strongly-typed bitfields. However, this proposal feels too magical for my taste; there is too much stuff built in to the compiler. I would rather expose a single simple primitive that allows implementing arbitrarily sized ints efficiently.

Just a half-baked idea: We only build a Bit type into the language, which is guaranteed to be 1 bit large (though it must be padded in a struct unless you use repr(packed)). All of the other integer types are defined as follows:

#[repr(packed)]
struct int<const Width: usize> {
  bits: [Bit; Width],
}

#[repr(packed)]
struct uint<const Width: usize> {
  bits: [Bit; Width],
}

with the appropriate operations implemented via efficient bit-twiddling methods or compiler intrinsics for performance.
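For comparison, a width-parameterised integer can already be approximated in a library on top of an existing primitive instead of a `Bit` array. The following is only a minimal sketch under stated assumptions, not the RFC's design: `UInt`, `new`, `get`, and `wrapping_add` are hypothetical names, and the value is stored in a full `u128`, so nothing is packed.

```rust
/// Hypothetical library type: an unsigned integer of `N` bits (N <= 128),
/// stored in a full `u128`. This is not the RFC's built-in `uint<N>`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UInt<const N: u32>(u128);

impl<const N: u32> UInt<N> {
    /// Bit mask with the low `N` bits set.
    const MASK: u128 = if N >= 128 { u128::MAX } else { (1u128 << N) - 1 };

    /// Builds a value, discarding any bits above bit `N - 1`.
    pub fn new(value: u128) -> Self {
        UInt(value & Self::MASK)
    }

    /// Returns the stored value, widened to `u128`.
    pub fn get(self) -> u128 {
        self.0
    }

    /// Addition modulo 2^N: add in the wide type, then mask.
    pub fn wrapping_add(self, rhs: Self) -> Self {
        UInt(self.0.wrapping_add(rhs.0) & Self::MASK)
    }
}
```

With this sketch, `UInt::<6>::new(63).wrapping_add(UInt::new(1))` wraps around to zero, which is the behaviour one would expect from a six-bit unsigned integer.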

@Ekleog

Ekleog commented Oct 28, 2018 via email

@Centril
Contributor

Centril commented Oct 28, 2018

Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)

Well, you can represent a Fin : Nat -> Type type constructor in dependent typing pretty easily, but those are quite inefficient...


## Primitive behaviour

The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,
Contributor

Nit: these are two new built-in integer type families or type constructors. You really get 256 new types, not two.

Contributor Author

I agree with you but I also wonder if this is the language most people would prefer to use. For example, would you consider Vec<T> to also be a family of types, or just a generic type?

Contributor

Well, the type constructor is really Vec and not Vec<T>; but "two new built-in generic integer types" is I think clear enough.

The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,
where `const N: usize`. These will alias to existing `uN` and `iN` types if `N`
is a power of two and no greater than 128. `usize` and `isize` remain separate
types due to coherence issues, and `bool` remains separate from `uint<1>` as it
Contributor

Elaborate on what those coherence issues are for unfamiliar readers?

Traits can be implemented separately & differently for u32, u64, usize, etc. so unifying usize with uN for appropriate N would cause overlapping impls.
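To make the overlap concrete, here is a small sketch. `Describe` is a hypothetical trait invented for this illustration; the point is only that separate impls for `u32`, `u64`, and `usize` are legal today because the three are distinct types.

```rust
/// Hypothetical trait used only for illustration.
trait Describe {
    fn describe() -> &'static str;
}

// Today these are three distinct types, so the three impls coexist.
impl Describe for u32 {
    fn describe() -> &'static str {
        "exactly 32 bits"
    }
}

impl Describe for u64 {
    fn describe() -> &'static str {
        "exactly 64 bits"
    }
}

impl Describe for usize {
    fn describe() -> &'static str {
        "pointer-sized"
    }
}

// If `usize` were the same type as `u64` on a 64-bit target, the last two
// impls would overlap and existing code like this would stop compiling there.
```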

Contributor Author

I may not be able to get to this by the weekend but I will try to remember to elaborate more on this.

For example, this means that `uint<48>` will take up 8 bytes and have an
alignment of 8, even though it only has 6 bytes of data.

`int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and
Contributor

store/stores, pick one :)

Contributor Author

Store was a typo. :p

In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
`as`, whereas most integer types can only be cast from `bool`.

For the moment, a monomorphisation error will occur if `N > 128`, to minimise
Contributor

This is my biggest concern; I don't think monomorphization errors of this magnitude belong in the language and I would like to see this changed before stabilization.

Contributor Author

Would you be willing to elaborate more on this? Why are these errors a problem?

Contributor

The nature of Rust's bounded polymorphism is that type checking should be modular so that you can check a generic function separately from the instantiation and then the instantiation of any parameters satisfying bounds should not result in errors.

This makes for more local error messages (you won't have post monomorphization errors in some location far removed from where instantiation happened...), fewer surprises (because this is how polymorphism works everywhere else in the type system) and possibly also better performance.
Another benefit of avoiding post monomorphization errors is that the need to monomorphize as an implementation strategy is lessened. That said, there are instances where the compiler will cause post monomorphization errors, but those are extremely unlikely to occur in actual code. In the case of N > 128 it is rather quite likely.

The general principle is that you declare up front the requirements (with bounds, etc.) to use / call an object and then you don't impose new and hidden requirements for certain values.

If you want to impose N <= 128, then that should be explicitly required in the "signatures", e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...). Otherwise, it should work for all N: usize evaluable at compile time.

Member

e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...)

Is this possible with any currently planned or near future const generics? I don't recall seeing any RFCs that would support const operations in where clauses (although I would love to have them available). This specific case might be possible with some horrible hack like

trait LessThan128 {}
struct IsLessThan128<const N: usize>;

impl LessThan128 for IsLessThan128<0> {}
// ... (one impl for each value in between)
impl LessThan128 for IsLessThan128<127> {}

struct uint<const N: usize> where IsLessThan128<N>: LessThan128 { .. }

but 🤢
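For what it is worth, the enumeration hack above can be written on stable Rust with a macro. This is a hedged sketch rather than a recommendation: `SupportedWidth`, `Width`, `supported_widths!`, and `UInt` are hypothetical names, and only a handful of widths are listed to keep it short.

```rust
/// Marker trait implemented only for the permitted widths.
pub trait SupportedWidth {}

/// Type-level carrier for a width, playing the role of `IsLessThan128<N>`.
pub struct Width<const N: usize>;

// Generate one impl per listed width. A complete version would have to list
// every permitted value; only a few are listed here.
macro_rules! supported_widths {
    ($($n:literal),* $(,)?) => {
        $(impl SupportedWidth for Width<$n> {})*
    };
}
supported_widths!(1, 5, 6, 48, 128);

/// Hypothetical stand-in for `uint<N>`; the bound is visible in the signature.
pub struct UInt<const N: usize>(u128)
where
    Width<N>: SupportedWidth;

impl<const N: usize> UInt<N>
where
    Width<N>: SupportedWidth,
{
    pub fn new(value: u128) -> Self {
        UInt(value)
    }

    pub fn get(&self) -> u128 {
        self.0
    }

    pub fn bits(&self) -> usize {
        N
    }
}
```

Naming an unlisted width such as `UInt::<200>` is then rejected at the use site as an ordinary unsatisfied trait bound, and generic callers have to repeat the `Width<N>: SupportedWidth` bound in their own signatures.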

Contributor

@Nemo157 not possible with RFC 2000 but might be with future extensions.

@rodrimati1992 rodrimati1992 Nov 1, 2018

This is a different formulation which might work (if you can use a const expression in associated types):

struct Bool<const B:bool>;

trait LessThan128<const N: usize> {
    type IsIt;
}

impl<const N:usize> LessThan128<N> for () {
    type IsIt=Bool<{N<128}>;
}

struct uint<const N: usize> 
where (): LessThan128<N,IsIt=Bool<true>> 
{ .. }

Edit:

If you want to include an error message in the type error, you can use this to print an error message in the type itself.


struct Str<const S: &'static str>;

struct Bool<const B: bool>;

struct Usize<const N: usize>;

trait Assert<const COND: bool, Msg> {}

impl<const COND: bool, Msg> Assert<COND, Msg> for ()
where
    (): AssertHelper<COND, Msg, Output = Bool<true>>,
{}

trait AssertHelper<const COND: bool, Msg> {
    type Output;
}

impl<Msg> AssertHelper<true, Msg> for () {
    type Output = Bool<true>;
}

impl<Msg> AssertHelper<false, Msg> for () {
    type Output = Msg;
}

trait AssertLessThan128<const N: usize> {}

impl<const N: usize, const IS_IT: bool> AssertLessThan128<N> for ()
where
    ():
        LessThan<N, 128, IsIt = Bool<IS_IT>> +
        Assert<IS_IT, (
            Str<"uint cannot be constructed with a size larger than 128, the passed size is:">,
            Usize<N>,
        )>
{}

trait LessThan<const L: usize, const R: usize> {
    type IsIt;
}

impl<const L: usize, const R: usize> LessThan<L, R> for () {
    type IsIt = Bool<{ L < R }>;
}

struct uint<const N: usize>
where (): AssertLessThan128<N>
{ .. }

Contributor

AFAIK stable Rust does not currently have any monomorphization time errors - if you find any, it's a bug - so this RFC "as is" would be introducing them into the language =/

Contributor

@gnzlbg

fn poly<T>() {
    let _n: [T; 10000000000000];
}

fn monomorphization_error() {
    poly::<u8>(); // OK!
    poly::<String>(); // BOOM!
}

fn main() {
    monomorphization_error();
}

## Standard library

Existing implementations for integer types should be annotated with
`default impl` as necessary, and most operations should defer to the
Contributor

I'm not sure why default impl would be used here... elaborate?

Contributor Author

Essentially, your default impls are your base cases and the other cases will recursively rely upon them. For example, <uint<48>>::count_zeros would ultimately expand to u64::count_zeros minus 16, since the 16 padding bits of the zero-extended value are always zero. I'll try to elaborate more in the RFC text itself later.
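The delegation to the next power-of-two width can be sketched on stable Rust by emulating a 48-bit value inside a `u64`. This is an illustration only; `count_zeros_u48` is a hypothetical helper, and it assumes the value is kept zero-extended, so the 64 - 48 = 16 padding bits are always zero and have to be subtracted.

```rust
/// Counts the zero bits of a 48-bit value held zero-extended in a `u64`.
/// Assumes the upper 16 bits of `x` are already zero.
fn count_zeros_u48(x: u64) -> u32 {
    const N: u32 = 48;
    // Number of padding bits in the wider type: 64 - 48 = 16.
    const PADDING: u32 = N.next_power_of_two() - N;
    debug_assert!(x >> N == 0, "upper bits must be zero");
    // Delegate to the base case, then discount the padding zeros.
    x.count_zeros() - PADDING
}
```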

Member

You mention that it should be default fn in the text but don't put it in the count_zeros example; including it in the example and showing one of the specialized implementations for a power of two could clarify this somewhat.


Once const generics and specialisation are implemented and stable, almost all of
this could be offered as a crate which offers `uint<N>` and `int<N>` types. I
won't elaborate much on this because I feel that there are many other
Contributor

You should :) Do we know that it is actually implementable as a library?

Contributor Author

I will try to get to this this weekend.

@Centril added the T-lang and T-libs-api labels on Oct 28, 2018
@Ekleog

Ekleog commented Oct 28, 2018 via email

from `uint<N>` to `int<M>` or `uint<M + 1>`, where `M >= N`.

In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
`as`, whereas most integer types can only be cast from `bool`.

So, this treats 1 and -1 as true depending on the signedness? I rather like the route of identifying true with -1 instead of 1, but it's not the route Rust has chosen so it might be a bit controversial. From another angle, while it's consistent with true being represented as a single 1 bit, it's also inconsistent with the fact that bool as iN (for currently existing iN) turns true into 1.

Is there motivation for providing these cases instead of just making people write x != 0, other than "we can"?

Contributor Author

I completely forgot about sign extension when doing this and now that you mention it, it makes sense that if this were offered, then only uint<1> should cast to bool. Unless anyone has any objections when I next get around to revising the text I'll remove both casts to bool.

`int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and
`uint<N>` stores values between 0 and 2<sup>N</sup>-1. One unexpected case of
this is that `i1` represents zero or *negative* one, even though LLVM and other
places use `i1` to refer to `u1`. This case is left as-is because generic code

Nit: integer types in LLVM don't have inherent signedness, they are bags of bits that are interpreted as signed or unsigned by individual operations, and i1 true is treated as -1 by signed operations (e.g., icmp slt i1 true, i1 false is true -- slt being signed less-than).
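The same "bag of bits, interpreted by the operation" view can be sketched in Rust with shifts. `sign_extend` and `zero_extend` are hypothetical helpers working on fields of at most 8 bits; they show why a single set bit reads as -1 under a signed interpretation and as 1 under an unsigned one.

```rust
/// Interprets the low `bits` bits of `raw` as a two's-complement signed value.
/// `bits` must be between 1 and 8.
fn sign_extend(raw: u8, bits: u32) -> i8 {
    let shift = 8 - bits;
    // Move the field's top bit into the sign position, then shift back
    // arithmetically so that bit is copied into the vacated high bits.
    ((raw << shift) as i8) >> shift
}

/// Interprets the low `bits` bits of `raw` as an unsigned value.
/// `bits` must be between 1 and 8.
fn zero_extend(raw: u8, bits: u32) -> u8 {
    let shift = 8 - bits;
    // A logical shift back fills the high bits with zeros.
    (raw << shift) >> shift
}
```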

Contributor Author

I didn't actually know this-- I'll be sure to update the text to be accurate there.

Because sign extension will always be applied, it's safe for the compiler to
internally treat `uint<N>` as `uint<N.next_power_of_two()>` when doing all
computations. As a concrete example, this means that adding two `uint<48>`
values will work exactly like adding two `u64` values, generating exactly the

This needs more thorough discussion. To take this example, a u48 add can overflow the 48 bits, setting some high bits in the 64 bit register. If we take that as given, u48 comparisons (to give just one example) need to first zero-extend the operands to guarantee the comparison works correctly. Conversely, we could zero-extend after arithmetic operations to get the invariant that the high 16 bits are always zero, and then use that knowledge to implement 48 bit comparisons as plain 64 bit comparisons. Likewise for i48: you'll need sign extensions. In some code sequences compiler optimizations can prove the zero/sign extension redundant, but generally there is no free lunch here -- you need some extending even in code that never changes bit widths. Eliminating sign extensions is in fact the main reason why C compilers care about signed integer addition being UB rather than wrapping.
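A short sketch of the problem, emulating 48-bit values in `u64` on stable Rust. `MASK_48`, `add_raw`, and `add_wrapping_u48` are hypothetical names introduced for this illustration.

```rust
/// Mask selecting the low 48 bits of a `u64`.
const MASK_48: u64 = (1 << 48) - 1;

/// Adds two 48-bit values held in `u64` registers without re-normalising.
/// The result may have bits set above bit 47.
fn add_raw(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

/// Adds two 48-bit values and clears the upper 16 bits again, so the
/// "high bits are zero" invariant holds for the result.
fn add_wrapping_u48(a: u64, b: u64) -> u64 {
    a.wrapping_add(b) & MASK_48
}
```

Adding 1 to the largest 48-bit value should wrap to 0, but `add_raw` leaves bit 48 set, so a plain 64-bit comparison against a small number gives the wrong answer. The masked version restores the invariant at the cost of one extra instruction per operation.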

Contributor Author

@clarfonthey clarfonthey Oct 29, 2018

I hadn't actually added this section in the original RFC when I requested feedback, and rushed it in because I felt it was necessary. You are right that in most cases, this wouldn't be a no-op, although I'm curious if optimisations could make them so.

This is certainly a case for adding add_unchecked and co. as suggested by… some other RFC issue I don't have the time to look up right now.

Either way, I'll definitely take some time this weekend to revise this section.

Primitive operations on `int<N>` and `uint<N>` should work exactly like they do
on `iN` and `uN`: overflows should panic when debug assertions are
enabled, but ignored when they are not. In general, `uint<N>` will be
zero-extended to the next power of two, and `int<N>` will be sign-extended to
@hanna-kruppe hanna-kruppe Oct 28, 2018

Is this intended to be a user-facing guarantee? e.g. given x: &uint<48> are the following two guaranteed to give the same result:

  • unsafe { *(x as *const uint<48> as *const u64) }
  • *x as u64

Contributor Author

I would not guarantee this, considering how x as *const uN as *const uM only holds on little-endian systems. Casting after dereferencing should work like casting values as usual, though.

I'll try and remember to clarify this in the text.
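The endianness point can be illustrated safely by looking at byte order instead of casting pointers. This sketch uses the narrowing direction (reading the first byte of a `u32`), which is the safe analogue of the pointer reinterpretation discussed above; `first_byte_le` and `first_byte_be` are hypothetical helpers.

```rust
/// First byte in memory of `x` when it is stored little-endian.
fn first_byte_le(x: u32) -> u8 {
    x.to_le_bytes()[0]
}

/// First byte in memory of `x` when it is stored big-endian.
fn first_byte_be(x: u32) -> u8 {
    x.to_be_bytes()[0]
}
```

For `x = 0x1122_3344`, the value cast `x as u8` yields `0x44`. That matches the first byte in memory only in the little-endian layout; in the big-endian layout the first byte is `0x11`, so a pointer reinterpretation and a value cast disagree there.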

Would you guarantee it on little-endian systems?

Contributor Author

I'm adding that as an unresolved question.

@Mark-Simulacrum
Member

I didn't see any discussion within the RFC about not using uN directly. It seems like that could make quite a bit of sense -- at least for an initial implementation.

It seems like it should at least be mentioned in the alternatives section...

@jswrenn
Member

jswrenn commented Oct 29, 2018

We only build a Bit type into the language, which is guaranteed to be 1 bit large (though it must be padded in a struct unless you use repr(packed)).

@mark-i-m's suggestion strikes me as deeply complementary to @clarcharr's proposal. The leading motivation of this RFC is supporting bitfields, but using a number (albeit one guaranteed to be the right number of bits) to represent a bitfield is often a logical type-mismatch. To use the example from the RFC:

#[repr(bitfields)]
struct MipsInstruction {
    opcode: u6,
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: u6,
}

How often does it really make sense to add or subtract from an opcode? By representing it as an unsigned six-bit integer, we signal that arithmetic on an opcode is a well-defined operation.

With @mark-i-m's suggestion, we can have a more well-typed version of this struct:

#[repr(packed)]
struct MipsInstruction {
    opcode: [Bit; 6],
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: [Bit; 6],
}

It never makes sense to represent opcode or function as numbers, so we don't. Conversely, shift is very clearly numeric.

(Disclaimer: I'm not a MIPS expert; I'm just going off the wikibook. I can't tell whether r{s,t,d} ought to be [Bit; 5] or u5.)
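For context, this is roughly what the bitfield has to be written as by hand on stable Rust today, which is the boilerplate both proposals aim to remove. It is a sketch assuming the standard MIPS R-type layout (opcode in the top six bits, function in the bottom six); `RType`, `encode`, and `decode` are hypothetical names.

```rust
/// Fields of a MIPS R-type instruction, each stored in a `u8` because
/// stable Rust has no 5- or 6-bit integer types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RType {
    opcode: u8,   // 6 bits
    rs: u8,       // 5 bits
    rt: u8,       // 5 bits
    rd: u8,       // 5 bits
    shift: u8,    // 5 bits
    function: u8, // 6 bits
}

/// Packs the fields into a 32-bit instruction word, masking each field
/// to its declared width.
fn encode(i: RType) -> u32 {
    ((i.opcode as u32 & 0x3F) << 26)
        | ((i.rs as u32 & 0x1F) << 21)
        | ((i.rt as u32 & 0x1F) << 16)
        | ((i.rd as u32 & 0x1F) << 11)
        | ((i.shift as u32 & 0x1F) << 6)
        | (i.function as u32 & 0x3F)
}

/// Splits a 32-bit instruction word back into its fields.
fn decode(word: u32) -> RType {
    RType {
        opcode: ((word >> 26) & 0x3F) as u8,
        rs: ((word >> 21) & 0x1F) as u8,
        rt: ((word >> 16) & 0x1F) as u8,
        rd: ((word >> 11) & 0x1F) as u8,
        shift: ((word >> 6) & 0x1F) as u8,
        function: (word & 0x3F) as u8,
    }
}
```

Nothing in the types stops a caller from storing 200 in `rs`; the masks silently truncate it, which is exactly the kind of mistake a `u5` field would rule out.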


As for the second half of @mark-i-m's suggestion: I can't tell whether it's merely an implementation detail or if it has semantic differences from @clarcharr's proposal. Regardless, it would be a good candidate for discussion in the Alternatives section.

@hanna-kruppe

Arrays aren't (and can't be changed to be) repr(bitpacked), so [Bit; N] would actually occupy N bytes. While one could instead provide a dedicated Bitvector<N> type, that would have a lot of overlap with uint<N> (which has many applications not covered by Bitvector<N>). I also don't really see the typing benefit @jswrenn suggests: if you care about that, you'd want to use more fine-grained newtypes to distinguish (in this MIPS instruction encoding example) the opcode from the function field or the shift amount from the register numbers. Once you have these newtypes, uint<5> vs Bitvector<5> becomes largely irrelevant as it's an implementation detail. (In fact, one could implement Bitvector<N> as a newtype around uint<N>.)

@clarfonthey
Contributor Author

Finally getting around to a few comments:

Is user code allowed to rely on this memory layout, or is it not? Intuitively it'd be better if the layout was not actually defined, to allow for further optimizations, but the current text makes me believe it is defined. The one case where layout would be mandatory would be for a not-yet-written #[repr(bitpacked)] RFC, in my opinion.

I feel that defining memory layout is important because there doesn't seem to be a compelling argument otherwise. Rust very much avoids "undefined"ness whenever possible. People will want to know how these types operate in a #[repr(C)] struct or in an array; would [uint<48>; 2] take up six bytes or eight? Establishing this is crucial for stabilisation imho.

Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)

I'll take a look later and add these to the prior art section.

I didn't see any discussion within the RFC about not using uN directly. It seems like that could make quite a bit of sense -- at least for an initial implementation.

It seems like it should at least be mentioned in the alternatives section...

I'll add it to alternatives, although the main reason against this would be that it doesn't allow generic impls even though it is generic. Unless uN were just an alias for uint<N>, which seems unnecessary to me.

@clarfonthey
Contributor Author

In terms of offering Bit instead of uint-- I'll definitely add this to the alternatives section. Essentially, offering some kind of bits<N> type would be similar to uint<N>, although completely orthogonal to uint<N> and presumably only allowing bit operations, not arithmetic. I still believe that uint<N> is better overall, but thoroughness is important.

@mark-i-m
Member

Regarding Bit, I had intended it as a way to not add uint<N> and int<N> as language features. Rather, they could be implemented in a crate as wrappers around [Bit; N]. My motivation is just that adding uint<N> seems like a lot of magic, and I would like to reduce magic.

@clarfonthey
Contributor Author

While Bit by itself is less magic than bits<N>, there's a lot of magic for making arrays compact for just one particular type. How does Bit apply in most type contexts? Is (Bit, Bit) compact? Etc.

@mark-i-m
Member

My thinking was that Bit had a size of one bit and alignment of 1 byte. So you need to use repr(packed) to get rid of padding, as with other types.

However, now that you mention it, IIUC, the size and alignment of types is tracked in bytes in the compiler. So some work would need to be put into making the compiler track bits, but I suspect similar work would need to be put into the current RFC proposal anyway to make bitfields work.

Also one other minor thing that I didn't see mentioned in the RFC: does size_of::<uint<N>>() just round up to the nearest byte? or the nearest byte when rounded up to a power of two?

@mark-i-m
Member

@rkruppe Sorry, I just saw your comment

Arrays aren't (and can't be changed to be) repr(bitpacked), so [Bit; N] would actually occupy N bytes.

I was curious why. Does this break some other stability guarantee we have?

@clarfonthey
Contributor Author

@mark-i-m the sizes of uint<N> are the same as the larger power of two size. So, uint<48> has the same size as u64.
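As a quick illustration of that rule, the storage size can be computed from the bit width as follows. `storage_bytes` is a hypothetical helper, and the one-byte minimum for widths below 8 is an assumption made for this sketch.

```rust
/// Storage size in bytes described above for an `N`-bit integer: the width
/// rounded up to a power of two, with one byte assumed as the minimum.
fn storage_bytes(bits: u32) -> u32 {
    bits.next_power_of_two().max(8) / 8
}
```

Under this rule a 48-bit integer occupies 8 bytes and a 65-bit integer occupies 16.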

@mark-i-m
Member

Hmm... so using repr(packed) actually changes the size of the type?

@clarfonthey
Contributor Author

@mark-i-m In this case, no. repr(packed) allows alignment to break, but in this case, uint<48> would have an alignment and size of 8. In this case, we'd need a different form of repr(packed) which allows both size and alignment to break, shoving things down to individual bits. That's mostly what repr(bitfields) is in the RFC; originally, I recommended bitpacked as a name but I changed it and I don't remember why.

@hanna-kruppe

@mark-i-m

I was curious why. Does this break some other stability guarantee we have?

Arrays (and slices, which have the same layout as arrays of the same length) guarantee that each element is separately addressable -- that makes the Index/IndexMut impls and iter()/iter_mut() tick, for starters. Even though for all currently existing types that would remain so when arrays start becoming bitpacked, it would mean we'd have to add new bounds to those things, which would break generic code that doesn't have those bounds.
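The addressability guarantee is observable on stable Rust: consecutive elements of a slice sit exactly `size_of::<T>()` bytes apart, so even one-bit-of-information types such as `bool` take a whole byte each. `stride` is a hypothetical helper written for this sketch.

```rust
use std::mem::size_of;

/// Byte distance between the first two elements of a slice.
/// Panics if the slice has fewer than two elements.
fn stride<T>(items: &[T]) -> usize {
    assert!(items.len() >= 2);
    let first = &items[0] as *const T as usize;
    let second = &items[1] as *const T as usize;
    let distance = second - first;
    // Each element has its own address, `size_of::<T>()` bytes after the last.
    debug_assert_eq!(distance, size_of::<T>());
    distance
}
```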

@jswrenn
Member

jswrenn commented Oct 30, 2018

Even though for all currently existing types that would remain so when arrays start becoming bitpacked, it would mean we'd have to add new bounds to those things, which would break generic code that doesn't have those bounds.

Couldn't we signal that individual Bits are unaddressable with an Addressable auto trait that isn't implemented for Bit? E.g.:

auto trait Addressable {}

// Individual bits are unaddressable.
impl !Addressable for Bit {}

@clarfonthey
Contributor Author

@jswrenn There was a big discussion of doing this for DynSized with extern types, and the verdict was not to do so. I feel like an Addressable bound would have similar problems.

@hanna-kruppe

@jswrenn Auto traits are not assumed to be implemented by default (e.g. fn foo<T>() does not imply T: Send). So beyond just an auto trait, you'd need a new opt-out default bound like Sized, and as @clarcharr mentioned those have been rejected for other purposes in the past.
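The difference between an auto trait and a default bound shows up in ordinary signatures. In this sketch the function names are hypothetical: `plain` has no `Send` requirement because auto traits are never implied, while `byte_len` has to opt out of the implicit `Sized` bound with `?Sized` before it can accept a `str`.

```rust
/// No bounds written, so none of the auto traits (such as `Send`) is
/// required: any sized `T` is accepted.
fn plain<T>(value: T) -> T {
    value
}

/// An auto trait only constrains callers when it is written explicitly.
fn requires_send<T: Send>(value: T) -> T {
    value
}

/// `Sized` is the opposite case: it is assumed unless opted out of.
/// With `?Sized`, unsized types such as `str` are accepted too.
fn byte_len<T: ?Sized>(value: &T) -> usize {
    std::mem::size_of_val(value)
}
```

An `Addressable` bound that existing generic code should keep assuming would have to behave like `Sized` here, not like `Send`.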

# Summary
[summary]: #summary

Adds the builtin types `uint<N>` and `int<N>`, allowing integers with an
Contributor

Perhaps the types should be called UInt<N> and Int<N> since that is more conventional these days; however, all the primitive types are lower cased so perhaps not... I'm torn.

Member

The weird thing is that they are generic primitives, which I've never seen in a language before...

Perhaps we should do something suitably weird and have new notation for the type?

There are plenty of primitive type constructors -- references, raw pointers, tuples, arrays, slices -- they just all have special syntax as well instead of using angle brackets. I don't think special syntax for these new primitives is worth it.

to be solved during the development of this feature, rather than in this RFC.
However, here are just a few:

* Should `uN` and `iN` suffixes for integer literals, for arbitrary `N`, be
Contributor

An alternative to this would be to simply have u<7> and i<42> which would be almost as short...
Perhaps that's too short to be understandable? Chances are tho that given the fundamental nature of the types that people would remember it...

Member

Actually, I thought about this too. uint and int seem inconsistent with the other integer types somehow, so maybe u and i are the right choice?

Member

Also, either way, would we need to make a breaking change to make these identifiers reserved?

Contributor

I don't think we need to make breaking changes; you can always shadow the type with something else afaik.

Contributor Author

I'm adding this to the alternatives section but stating against it because i is such a common variable name.

Member

@scottmcm scottmcm Nov 9, 2018

Well, it's a different namespace so there isn't a conflict. The following "works":

struct i<i> { i: i };
let i: i<i32> = i { i: 4 };

(Said without actually taking a position on whether I think i would be a good name for the type constructor in question.)

signed simply depends on whether its lower bound is negative.

The primary reason for leaving this out is… well, it's a lot harder to
implement, and could be added in the future as an extension. Longer-term, we
Contributor

Perhaps... tho you should elaborate on the implementation difficulty as you see it.

...but the ranges feel also much more useful generally for code that isn't interested in space optimizations and such things but rather want to enforce domain logic in a strict and more type-safe fashion. For example, you might want to represent a card in a deck as type Rank = uint<1..=13>;. Then you know by construction that once you have your the_rank : Rank then it is correct and you won't have to recheck things. Of course, the other, more elaborate type safe way is to use an enum, but it might also be less convenient to setup than a simple range.

I think this alternative should be seriously entertained as the way to go; then you can use type aliases / newtypes to map to the range based types, e.g. type uint<const N: usize> = urange<{0..=pow(2, N)}>;.
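A run-time-checked approximation of the range idea is possible with today's const generics. This is a sketch under stated assumptions, not the proposed built-in: `Ranged`, `new`, and `get` are hypothetical names, the backing type is fixed to `u8`, and the check happens at construction time instead of in the type system.

```rust
/// Hypothetical ranged integer: holds a `u8` known to lie in `MIN..=MAX`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Ranged<const MIN: u8, const MAX: u8>(u8);

impl<const MIN: u8, const MAX: u8> Ranged<MIN, MAX> {
    /// Returns `None` when `value` is outside the permitted range.
    pub fn new(value: u8) -> Option<Self> {
        if MIN <= value && value <= MAX {
            Some(Ranged(value))
        } else {
            None
        }
    }

    /// Returns the wrapped value, which is guaranteed to be in range.
    pub fn get(self) -> u8 {
        self.0
    }
}

/// A playing-card rank, as in the example above.
pub type Rank = Ranged<1, 13>;
```

Once a `Rank` exists, code receiving it does not need to recheck the bounds, which is the "correct by construction" property described above.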

Contributor Author

You mean, urange<{0..=pow(2, N) - 1}>. :p

But actually, you're right. I should seriously clarify that and write it down in the alternatives.

```rust
impl<const N: usize> uint<N> {
    fn count_zeros(self) -> u32 {
        let M = N.next_power_of_two();
```

Member

I would assume this has to be const M = N.next_power_of_two(); since you can't use a non-const value in the type parameter on the next line. This appears to not be allowed by RFC2000 though.

It seems that this could be written

impl<const N: usize> uint<N> {
    fn count_zeros(self) -> u32 {
        let zeros = (self as uint<{ N.next_power_of_two() }>).count_zeros();
        // The padding bits of the widened value are always zero, so subtract them.
        zeros - (N.next_power_of_two() - N) as u32
    }
}

but I'm not certain if the const expression there would be accepted by the current const generics implementation.

Contributor Author

You're right and I replaced let M with const M: usize for now. I think that's valid.

`(bit_size_of::<T>() + 7) / 8 == size_of::<T>()`. All types would have a bit
size, allowing for a future `repr(bitpacked)` extension which packs all values
in a struct or enum variant into the smallest number of bytes possible, given
their bit sizes. Doing so would prevent referencing the fields of the struct,
Contributor

Since you are currently limiting the size to 128 bits (but maybe more in the future) wouldn't it be possible to define a reference to a bitpacked generic integer as a (ref, (u16, u16)) where the second pair is a (start, length) pair within?
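One possible reading of this suggestion, sketched in library form: a "fat reference" that carries a borrowed byte buffer plus a bit offset and a bit length. `BitRef` and `load` are hypothetical names, and the bit numbering (least significant bit of the first byte is bit 0) is an assumption of this sketch.

```rust
/// Hypothetical "fat reference" to a run of bits inside a byte buffer.
struct BitRef<'a> {
    bytes: &'a [u8],
    start: u16,  // index of the first bit, counting from bit 0 of bytes[0]
    length: u16, // number of bits, at most 128
}

impl<'a> BitRef<'a> {
    /// Reads the referenced bits as an integer, least significant bit first.
    fn load(&self) -> u128 {
        let mut value = 0u128;
        for i in 0..self.length as usize {
            let bit_index = self.start as usize + i;
            // Pick the byte, then the bit within that byte.
            let bit = (self.bytes[bit_index / 8] >> (bit_index % 8)) & 1;
            value |= (bit as u128) << i;
        }
        value
    }
}
```

Such a reference cannot be a plain `&uint<N>`, because it is wider than a pointer and reading through it takes shifts and masks instead of a single load.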

Contributor Author

clarfonthey commented Nov 8, 2018

Would you mind clarifying here? Not quite sure what you mean.

@programmerjake
Member

> @programmerjake You might want to check out https://github.com/jhpratt/deranged. It's similar to this but provides for range bound integers with common trait impls.

Neat! I'd still have to use my own implementation for rust-hdl since I need support for >128-bit integers, so I have it based on BigInt.

@jhpratt
Member

jhpratt commented Jul 14, 2021

Honestly not a bad idea to add support for that behind a flag ¯\_(ツ)_/¯

@programmerjake
Member

C is getting generic integers:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf
clang patch where I found my info: https://reviews.llvm.org/rG6bb042e70024

@clarfonthey
Contributor Author

clarfonthey commented Jul 27, 2021

I might try and update this RFC with some prior art for Zig, C, and LLVM at some point, maybe this weekend. Not sure if it'd be worth resubmitting though, since I have no idea where this'd be in terms of priorities.

We could probably get away with a very minimal MVP without any additions to the current const generics on stable, although it wouldn't add all the stuff we want. In particular, I guess that the biggest concern is the ability to implement From<uint<N>> for uint<M> where N < M, since we have no idea how those kinds of where clauses will ultimately be implemented. An MVP would not allow such impls, only the enumerated ones we currently have implemented (e.g. u16 -> u32).

And then there's still the N ≤ 128 bound that seemed very contested in this RFC. Would we want our MVP to allow arbitrary-length integers? Seems like it'd require a lot more work.
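As a rough illustration of what "only the enumerated impls" means, here is a minimal sketch on stable Rust. `UInt` is a hypothetical library stand-in for `uint<N>` (it stores the value in a `u128`), and the `widen!` macro and the chosen width pairs are assumptions made up for the example.

```rust
/// Hypothetical stand-in for `uint<N>`; the value lives in a u128.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct UInt<const N: u32>(u128);

impl<const N: u32> UInt<N> {
    /// Returns `None` if `v` does not fit in N bits.
    fn new(v: u128) -> Option<Self> {
        if N >= 128 || v >> N == 0 { Some(Self(v)) } else { None }
    }
}

// A blanket `impl<N, M> From<UInt<N>> for UInt<M> where N < M` cannot be
// written on stable, so each widening pair is listed explicitly.
macro_rules! widen {
    ($($from:literal => $($to:literal),+;)+) => {
        $($(
            impl From<UInt<$from>> for UInt<$to> {
                fn from(x: UInt<$from>) -> Self {
                    // Widening never loses bits, so the raw value is reused.
                    UInt(x.0)
                }
            }
        )+)+
    };
}

widen! {
    8 => 16, 32, 64;
    16 => 32, 64;
    32 => 64;
}
```

With this, `UInt::<16>::from(UInt::<8>::new(200).unwrap())` compiles, while a conversion for a pair that is not listed (say 7 to 9) simply has no impl.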

@buzmeg

buzmeg commented Nov 8, 2021

This RFC was postponed on const generics (although I don't see that as blocking this as I prefer the u48 form over uint<48>). Now that a minimal level of that has hit stable, how does this get reopened?

Prior art: I believe that Ada (general purpose language) and VHDL and SystemC (hardware description languages) also allow arbitrary integer sizes.

@clarfonthey
Contributor Author

The main reason for offering a generic form uint<N> is explicitly for generics: if you're forced to use macros to implement for multiple sizes, then it leaves out a large number of use cases that allow simplifying integer trait implementations.

That said, there's nothing stopping us from offering uN as an alias for uint<N> if there's desire for it, and there's also nothing stopping people from type-aliasing uint<N> for specific N if we don't.
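A minimal sketch of both points, using a hypothetical library type `UInt<N>` in place of the builtin `uint<N>`; the `BitWidth` trait is likewise invented for the example. One generic impl covers every width, and the `uN` spelling comes back through plain type aliases.

```rust
/// Hypothetical stand-in for `uint<N>`; the value lives in a u128.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct UInt<const N: u32>(u128);

/// Example of a trait a library might want on every integer width.
trait BitWidth {
    const BITS: u32;
    fn max_value() -> Self;
}

// One generic impl covers all widths; no per-width macro expansion is needed.
impl<const N: u32> BitWidth for UInt<N> {
    const BITS: u32 = N;
    fn max_value() -> Self {
        // Assumes 1 <= N <= 128; other widths would overflow the shift.
        UInt(u128::MAX >> (128 - N))
    }
}

// The `uN` spelling can be recovered with ordinary type aliases.
type U48 = UInt<48>;
type U7 = UInt<7>;
```

Here `U48::BITS` is 48 and `U7::max_value()` is `UInt(127)`, both coming from the same impl.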

At this point, I would really love to restart this RFC with the things we've learned since, but I simply have too much going on with work + other stuff at the moment. As I said, if anyone else wanted to take over the role of doing that, I'd be happy to help however I can.

@buzmeg

buzmeg commented Nov 11, 2021

Is there somewhere that the "things learned" are documented?

In addition, I don't like coupling "Generic Integers" to "Bitfields" at all. "Bitfields" have a lot of edge cases, while a Rust struct that is "packed" is still useful even if you don't necessarily know exactly how it is packed (i.e., RGBA 10/10/10/2 can be packed into 4 bytes even if you don't know the exact order, while "unpacked" would cost you at least 7 bytes: almost double the size, with the corresponding problems with cache lines).
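The size argument can be checked with a hand-rolled version today. This is a minimal sketch; `Rgba1010102`, `RgbaUnpacked`, and the particular bit layout (red in the low bits) are assumptions chosen for illustration.

```rust
/// RGBA with 10/10/10/2 bits packed into one u32 (layout chosen arbitrarily here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Rgba1010102(u32);

impl Rgba1010102 {
    fn new(r: u16, g: u16, b: u16, a: u8) -> Self {
        // Each channel must fit its bit budget: 10, 10, 10 and 2 bits.
        assert!(r < 1024 && g < 1024 && b < 1024 && a < 4);
        Self((r as u32) | ((g as u32) << 10) | ((b as u32) << 20) | ((a as u32) << 30))
    }
    fn r(self) -> u16 { (self.0 & 0x3ff) as u16 }
    fn g(self) -> u16 { ((self.0 >> 10) & 0x3ff) as u16 }
    fn b(self) -> u16 { ((self.0 >> 20) & 0x3ff) as u16 }
    fn a(self) -> u8 { (self.0 >> 30) as u8 }
}

/// The same channels stored in the smallest built-in integer types.
#[allow(dead_code)]
struct RgbaUnpacked {
    r: u16,
    g: u16,
    b: u16,
    a: u8,
}
```

`std::mem::size_of::<Rgba1010102>()` is 4, while the unpacked struct needs 7 bytes of fields before any alignment padding is added.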

While I do want bitfields, I suspect that having generic integers in the language would help people implementing bitfields to explore the space more effectively before converging.

I guess one obvious question would be "What does uint<N> do to the language grammar?"

@workingjubilee
Member

I agree that generic integer lengths should not be seen "as-if bitfields", as the rules for C bitfields are subtle and highly implementation dependent. However, Rust users will likely use them as an implementation convenience for Rust types that work "like a bitfield" if introduced, so we should consider their semantics in that regard.

Of course, people already use u8, u16, etc. for that same reason, so that's nothing new or shocking.

@clarfonthey
Contributor Author

I mean, there's precedent from Zig for using generic integer types for bitfields, as their packed struct would be equivalent to the #[repr(bitpacked)] I suggested. And we could potentially make it so that #[repr(bitpacked)] allows reordering fields just like #[repr(packed)] does, and then you have to do #[repr(C, bitpacked)] in order to ensure order.

@workingjubilee
Member

workingjubilee commented Nov 29, 2021

Sure.

As far as the length-oriented where bound:
It should probably exist in the form of a trait or const fn bound. This would allow changing it after the fact without breaking compatibility. core::simd implements a similar thing as a somewhat ugly hack with a trait implementation on a struct, one we have at least some intention to move to a const fn before we stabilize anything. I mean, we could simply break compilation during monomorphization, but that seems rude.

And I actually think it should be confined to a bound equivalent to uint<N> where N <= 64 at first, because of issues like these:

However, eventually that should rise at least to 128 for the obvious reasons. It would also be nice to be able to go up to 256. This would help simplify implementations of "mask vectors" for AVX512 (currently only requires 16 and 64, but...) and eventually SVE2 (can go up to 256) and the RISC-V V extension, etc. It might even simplify working with oddly-shaped data types like Intel's 80-bit floats.
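A minimal sketch of the "trait implementation on a struct" style of bound, written for stable Rust. All names (`Width`, `SupportedWidth`, `UInt`) and the short list of supported widths are hypothetical; this only mirrors the general shape described above and is not the actual `core::simd` code.

```rust
/// Marker type that carries a width as a const parameter.
struct Width<const N: u32>;

/// Implemented only for the widths the library chooses to support.
trait SupportedWidth {}

macro_rules! supported {
    ($($n:literal),+) => { $(impl SupportedWidth for Width<$n> {})+ };
}
// Abbreviated list; a real version would enumerate every width up to the limit.
supported!(1, 2, 7, 8, 16, 32, 48, 64);

/// Hypothetical stand-in for `uint<N>`, limited to the supported widths.
struct UInt<const N: u32>(u64)
where
    Width<N>: SupportedWidth;

impl<const N: u32> UInt<N>
where
    Width<N>: SupportedWidth,
{
    /// Returns `None` if `v` does not fit in N bits.
    fn new(v: u64) -> Option<Self> {
        if N == 64 || v >> N == 0 { Some(Self(v)) } else { None }
    }
    fn get(&self) -> u64 {
        self.0
    }
}
```

With this shape, naming `UInt<65>` is rejected during type checking with an unsatisfied trait bound rather than failing later during monomorphization, and the limit can be raised by adding impls without breaking existing users.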

@clarfonthey
Contributor Author

I think that the long-term goal should be to make N virtually unbounded; I say virtually because obviously you're limited by memory and codegen size, and at some point you really should just be using bigints. But yeah, if someone wants to make a 4096-bit integer for RSA keys or something like that, I say they should be allowed. The codegen might kinda suck for integers that big, but I wouldn't say we should stop them.

@programmerjake
Member

C23 and C++ are getting generic integer types _BitInt(N) where N is the number of bits:
https://reviews.llvm.org/rG6c75ab5f66b4

@clarfonthey
Contributor Author

LLVM getting proper support should make supporting this a lot easier.

@programmerjake
Member

iirc llvm already has proper support, at least for and, or, xor, shifts, comparison, add, sub, and multiply. division needs runtime library helpers (I don't remember if just the existing 128-bit helpers are sufficient).

@programmerjake
Member

LLVM RFC is up for adding library functions (div/rem) for >128-bit integers:
https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329?u=programmerjake
maybe that would be enough to allow @rust-lang/project-portable-simd to use generic integers for bitmasks once rustc gains support.

@buzmeg

buzmeg commented Feb 20, 2022

I would like to note that C23 is standardizing "N2709 - Adding a Fundamental Type for N-bit Integers"

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf

At this point, Rust will actually be lagging C without support for generic integers.

@workingjubilee
Member

workingjubilee commented Feb 20, 2022

I think it definitely weighs in strongly on the utility of the type, but we actually trail behind C and C++ in many other things as well. Adding one more won't hurt.

One thing that came up in idle conversation when discussing the extension to the C Standard with a friend is that in Rust we don't necessarily have a good story for what operations with these types look like... C has the advantage (yes, it is an advantage here!) of accepting certain kinds of numeric promotion, which means that having what are effectively 4096 different integer types is not a problem because they interoperate with each other for simple operations. Are we sure we want that to be the same for Rust?

However, that's actually not something that requires deciding with the implementation, it's just an unresolved question that came up.
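For reference, this is how mixed-width arithmetic looks with today's fixed-width types, where no implicit promotion happens. The function names are made up for the example; only standard library conversions are used.

```rust
/// Add a u8 to a u16 by widening explicitly; writing `a + b` directly
/// is a type error because Rust never promotes integer operands.
fn add_mixed(a: u8, b: u16) -> u16 {
    u16::from(a) + b
}

/// The checked variant makes the overflow case visible in the return type.
fn checked_add_mixed(a: u8, b: u16) -> Option<u16> {
    u16::from(a).checked_add(b)
}
```

Whether `uint<N>` and `uint<M>` should interoperate more freely than this is exactly the open question; the sketch only shows the current baseline.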

@clarfonthey
Contributor Author

As is evident from the fact that this was postponed instead of closed, there is a pretty strong desire to add these to Rust at some point; it just wasn't on the road map at the time this RFC was done since const generics weren't even close to being done yet, and there's still the unanswered question of numerically constraining where clauses with const generics (i.e., how do we want to implement From<u<N>> for u<M> where N < M).

If people have answers to these questions and the time to write an RFC, I would say go for it. But otherwise, there's not the biggest benefit to posting here since this is still a closed RFC.

I wonder, maybe it would be worth opening up a discussion somewhere in the const generics WG?

@hecatia-elegua

After trying to work with several bitfield crates, I stumbled upon this one: https://github.com/danlehmann/bitfield
I then tried to make it more ergonomic, which went ok for a while until it led me to trying:

#[bitfield(u4)]
struct TestChild {
    field1: u2,
    field2: u2,
}
#[bitfield(u8)]
struct TestParent {
    field1: TestChild,
    field2: u4,
}

Of course, this does not work, as proc macros can't "access" types, so TestParent can't access TestChild and therefore doesn't know how big it is and therefore can't generate any getters/setters. I'm not sure if any crate can (or should) breach that barrier?

My current idea is to add a const field SIZE to every bitfield and then generate offsets at runtime. Another idea is to give up on ideals and add an attribute to every struct field specifying its size.
There are a ton more things to think about and similar issues when implementing bitenums (and bitflags).
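A minimal sketch of the SIZE idea, written by hand instead of generated by a macro. The trait `BitSized`, the marker types `U2` and `U4`, and the LSB-first field order are assumptions for illustration. It differs slightly from the description above in that the offsets are associated consts rather than runtime values: a proc macro could emit the path `<TestChild as BitSized>::SIZE` without knowing its value and leave the evaluation to the compiler.

```rust
/// Hypothetical trait a bitfield macro could implement for every type it generates.
trait BitSized {
    const SIZE: u32;
}

// Marker types standing in for 2-bit and 4-bit integers.
struct U2;
struct U4;
impl BitSized for U2 { const SIZE: u32 = 2; }
impl BitSized for U4 { const SIZE: u32 = 4; }

/// Child bitfield with two U2 fields; its size is the sum of its field sizes.
struct TestChild(u8);
impl BitSized for TestChild {
    const SIZE: u32 = <U2 as BitSized>::SIZE + <U2 as BitSized>::SIZE;
}

/// Parent bitfield backed by a u8: a TestChild followed by a U4.
struct TestParent(u8);
impl TestParent {
    const FIELD1_OFFSET: u32 = 0;
    // Only the type name of the previous field is needed to write this line.
    const FIELD2_OFFSET: u32 = Self::FIELD1_OFFSET + <TestChild as BitSized>::SIZE;

    fn field1(&self) -> TestChild {
        let mask = (1u8 << <TestChild as BitSized>::SIZE) - 1;
        TestChild((self.0 >> Self::FIELD1_OFFSET) & mask)
    }
    fn field2(&self) -> u8 {
        let mask = (1u8 << <U4 as BitSized>::SIZE) - 1;
        (self.0 >> Self::FIELD2_OFFSET) & mask
    }
}
```

For `TestParent(0b1010_0110)`, `field1()` holds `0b0110` and `field2()` returns `0b1010`.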

To me it feels like we're reimplementing stuff the compiler does, only that the compiler likes to work with bytes and we reimplement things for working with bits, e.g. offset_of and size_of in the case above.

What would a first helpful step be? Maybe allowing the compiler to recognize bit-sized types?

@workingjubilee
Member

Adding the notion of a "bit-sized" type (as opposed to a bit type that happens to occupy an entire u8) is much more troublesome than you might think, because one thing that Rust code can usually count on is that for each T in (T, T), the types can have pointers taken to them and interacted with separately. If you want to bitpack four u2 into one u8, that goes out the window.

@programmerjake
Member

> Adding the notion of a "bit-sized" type (as opposed to a bit type that happens to occupy an entire u8) is much more troublesome than you might think, because one thing that Rust code can usually count on is that for each T in (T, T), the types can have pointers taken to them and interacted with separately. If you want to bitpack four u2 into one u8, that goes out the window.

imho bit-sized types would always be padded out to an integer number of bytes, except for inside bit-packed structs/enums, which are like repr(packed) in that you can't make references to their fields, you can only read or set them.
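A minimal sketch of that access model using today's types: four 2-bit values share one byte, and the only way to reach a field is to copy it out or write it back. `Packed4x2` and its methods are hypothetical names, not part of any proposal.

```rust
/// Four 2-bit values packed into one byte; fields are read and written by value only.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Packed4x2(u8);

impl Packed4x2 {
    /// Copy field `index` (0..4) out of the byte.
    fn get(self, index: usize) -> u8 {
        assert!(index < 4);
        (self.0 >> (index * 2)) & 0b11
    }

    /// Overwrite field `index` (0..4) with a 2-bit `value`.
    fn set(&mut self, index: usize, value: u8) {
        assert!(index < 4 && value < 4);
        let shift = index * 2;
        // Clear the two target bits, then write the new value into them.
        self.0 = (self.0 & !(0b11u8 << shift)) | (value << shift);
    }
}
```

There is no `fn field_mut(&mut self, index: usize) -> &mut u8` here, because no field occupies its own addressable byte; that is the same restriction `repr(packed)` places on references to its fields.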

@nyabinary

Has enough time passed to revisit this yet? :P

Labels
A-const-generics: Const generics related proposals.
A-primitive: Primitive types related proposals & ideas.
finished-final-comment-period: The final comment period is finished for this RFC.
postponed: RFCs that have been postponed and may be revisited at a later time.
T-lang: Relevant to the language team, which will review and decide on the RFC.
T-libs-api: Relevant to the library API team, which will review and decide on the RFC.