Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there any plan to implement signed types i2 .. i127 #14

Open
apps4uco opened this issue Apr 29, 2023 · 28 comments
Open

Is there any plan to implement signed types i2 .. i127 #14

apps4uco opened this issue Apr 29, 2023 · 28 comments

Comments

@apps4uco
Copy link

It would be useful to implement signed types i2 .. i127

@danlehmann
Copy link
Owner

Thanks for the bug report. I'm planning on doing them eventually, though I haven't had the need yet.

There are also some non-trivial design questions. For example, should i4::new(-1).raw_value() return -1i8 or 0b1111? The former would be more useful for arithmetics where-as the second would be more useful to use in bitfields.

Maybe the answer is two functions? raw_value() could return 0b1111 and .value() (or .into()) would return -1i8.

@hecatia-elegua
Copy link
Collaborator

I would go for two functions too (but the inner value will still need to be one of both, sounds like something we could maybe benchmark or just use the easier one).
Hm, though -1i8 as u8 == 0b11111111... u4::from(0b1111)... honestly would need some prototyping together with bitfields

@danlehmann
Copy link
Owner

It would be nice if .raw_value() would just be the bits, without sign-extension. That way bitfields can rely on not having to apply mask. For the inner value of the arbitrary-int, we'd want the sign extended value though probably. Which means raw_value() wouldn't be a pure getter bit instead a get-with-mask.

@danlehmann
Copy link
Owner

Not sure if benchmarks would help much, as it depends on your use-case:

  • If you only use i5 as a field in a bitfield, not sign-extending the inner value would be faster.
  • If you do arithmetic, sign-extending is better

Most likely, the difference will be incredibly tiny anyway

@hecatia-elegua
Copy link
Collaborator

You're right, benchmark was the wrong word, the easier one should be used

@vortexofdoom
Copy link

I was thinking of trying to implement this. My gut instinct would be to "optimize" for arithmetic operations, since if the only use case is as a bitfield you can probably make it work with an unsigned type if the tiny difference is actually important? Adding masking overhead (however little) to arithmetic operations on a signed type seems like it's contrary to the point of using one, and would be a purely ergonomic change.

@danlehmann
Copy link
Owner

danlehmann commented May 22, 2023

Awesome, happy to take a patch if it fits within the overall design. Given the previous discussions I'd suggest the following:

  • The inner value should be sign-extended (so e.g. for a i5::new(-2) value would be -2u8)
  • Public function signed_value() returns this inner value
  • Public function value() returns the masked inner value

By the way: The easiest way I can think of to mask (while retaining the sign) is to shift like this:

For example, the addition of two i5 would look something like this:

fn add(v1: i5, v2: i5) -> i5 {
  u5 { value: ((v1.value + v2.value) << 3) >> 3 }
}

@danlehmann
Copy link
Owner

Hmm there is one more complication: Do we want new() to be i5::new(-2) or i5::new(0b11110)?

Reason I'm asking: For unsigned ints, new() is doing a range check. We want to have this for signed ints as well, I'd imagine.

So maybe we should make this symmetrical? i5::new_signed(-5i8) vs i5::new(0b11110).

To make it less error prone, we could make new() accept an unsigned value and have value() return an unsigned value.

@danlehmann
Copy link
Owner

To be honest, I still don't quite like it. Seems unbalanced. What do you two think?

@danlehmann
Copy link
Owner

I took at the look at the uX crate and they don't seem to have a distinction at all, based on the documentation.

@hecatia-elegua
Copy link
Collaborator

Let's think about the basic case if rust would support arbints.
i4::new(-1) would be -1i4
.value() = -1 would be -1
At least that's what I imagined these functions to do.

@vortexofdoom
Copy link

I was thinking about the possibility of expanding signed to be constructed from unsigned as well, but I don't know how sold I am on it.

One problem I noticed before I got too far is the Number trait is confined to implementing unsized types, and the corresponding macros require it. The options are making separate macros for a separate trait or modularizing it some other way.

I was thinking this should probably be behind the num-traits feature, since that has some more general integer traits that might translate across better?

@vortexofdoom
Copy link

Let's think about the basic case if rust would support arbints. i4::new(-1) would be -1i4 .value() = -1 would be -1 At least that's what I imagined these functions to do.

I think the issue is do we represent -1i4 under the hood as 0b00001111 (15i8) and maybe overload arithmetic ops, incurring a small amount of overhead for that, or as 0b11111111 (-1i8), incurring a small amount of overhead when we need to keep it constrained to 4 bits of data. And I'm inclined to say the latter.

@vortexofdoom
Copy link

vortexofdoom commented May 22, 2023

Hmm there is one more complication: Do we want new() to be i5::new(-2) or i5::new(0b11110)?

I'm imagining i5::new(-2) and i5::from_bits(0b11110). I think letting users keep a consistent conception of these as actual integral values is pretty important, so having a constructor from signed numbers is necessary imo.

in other words, I think new should be for the underlying type in all cases, unless we were going to keep using unsigned integers under the hood, which I guess is an option? But I think the library should do the implicit conversions mathematically even in that case I think, it should be possible to use them without constantly deriving the twos complement representation yourself.

@danlehmann
Copy link
Owner

The API question can be independent from the underlying representation. We might decide to change between signed/unsigned later, but the behavior of the library should be the same.

I agree that u5::new(-3) and value() should return -3.

That means we should another pair of functions for the unsigned representation. from_bits() and to_bits() probably makes sense - it mirrors the functions in f32/f64. Those can be the masked values.

@vortexofdoom
Copy link

So I was thinking about the ways to go about including all of the implementations, and one thing to consider is that the Number trait (and the crate in general) is very coupled with unsigned integers. This isn't a problem in and of itself, per se, but has to be worked around in one of a few ways:

  1. Just define a new trait like SignedNumber that is From<i8> + TryFrom<i16> etc. I don't really like this approach, it's a huge amount of duplication and just more code to maintain for largely the same logic.
  2. Remove the trait bounds from Number and then do a generic implementation of Number for T where T: From<[appropriate signed int type here]>.
  3. An intriguing idea would be to have every arbitrary int type backed by a signed int. The only bit widths of unsigned int where it matters whether its representation is signed are the actual primitive types, which the crate obviously doesn't provide. Any number of bits lower than the native type's will be guaranteed to have the same value, and overflows should work exactly the same. This could obviously be a breaking change, although it might(?) actually be possible to keep the same API, and for most use cases I think most people will be using the aliased type directly, rather than accessing the underlying type other than as raw bits, which should still be supported as an unsigned int with from_bits() and to_bits().

The third option would essentially represent every arbitrary int as a sign-extended byte aligned type, and the bounds checking would work largely similarly I think? Option 2 is almost certainly the sanest approach, but I think there's a certain elegance to 3.

@hecatia-elegua
Copy link
Collaborator

I can just comment two more thoughts:

  • does transmute work for any of this / should it work?
  • I like @vortexofdoom 2.

@vortexofdoom
Copy link

I mean, I think transmute is not the proper answer, it's too heavyweight, I think we could probably get away with a #[allow(overflowing_literals)] on exactly the functions that are moving between types of the same size.

@hecatia-elegua
Copy link
Collaborator

To be clearer: if we did 3., wouldn't that mean core::mem::transmute would fail?

@vortexofdoom
Copy link

Nothing makes me think it should lead to UB naively, transmute still requires that the items have the same size, and that's the one guarantee we have. For any bit configuration smaller than the full size, the bit representation should be identical for signed and unsigned integers. I could be wrong though.

@danlehmann
Copy link
Owner

I'd prefer to not use transmute as it's unsafe. The safe alternative is simply to do a cast here: -1i8 as u8 == 255u8. Casting between signed and unsigned of the same size won't touch the bits and there's no possibility for overflow either.

@danlehmann
Copy link
Owner

danlehmann commented May 23, 2023

As to @vortexofdoom 's suggestions:

third: could be done in a way that's not a breaking change, but I don't see how it would help. We can change the backing store (and mostly things would be identical), but we might have to change some operations like Add to restore the previous behavior.

second: Won't work unfortunately as Rust's generics don't support const. So once we introduce something like T: From, we lose all const-ness :(

first: I think this is the way to go unfortunately. One silver lining here is that there are currently two Number implementations (one using const_trait and one using stable). As Rust itself has dropped const_trait, we don't need to have the const-version for the SignedNumber. I'll file a separate task to remove that.

@vortexofdoom
Copy link

vortexofdoom commented May 23, 2023

Yeah, to put what I was trying to say another way, iN primitive types correctly represent ANY unsigned type with N-1 bits or fewer, so having fewer trait implementations, generic structs, etc. generated as a result of adding signed integers would be the big win for that idea, but it's just an idea and probably not a big deal.

@hecatia-elegua
Copy link
Collaborator

As Rust itself has dropped const_trait, we don't need to have the const-version for the SignedNumber. I'll file a separate task to remove that.

You know I'm using this functionality and probably also know that they've just temporarily removed it. Please don't.

@danlehmann
Copy link
Owner

Filed #22 . Let's discuss the const-trait over there?

@danlehmann
Copy link
Owner

danlehmann commented Jun 24, 2023

I wanted to make some more changes to Number anyway. For example, Number can easily require Add, Sub, Shl etc to be implemented. The downside of that though is that it would be a slightly breaking change (in the unlikely case that someone else implements Number). So it might be a good idea to bundle API changes together and make a 2.0.

@vortexofdoom
Copy link

Sorry, I got busy with other things but have recently been thinking about getting back to this.

One thing that originally slowed me down is that lib.rs is pretty unwieldy already, and having alternating signed/unsigned impls or two big blocks both felt pretty bad for both readability and maintainability. It feels weird to have a file just for signed types, so I'd probably break out both types into their own files, and maybe only keep the Number trait(s?) in lib.rs.

Speaking of Number, I think it would be a good idea to make it sealed if we're implementing all types from from 1 to 128 bits. There's not really a use case for users to implement it on external types, and that ability is probably not something we want to guarantee long term. Probably something to be filed in a separate issue, but it might add some implementation flexibility here.

In case someone picks this up before I do, one thought I had is that a major benefit to implementing with the underlying type being signed is that right shifts are sign-extending. This would make it pretty easy to cast between the unsigned "bits" and signed internal formats by first left shifting until the sign bit lines up with the underlying primitive and then switching to the other and right shifting again.

@dzmitry-lahoda
Copy link
Contributor

dzmitry-lahoda commented Jul 2, 2024

zig becomes more and more popular, so zigs defaults may be good to go (hoping rust will also adopt these in future). i mean storage vs arith for negative numbers. storage also for u nums is relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants