Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support unsafe float-to-int casts #1264

Closed
RalfJung opened this issue Mar 26, 2020 · 25 comments · Fixed by #1325
Closed

Support unsafe float-to-int casts #1264

RalfJung opened this issue Mar 26, 2020 · 25 comments · Fixed by #1325
Labels
A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement

Comments

@RalfJung
Copy link
Member

RalfJung commented Mar 26, 2020

With rust-lang/rust#66841, we have some unsafe intrinsics to float-to-int casts. We should implement those in Miri, with appropriate UB checking of course. So we need to figure out how to check, with apfloat, if the cast would go wrong.

We also should check that the float-to-int "as" casts in Miri match the planned safe behavior in Rust. I am not sure what those plans currently are, probably whatever -Z saturating-float-casts does. @eddyb the from_i128/from_u128 methods in apfloat, are they saturating? (rust-lang/rust#70437, #1265)

@RalfJung RalfJung added A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement labels Mar 26, 2020
@RalfJung
Copy link
Member Author

The docs for the apfloat cast method says

In case of an invalid operation exception, deterministic values are returned, namely zero for NaNs and the minimal or maximal value respectively for underflow or overflow.

So, it seems like our "as" implementation matches -Z saturating-float-casts?

@eddyb
Copy link
Member

eddyb commented Mar 26, 2020

@hanna-kruppe is the expert here, but that sounds right to me.

@hanna-kruppe
Copy link

hanna-kruppe commented Mar 26, 2020

Yes, the behavior in that quote matches the runtime behavior with -Z saturating-float-casts, assuming underflow means "negative value outside the range of destination type" (like overflow, just with opposite sign) rather than the condition called underflow in IEEE 754 (magnitude too close to zero, regardless of sign).

@RalfJung
Copy link
Member Author

assuming underflow means "negative value outside the range of destination type" (like overflow, just with opposite sign) rather than the condition called underflow in IEEE 754 (magnitude too close to zero, regardless of sign)

Hm... sounds like a reasonable interpretation, but it's a bit worrying that terminology doesn't match up.

@RalfJung RalfJung changed the title Support/adjust float-to-int casts Support unsafe float-to-int casts Mar 26, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Mar 26, 2020
Miri float->int casts: be explicit that this is saturating

r? @hanna-kruppe
Cc rust-lang/miri#1264
@RalfJung
Copy link
Member Author

RalfJung commented Mar 28, 2020

Okay, with #1265 it seems like our cast story is correct (except for the NaN issue tracked at rust-lang/rust#69532). I confirmed locally that this matches behavior of rustc with -Zmir-opt-level=0 -Zsaturating-float-casts.

What remains is the unsafe intrinsic, where the hard part is the UB checking. How do we detect if casting the float to the target int type overflowed or not?
We could rather crudely compare the result to the max/min value of the target type, but I assume that's wrong -- we could hit that value regularly, not as result of saturation.

The safety docs for these methods say the input must "be representable in the return type Int, after truncating off its fractional part". Doesn't that mean that if u64::MAX as f64 rounds up, the result may not be used with these methods? (It seems to round down, but that might not be the case for all possible casts.)
Cc @SimonSapin

We could also compare the input with u64::MAX as f64 (if we are currently doing f64 -> u64 casting), and reject it if it is bigger. But I assume that is incorrect, because u64::MAX as f64 is 18446744073709551615, and 18446744073709551616 is still smaller than u64::MAX and thus a legal input. OTOH, that number is not representable I guess, otherwise the cast would not have rounded down quite so far... but for smaller types this certainly doesn't work; 255.5f64 as u8 is legal according to what the docs say, but it is bigger than u8::MAX as f64.

@hanna-kruppe
Copy link

hanna-kruppe commented Mar 28, 2020

I believe the simplest way (for miri) is:

  1. Perform the "drop the fractional bits" part first. This should be rustc_apfloat's round_to_integral with rounding to zero.
  2. Cast the truncated float to u128 and compare that against the target type's range. You'll still need to check that the float value wasn't out of range for u128, but since the fractional bits were already dropped in the first step, the is_exact output parameter will tell you if the "integer part" fit into u128.

The conversion to u128 has no equivalent for signed types but waves hand splitting the float into (sign, absolute value) should solve that. Should not be a conceptual challenge, this kind of trick just often leads to extra bookkeeping that can be annoying to spell out (which is why I hand-wave it here).

@hanna-kruppe
Copy link

Cast the truncated float to u128 and compare that against the target type's range. You'll still need to check that the float value wasn't out of range for u128, but since the fractional bits were already dropped in the first step, the is_exact output parameter will tell you if the "integer part" fit into u128.

Actually, just casting to the right bit width to begin with should work just fine (for unsigned types; again signed types need extra care). The cast needs to perform the integer range check anyway, and for reasons described above is_exact should tell you precisely if it was out of range.

@RalfJung
Copy link
Member Author

For signed casts, why would to_i128 not work just as well as to_u128 works for unsigned?

For is_exact, there actually seem to be two ways to determine this? There's the &mut bool: is_exact in the _r variants of the cast methods, but the returned Status also has an INEXACT flag.

@RalfJung
Copy link
Member Author

Cast the truncated float to u128

So there will never be a problem where truncation to an integer leads to more rounding loss, or so? The truncation operation is always exactly representable?

@hanna-kruppe
Copy link

hanna-kruppe commented Mar 28, 2020

For signed casts, why would to_i128 not work just as well as to_u128 works for unsigned?

Oops, I did not see those on a glance because they're provided methods. Yes, either to_i128 or to_i128_r should work depending on whether the is_exact / INEXACT distinction matters.

For is_exact, there actually seem to be two ways to determine this? There's the &mut bool: is_exact in the _r variants of the cast methods, but the returned Status also has an INEXACT flag.

According to the code, the only difference should be that casting -0.0 to an integer returns Status::OK but sets is_exact = false. Is negative zero UB with this intrinsic or does it map to integer zero?

So there will never be a problem where truncation to an integer leads to more rounding loss, or so? The truncation operation is always exactly representable?

Right, this operation is exact. Conceptually, it just clears any bits of the significant that have "weight" smaller than 1. It's not quite so simple on the level of bit strings (because of the implicit leading 1 in normal numbers, the exponent sometimes has to change), but still you'll never get in the situation where you have to round off non-zero digits.

@eddyb
Copy link
Member

eddyb commented Mar 28, 2020

Looking at the implementation, I'd say INEXACT is for losing fraction bits and is actually safe to ignore, it's rather any other flag (i.e. INVALID_OP) that might be considered UB:
https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1239-L1241
https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1288-L1291
https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1295-L1298

(argh GitHub, this is what we get for using separate repos, I guess 😠)

@SimonSapin
Copy link

The safety docs for these methods say the input must "be representable in the return type Int, after truncating off its fractional part". Doesn't that mean that if u64::MAX as f64 rounds up, the result may not be used with these methods?

As far as I can tell, that’s correct.

(It seems to round down, but that might not be the case for all possible casts.)

Uh, it seems to round up. The output for me is:

18446744073709551615
18446744073709552000

It’s more obvious in hexadecimal, and when also finding the previous representable f64:

fn main() {
    println!("{:20x}", u64::MAX);
    println!("{:20x}", u64::MAX as f64 as u128);
    println!(
        "{:20x}",
        (0..u64::MAX)
            .rev()
            .find(|&n| n as f64 != u64::MAX as f64)
            .unwrap()
    );
}

(Playground)

Output:

    ffffffffffffffff
   10000000000000000
    fffffffffffffbff

https://doc.rust-lang.org/nightly/reference/expressions/operator-expr.html#type-cast-expressions documents:

Casting from an integer to float will produce the closest possible float *

if necessary, rounding is according to roundTiesToEven mode ***

*** as defined in IEEE 754-2008 §4.3.1: pick the nearest floating point number, preferring the one with an even least significant digit if exactly halfway between two floating point numbers.

Rounding up causes an error of 1, but rounding down would cause an error of 1024 so the former is closest.

@SimonSapin
Copy link

As to the behavior of {f32,f64}::to_int_unchecked, it’s basically defined as same as LLVM’s fptoui and fptosi instructions. Their docs include “if the value fits in [the target type]” but unfortunately are not more precise than that. However these instructions are presumably designed for clang to implement C and C++ casts, and we find:

C11 standard https://port70.net/~nsz/c/c11/n1570.html#6.3.1.4

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

C++17 standard http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf#section.7.10

A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

It looks like truncation towards zero matches the existing f32::trunc and f64::trunc methods, so that part is easy enough.

Correctly implementing a check that a (truncated) float value can be represented can be tricky however, especially for large integer types. I’d recommend looking at the implementation of -Z saturating-float-casts:

https://github.com/rust-lang/rust/blob/150322f86d441752874a8bed603d71119f190b8b/src/librustc_codegen_ssa/mir/rvalue.rs#L775-L892

@RalfJung
Copy link
Member Author

RalfJung commented Mar 29, 2020

@eddyb

Looking at the implementation, I'd say INEXACT is for losing fraction bits and is actually safe to ignore, it's rather any other flag (i.e. INVALID_OP) that might be considered UB

Oh so by looking at these flags I can actually skip the first truncation step? Or, maybe playing it safer, I could truncate and then make sure there are no flags at all?

However, when we are casting to u128, how can this line possibly detect overflow?
https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1296
r > overflow is just always false when overflow is u128::MAX.

@SimonSapin

Uh, it seems to round up.

Ouch, clearly I cannot read. Thanks for correcting me.
So, (u64::MAX as f64).approx_unchecked_to::<u64>() is UB. Interesting.

presumably

Yeah, that's the usual problem with the LLVM docs, you have to do a lot of guesswork if you want to know the UB conditions. :/ That's why I went for your Rust-side docs instead. Thanks for giving the sources for those!

Correctly implementing a check that a (truncated) float value can be represented can be tricky however, especially for large integer types.

That's what I was worried about, but apfloat's to_[iu]128_r methods have an is_exact output parameter that should tell me if any further rounding happened when turning the truncated part into an integer of appropriate size.

Open questions:

  • Is casting NaN UB? I would assume so.
  • Is casting negative 0 UB? From your C/C++ quotes, I would assume not, but apfloat sets is_exact to false for negative 0, so we have to be careful here.
  • Are there any other "magic" float values that I missed? The infinities quite clearly are too big/small for the target type, and hence UB, I would argue.

I’d recommend looking at the implementation of -Z saturating-float-casts:

I don't think that actually helps for UB detection: that code saturates when the float exceeds Int::MAX as Float. For example, the saturation branch is entered when doing 255.5f64 as u8, as 255.5 is bigger than 255. And the result for that is correct (255u8), but this is not a sufficient condition for have UB.
Am I missing something?

@eddyb
Copy link
Member

eddyb commented Mar 29, 2020

@RalfJung That part is handled by https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1264-L1267.

@hanna-kruppe
Copy link

hanna-kruppe commented Mar 29, 2020

@eddyb

Looking at the implementation, I'd say INEXACT is for losing fraction bits and is actually safe to ignore, it's rather any other flag (i.e. INVALID_OP) that might be considered UB:

After looking over it for a few minutes, this claim seems plausible but I'm not 100% sure about it. To me, removing fractional bits first seems easier to have confidence in, and performance is not really a concern here. There's also the open question of -0.0.

@RalfJung

I’d recommend looking at the implementation of -Z saturating-float-casts:

I don't think that actually helps for UB detection: that code saturates when the float exceeds Int::MAX as Float. For example, the saturation branch is entered when doing 255.5f64 as u8, as 255.5 is bigger than 255. And the result for that is correct (255u8), but this is not a sufficient condition for have UB.
Am I missing something?

No, you're right. The [f_min, f_max] range computed there is an under-approximation of the range where fpto[su]i is defined. The precise range where the unchecked conversion is well-defined could be computed similarly with more effort, but the whole approach is rather fiddly and only chosen to minimize performance overhead by avoiding soft-float computations at runtime. For miri there's no such constraint (all FP math is soft-float anyway), so explicitly computing the safe range is unnecessary in that context.

@SimonSapin
Copy link

Sorry I don’t have a precise source for each of these but my understanding/recollection is:

  • Is casting NaN UB? I would assume so.

Yes.

  • Is casting negative 0 UB?

No. The integral part of negative zero can be represented as integer zero.

  • Are there any other "magic" float values that I missed?

No.

The infinities quite clearly are too big/small for the target type, and hence UB, I would argue.

Right, in an algorithm based on comparing they’ll behave the same as out-of-range finite values.


For example, the saturation branch is entered when doing 255.5f64 as u8,

Oh you’re right, sorry. Scratch that recommendation :)

@RalfJung
Copy link
Member Author

There are a bunch of testcases for this at https://github.com/WebAssembly/testsuite/blob/master/conversions.wast (WebAssembly semantics seems to match ours).

@RalfJung
Copy link
Member Author

Hm, some of these use hexadecimal floating-point notation, such as 0x1p-149, which Rust cannot parse it seems. :/

@RalfJung
Copy link
Member Author

RalfJung commented Apr 11, 2020

Ah, I can convert them using godbolt ;) (and here's the f64 variant

@hanna-kruppe
Copy link

You could also use https://crates.io/crates/hexf

@eddyb
Copy link
Member

eddyb commented Apr 11, 2020

Hm, some of these use hexadecimal floating-point notation, such as 0x1p-149, which Rust cannot parse it seems. :/

rustc_apfloat has hexfloat support, let me know if it doesn't work as expected.

@RalfJung
Copy link
Member Author

Well that doesn't help, I need to input them in the .rs file. And I can't really add external dependencies there, so for now I just converted them to bit patterns externally and then used from_bits. See #1321.

@eddyb
Copy link
Member

eddyb commented Apr 11, 2020

Ah you're not writing an unit test, my bad.

@RalfJung
Copy link
Member Author

I implemented the unchecked intrinsic in #1325. Reviews would be welcome, in particular for the testcases. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants