
Allow floating-point operations to provide extra precision than specified, as an optimization #2686

Conversation

joshtriplett
Member

@joshtriplett joshtriplett commented Apr 17, 2019

Rendered

This enables optimizations such as fused multiply-add operations by default, while providing robust mechanisms to disable extra precision for applications that wish to do so.

EDIT: Please note that this RFC has been substantially overhauled to better accommodate applications that wish to disable extra precision. In particular, there's a top-level codegen option (-C extra-fp-precision=off) to disable this program-wide.
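
To make the proposed optimization concrete, here is a minimal sketch (illustrative only; the function names are made up for this example, not taken from the RFC):

fn axpy(a: f64, x: f64, y: f64) -> f64 {
    a * x + y // two operations, two roundings; could be contracted to an FMA under this RFC
}

fn axpy_fused(a: f64, x: f64, y: f64) -> f64 {
    a.mul_add(x, y) // what code must spell out today to request a single-rounding FMA
}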

…fied, as an optimization

This enables optimizations such as fused multiply-add operations by
default.
@joshtriplett
Member Author

cc @fenrus75

@joshtriplett joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Apr 17, 2019
@hanna-kruppe hanna-kruppe left a comment

I really want Rust to have a good story for licensing floating point optimizations, including but not limited to contraction. However, simply turning on contraction by default is not a good step in that direction. Contrary to what the RFC claims, contraction is not "safe" (meaning that it breaks otherwise-working programs; obviously there's no memory safety at stake), and we have not previously reserved the right to do this or given any other indication to users that it might happen.

Let's design a way to opt into and out of this behavior at crate/module/function level first, and once that's done we can look at how to make more code use it automatically. A fine-grained opt-in and -out is very useful even if we end up changing the default, e.g. to ensure code that breaks under contraction can be compiled as part of a crate graph that generally has contraction enabled. There's plenty of design work to keep us busy even without touching defaults:

  • compiler options or attributes or ...?
  • how does it propagate from callers into callees, if at all? (generally hard problem, but IMO a good story for this is just as valuable as providing the basic feature in the first place)
  • what transformations are licensed exactly? (e.g., do we want roughly what the C standard allows, or do we want more like GCC does?)

back to a lower-precision format.

In general, providing more precision than required should not cause a
mathematical algorithm to fail or to lose numeric accuracy.


This is incorrect. One simple counter-example is x * x - y * y, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to x.mul_add(x, - y * y) then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into sqrt() to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.

Member Author

Any programs that have a problem with that will need to pass non-default compiler options on many common C, C++, and Fortran compilers.

That said, I'll adjust the language.

Contributor

@gnzlbg gnzlbg Apr 18, 2019

Any programs that have a problem with that will need to pass non-default compiler options on many common C, C++, and Fortran compilers.

Some C, C++, and Fortran compilers do this (gcc, msvc), some don't (clang). If this were a universally good idea, all of them would do it, but that is not the case. That is, those languages are prior art, but the prior art section doesn't really explain why this would actually be a good idea: are programmers using those languages happy with that "feature"?

A sign change trickling down your application depending on the optimization level (or even the debug-information level) can be extremely hard to debug in practice. So IMO the issue raised by @rkruppe deserves more analysis than just a wording adjustment in the RFC.

Member Author

why this would actually be a good idea
are programmers using those languages happy with that "feature"

The beginning of the RFC already makes the rationale quite clear: this allows for optimizations on the scale of 2x performance improvements, while never reducing the accuracy of a calculation compared to the mathematically accurate result.

Member Author

@rkruppe Looking again at your example, I think there's something missing from it? You said:

One simple counter-example is x * x - y * y, which is non-negative for all x and y whose squares are finite floats

Counter-example to that: x = 2.0, y = 4.0. Both x and y square to finite floats, and x*x - y*y should absolutely be negative. I don't think those properties alone are enough to reasonably expect that you can call sqrt on that and get a non-imaginary result.


Ugh, sorry, you're right. That's what I get for repeating the argument from memory and filling the gaps without thinking too long. In general of course x² may be smaller than y². The problematic case is only when x = y (+ aforementioned side conditions), in that case (x * x) - (y * y) is zero but with FMA it can be negative.

Another example, I am told, is complex multiplication when multiplying a number by its conjugate. I will not elaborate because apparently I cannot be trusted this late in the evening to work out the details correctly.

@fenrus75 fenrus75 Apr 21, 2019

This is incorrect. One simple counter-example is x * x - y * y, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to x.mul_add(x, - y * y) then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into sqrt() to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.

I suspect this is not a valid statement.

the original is in pseudocode

round64( round64(x * x) - round64(y * y) )

the contraction you describe gives

round64( x * x - round64(y * y) )

for this to go negative only in the contracted case, round64(x * x) would have to round up to >= round64(y * y) while x * x itself is < round64(y * y); with round-to-nearest that means round64(x * x) == round64(y * y), since rounding can't carry it past round64(y * y).

since we're rounding to nearest, x * x is then at most half a unit of precision away from round64(y * y).
this in turn means that x * x - round64(y * y), while negative in this case, is less than half a unit of precision away from 0, which means the outer round64() will round it up to 0.


This is incorrect. One simple counter-example is x * x - y * y, which is non-negative for all x and y whose squares are finite floats, but if the expression is contracted to x.mul_add(x, - y * y) then it can have negative results. This can of course snowball into even worse issues downstream, e.g., if this is fed into sqrt() to get the 2D euclidean norm, contraction can cause you to end up with NaNs on perfectly innocuous vectors.

I suspect this is not a valid statement.

the original is in pseudocode

round64( round64(x * x) - round64(y * y) )

the contraction you describe gives

round64( x * x - round64(y * y) )

If you use y=x, then if round64(x*x) rounds up, it's easy to see that round64(x*x - round64(x*x)) is negative. This does not round to zero, because units of precision are not absolute, but relative (think significant figures in scientific notation).

For reference (and more interesting floating point information!) see the "fmadd" section on https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
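
To make the rebuttal concrete, here is a small Rust sketch (illustrative; which inputs trigger a negative result depends on whether their squares round up):

fn main() {
    // With y == x, the unfused expression is exactly zero, so its sqrt is 0.0.
    // The fused form computes the (signed) rounding error of x * x exactly:
    // whenever the product rounded up, the result is slightly negative and
    // sqrt() then returns NaN.
    for x in [0.1_f64, 0.2, 0.3, 1.0 / 3.0, 2.0_f64.sqrt()] {
        let unfused = x * x - x * x;        // both products rounded: exactly 0.0
        let fused = x.mul_add(x, -(x * x)); // only the second product rounded
        println!(
            "x = {x:.17}: unfused = {unfused:e}, fused = {fused:e}, fused.sqrt() = {}",
            fused.sqrt()
        );
    }
}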

Member

So the conclusion, if I read this correctly, is that indeed increasing precision locally in some sub-computations can reduce precision of the overall computation, right? (Also see here.)

across platforms, this change could potentially allow floating-point
computations to differ by platform (though never below the standards-required
accuracy). However, standards-compliant implementations of math functions on
floating-point values may already vary slightly by platform, sufficiently so to


I'm the last person to argue we have any sort of bit-for-bit reproducibility of floating point calculations across platforms or even optimization levels (I know in regrettable detail many of the reasons why not), but it seems like a notable step further to make even the basic arithmetic operations dependent on the optimization level, even for normal inputs, even on the (numerous) targets where they are currently not.

- [C11](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) allows
this with the `STDC FP_CONTRACT` pragma enabled, and the default state
of that pragma is implementation-defined. GCC enables this pragma by
default, [as does the Microsoft C


Note that GCC defaults to -ffp-contract=fast, which goes beyond what's described in the C standard, and according to documentation the only other option it implements is off.

Member Author

Based on some careful research, as far as I can tell GCC's -ffp-contract=fast just changes the default value of STDC FP_CONTRACT, nothing else. It does not enable any of the potentially accuracy-reducing "fast-math" optimizations.

(-ffp-contract=off means "ignore the pragma", and -ffp-contract=on means "don't ignore the pragma" but doesn't change the default.)


My understanding is: the C standard only allows FMA synthesis within a source-level expression. This is extremely inconvenient to respect at the IR level (you'd have to track which source level expression each operation comes from), so -ffp-contract=fast simply disregards source level information and just contracts IR operations if they're of the suitable form.

Clang implements this option too, but it defaults to standard compliance by performing contraction in the frontend where source level boundaries are still available.

expression", where "Two arithmetic expressions are mathematically
equivalent if, for all possible values of their primaries, their
mathematical values are equal. However, mathematically equivalent
arithmetic expressions may produce different computational results."


I'm not familiar with Fortran (or at least this aspect of it), but this quote seems to license far more than contraction, e.g. all sorts of -ffast-math style transformation that ignore the existence of NaNs. Is that right?

Member Author

@rkruppe That's correct, Fortran also allows things like reassociation and commutation, as long as you never ignore parentheses.

@Centril Centril added A-arithmetic Arithmetic related proposals & ideas A-attributes Proposals relating to attributes A-flags Proposals relating to rustc flags or flags for other tools. A-primitive Primitive types related proposals & ideas labels Apr 17, 2019
@joshtriplett
Member Author

@rkruppe wrote:

I really want Rust to have a good story for licensing floating point optimizations, including but not limited to contraction. However, simply turning on contraction by default is not a good step in that direction.

It'd be a step towards parity with other languages, rather than intentionally being slower. I think we need to seriously evaluate whether we're buying anything by intentionally being slower. (And by "slower" here, I don't mean a few percent, I mean 2x slower.)

Contrary to what the RFC claims, contraction is not "safe" (meaning that it breaks otherwise-working programs; obviously there's no memory safety at stake),

Any such programs would be broken in C, C++, Fortran, and likely other languages by default; they'd have to explicitly disable the default behavior. Such programs are also going directly against best practices in numerical methods; if anything, we should ideally be linting against code like (x*x - y*y).sqrt().

and we have not previously reserved the right to do this or given any other indication to users that it might happen.

I've also found no explicit indications that we can't do this. And I've seen no indications that people expect Rust's default behavior to be different than the default behavior of other languages in this regard. What concrete problem are we trying to solve that outweighs a 2x performance win?

A fine-grained opt-in and -out is very useful even if we end up changing the default

Agreed. The RFC already proposes an attribute; I could expand that to provide an attribute with two possible values.

There's plenty of design work to keep us busy even without touching defaults:

If we have any hope of changing the defaults, the time to do that would be before those defaults are relied on.

compiler options or attributes or ...?

I think it makes sense to have a global compiler codegen option, and I also think it makes sense to have an attribute (with a yes/no) that can be applied to any amount of code.

how does it propagate from callers into callees, if at all? (generally hard problem, but IMO a good story for this is just as valuable as providing the basic feature in the first place)

The attribute shouldn't. It should only affect code generation under the scope of the attribute.

what transformations are licensed exactly? (e.g., do we want roughly what the C standard allows, or do we want more like GCC does?)

My ideal goal would be "anything that strictly increases accuracy, making the result closer to the mathematically accurate answer". That would also include, for instance, doing f32 math in f64 registers and not forcing the result to f32 after each operation, if that'd be faster.
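
A sketch of that last idea (illustrative only; it mirrors the i586 example posted further down): the same f32 expression with per-operation f32 rounding versus f64 intermediates rounded to f32 only once.

fn per_op_f32(num: f32) -> f32 {
    ((num + 0.1) / 1.5e38) * 1.5e38 // every intermediate result rounded to f32
}

fn f64_intermediates(num: f32) -> f32 {
    (((num as f64 + 0.1) / 1.5e38) * 1.5e38) as f32 // rounded to f32 only at the end
}

fn main() {
    let x = 1.23456789_f32;
    println!("{} vs {}", per_op_f32(x), f64_intermediates(x));
}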

@ExpHP

ExpHP commented Apr 18, 2019

Such programs are also going directly against best practices in numerical methods; if anything, we should ideally be linting against code like (x*x - y*y).sqrt().

In favor of what?

@fenrus75

fenrus75 commented Apr 18, 2019 via email

@joshtriplett
Member Author

joshtriplett commented Apr 18, 2019 via email

Currently, Rust's [specification for floating-point
types](https://doc.rust-lang.org/reference/types/numeric.html#floating-point-types)
states that:
> The IEEE 754-2008 "binary32" and "binary64" floating-point types are f32 and f64, respectively.
Contributor

Shall this be understood as "the layout of f{32, 64} is that of binary{32, 64}" or as "the layout and arithmetic of f{32, 64} is that of binary{32, 64}" ?

The IEEE-754:2008 standard is very clear that optimizations like replacing a * b + c with fusedMultiplyAdd(a, b, c) should be opt-in, and not opt-out (e.g. see section 10.4), so depending on how one interprets the above, the proposed change could be a backwards incompatible change.

computations to differ by platform (though never below the standards-required
accuracy). However, standards-compliant implementations of math functions on
floating-point values may already vary slightly by platform, sufficiently so to
produce different binary results. This proposal can never make results *less*
Contributor

@gnzlbg gnzlbg Apr 18, 2019

If the intention of the user was for their Rust program to actually have the semantics of the code they wrote, e.g., first do a * b, and then add the result to c, performing intermediate rounding according to the precision of the type, this proposal not only makes the result less accurate, it makes it impossible to even express that operation in the Rust language.

If the user wants higher precision they can write fma(a, b, c) today, and if the user does not care, they can write fmul_add(a, b, c). This proposal, as presented, does not provide a first_mul_a_b_then_add_c(a, b, c) intrinsic that preserves the current semantics, so the current semantics become impossible to write.

Member Author

@joshtriplett joshtriplett Apr 18, 2019

performing intermediate rounding according to the precision of the type

What we're discussing in this RFC is, precisely, 1) whether that's actually the definition of the Rust language, and 2) whether it should be. Meanwhile, I'm not seeing any indication that that's actually the behavior Rust developers expect to get, or that they expect to pay 2x performance by default to get it.

but it makes it impossible to actually even express that operation in the Rust language

I'm already editing the RFC to require (rather than suggest) an attribute for this.


We could provide a separate set of types and allow extra accuracy in their
operations; however, this would create ABI differences between floating-point
functions, and the longer, less-well-known types seem unlikely to see
Contributor

Not necessarily, these wrappers could be repr(transparent).
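
A minimal sketch of such a wrapper (the Contractable name is hypothetical, and the compiler support that would actually license contraction for it does not exist; this only illustrates the layout point):

/// Same layout and ABI as f32 thanks to repr(transparent), so it can cross
/// function and FFI boundaries unchanged.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Contractable(f32);

impl From<f32> for Contractable {
    fn from(x: f32) -> Self {
        Contractable(x)
    }
}

impl From<Contractable> for f32 {
    fn from(x: Contractable) -> f32 {
        x.0
    }
}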

Member Author

@joshtriplett joshtriplett Apr 18, 2019

I mean this in the sense that changing from one to the other would be an incompatible API change in a crate. I'll clarify that.

Contributor

@gnzlbg gnzlbg Apr 18, 2019

If the algorithm does not care about contraction, it might also not care about NaNs, or associativity, or denormals, or ... so if it wants to accept a NonNaN<Associative<NoDenormals<fXY>>> type as well as the primitive f{32, 64} types, then it has to be generic, and if it's generic, it would also accept a type wrapper lifting the no-contraction guarantee, without breaking the API.

In other words, once one starts walking down the road of lifting assumptions about floating-point arithmetic, contraction is just one of many different assumptions one might want to lift. Making it special does not solve the issue of these APIs having to be generic over all of them.

@hanna-kruppe hanna-kruppe Apr 20, 2019

I do not think we have anywhere near a smooth enough UX for working with wrappers around primitive arithmetic types for me to seriously consider them as a solution for licensing fast-math transformations. There are serious papercuts even when trying to be generic over the existing primitive types (e.g., you can't use literals without wrapping them in ugly T::from calls), and we have even less machinery to address the mixing of different types that such wrappers would entail.

I also think it's quite questionable whether these should be properties of the type. It kind of fits "no infinities/NaNs/etc.", but other things are fundamentally about particular operations and therefore may be OK in one code region but not in another, even if the same data is being operated on.

We could provide a separate set of types and allow extra accuracy in their
operations; however, this would create ABI differences between floating-point
functions, and the longer, less-well-known types seem unlikely to see
widespread use.
Contributor

Prior art shows that people who need or want this are going to use them; e.g., "less-well-known" flags like -ffast-math are in widespread use, even though they are not enabled by default. So it is unclear to me how much weight this argument should actually have.


Separate types are harder to drop into a code base than a compiler flag or attribute, though, because using the type in one place generally leads to type errors (and need for conversions to solve them) at the interface with other code.

We could do nothing, and require code to use `a.mul_add(b, c)` for
optimization; however, this would not allow for similar future optimizations,
and would not allow code to easily enable this optimization without substantial
code changes.
Contributor

We could provide a clippy lint that recognizes a * b + c (and many others), and tell people that if they don't care about precision, they can write a.mul_add(b, c) instead. We could have a group of clippy lints about these kind of things that people can enable in bulk.
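
For illustration, the kind of rewrite such a (hypothetical) lint would suggest:

fn before(a: f64, b: f64, c: f64) -> f64 {
    a * b + c // two roundings; candidate for the lint
}

fn after(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c) // one rounding, and an FMA instruction where available
}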

Contributor

On this particular point a clippy lint is helpful but not necessarily enough. Once the optimizer chews through layers of code it can end up at an a * b + c expression without it being anything that is obvious to clippy.

@gnzlbg
Contributor

gnzlbg commented Apr 18, 2019

Let's design a way to opt into and out of this behavior at crate/module/function first, and once that's done we can look at how to make more code use it automatically.

@rkruppe I would prefer even finer grained control than that, e.g., individual type wrappers that add a single assumption about floating-point math that the compiler is allowed to make and that can be combined, e.g.,

  • Trapless<T>: whether floating-point arithmetic can be assumed not to trap (e.g. on signaling NaNs)
  • Round{Nearest,0,+∞,-∞}<T> : whether the rounding mode can be assumed
  • Associative<T>: whether floating-point arithmetic can be assumed to be associative
  • Finite<T>: whether floating-point arithmetic can be assumed to produce finite numbers (no infinities or NaNs)
  • Normal<T>: whether floating-point arithmetic can be assumed to produce normal numbers (as opposed to denormals/subnormals)
  • Contractable<T>: whether intermediate operations can be contracted using higher precision
  • ...

That way I can write a type alias like:

pub type Real = Trapless<Finite<Normal<Associative<Contractable<...<f32>...>>>>>>;

and use it throughout the parts of my code where its appropriate. When I need to interface with other crates (or they with me), I can still use f32/f64:

pub fn my_algo(x: f32) -> f32 {
    let r: Real = x.into();
    // ... do stuff with r ...
    r.into()
}

Sure, some people might go overboard with these, and create complicated trait hierarchies, make all their code generic, etc. but one doesn't really need to do that (if somebody wants to provide a good library to abstract over all of this, similar to how num::Float works today, well they are free to do that, and those who find it useful will use it).

Global flags for turning these on/off require you to inspect the module/function/crate/.cargo/config/... to know which rules for floating-point arithmetic apply, and then use that knowledge to reason about your program; the risk that some code that wasn't intended to play by those rules gets those flags applied (e.g. because it was inlined or monomorphized into a module with those flags enabled) doesn't seem worth it (reading this gives me fond memories of writing implicit none at the top of every Fortran file).

The main argument of this RFC is that if we do something like this, then some code that spends 99% of its execution time doing a * b + c would be 2x slower. If that's the case, submitting a PR to change that code to a.mul_add(b, c) is a no-brainer (been there, done that: https://github.com/rust-lang-nursery/packed_simd/search?q=fma&type=Commits) - changing the behavior of all Rust code to fix such programs feels like overkill. If the issue is that code that could benefit from such a change is hard to find, that's what clippy is for.


@eaglgenes101

eaglgenes101 commented Apr 18, 2019

In C, even if you make sure your compiler outputs code that uses IEEE 754 floats on all platforms, trying to get the same floating-point results across different platforms, build configurations, and times is an exercise in plugging up a bazillion abstraction leaks. That's par for the course for C. Not for Rust.

I am well aware that floating point is a mere approximation of the real numbers, and that you're suggesting transformations that would increase this accuracy. That said, I still disapprove of the proposed new defaults. I'd much rather not have the compiler try by default to second-guess me on what really should be a perfectly well-defined and predictable operation. I'd much rather the compiler, by default, choose some specific observable output behaviour, and stick to it, just like it normally does. I'll flick the floating point flags myself if I want to sacrifice determinism for a better approximation of what I've given up on since I was a clueless novice looking around for the reason why 0.1 + 0.2 === 0.3 evaluated to false. And I'm pretty sure I'd much rather performance-optimize another clueless programmer's slow floating point code than debug another clueless programmer's heisenbug-laden floating point code.

NaNs may also have unspecified bit patterns. However, IEEE 754 mandates behaviour for NaNs that makes them opaque unless you specifically crack them open, and NaNs propagate through most floating-point operations, so if their payload can be disregarded, they are essentially fixed points of floating point operations. Small floating point evaluation differences tend to be magnified by systems with chaotic behaviour, which includes most nontrivial physical systems, and treating finite floats as opaque would completely defeat the purpose of doing the floating point computations in the first place.

@joshtriplett
Member Author

By way of a concrete example showing that Rust already provides extra accuracy today on some platforms:

$ cat test.rs ; echo === ; rustc +nightly --target=i586-unknown-linux-gnu test.rs -o test32 && rustc +nightly test.rs -o test64 && ./test32 && ./test64
fn foo(num: f32) -> f32 {
    ((num + 0.1) / 1.5e38) * 1.5e38
}

fn main() {
    println!("error: {:.50}", foo(1.23456789) - 1.23456789 - 0.1);
}
===
error: 0.00000002235174179077148437500000000000000000000000
error: 0.00000014156103134155273437500000000000000000000000

i586-unknown-linux-gnu has more accuracy than x86_64-unknown-linux-gnu, because it does intermediate calculations with more precision. And changing that would substantially reduce performance.

@joshtriplett
Member Author

joshtriplett commented Apr 18, 2019

@gnzlbg What code do you expect the compiler to generate when you use those generics? Because ultimately, if you want that, you're asking for pure software floating-point on many platforms.

@gnzlbg
Contributor

gnzlbg commented Apr 18, 2019

@joshtriplett

What code do you expect the compiler to generate when you use arbitrary combinations of those types?

If you check the LangRef for the LLVM-IR floating-point operations, e.g., the fmul instruction for a * b, you see that since recently it looks like:

<result> = fmul [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result

where flags like nnan, ninf, etc. can be inserted in [fast-math flags].

So when one uses such a type, I expect that rustc will insert the fast-math flags for each operation as appropriate. That's finer-grained than just inserting them as function attributes for all functions in an LLVM module.

@joshtriplett
Member Author

joshtriplett commented Apr 18, 2019

@gnzlbg What machine code do you expect to generate when every operation can potentially have different flags? How much performance do you consider reasonable to sacrifice to get the behavior you're proposing? What specific code do you want to write that depends on having such fine-grained type-level control of this behavior?

Not all abstract machines and specifications translate to reasonable machine code on concrete machines. If you want bit-for-bit identical results for floating point on different platforms and target feature flags and optimization levels, you're going to end up doing software floating point for many operations on many platforms, and I don't think that's going to meet people's expectations at all. If you can live with the current state that we've had for years, then this RFC is already consistent with that behavior.

I would like to request that discussion of adding much more fine-grained control of specific floating-point flags that weren't already raised in the RFC be part of some other RFC, rather than this one. I already have a mention of the idea of adding specific types, which covers the idea of (for instance) Contractable<T>. I don't think the full spectrum of flag-by-flag types mentioned in this comment is in scope for this RFC.

@joshtriplett
Member Author

joshtriplett commented Apr 18, 2019

Expanding on my earlier comment, Rust also already allows floating-point accuracy to depend on optimization level, in addition to targets:

$ cat test.rs ; echo === ; rustc +nightly --target=i586-unknown-linux-gnu test.rs -o test32 && rustc +nightly --target=i586-unknown-linux-gnu -O test.rs -o test32-opt && rustc +nightly test.rs -o test64 && ./test32 && ./test32-opt && ./test64
fn foo(num: f32) -> f32 {
    ((num + 0.1) / 1.5e38) * 1.5e38
}

fn main() {
    let prog = std::env::args().next().unwrap();
    println!("{:12} error: {:.50}", prog, foo(1.23456789) - 1.23456789 - 0.1);
}
===
./test32     error: 0.00000002235174179077148437500000000000000000000000
./test32-opt error: 0.00000014156103134155273437500000000000000000000000
./test64     error: 0.00000014156103134155273437500000000000000000000000

So, in practice, Rust already has this behavior, and this RFC does not represent a breaking change.

(Worth noting that it's easy enough to reproduce this with f64 as well, just by changing the types and constants.)

@scottmcm
Member

you're going to end up doing software floating point for many operations

For things like cos, yes, but not for ordinary addition.

From http://www.box2d.org/forum/viewtopic.php?f=3&t=1800#p16480:

I work at Gas Powered Games and i can tell you first hand that floating point math is deterministic. You just need the same instruction set and compiler and of course the user's processor adhears to the IEEE754 standard, which includes all of our PC and 360 customers. The engine that runs DemiGod, Supreme Commander 1 and 2 rely upon the IEEE754 standard. Not to mention probably all other RTS peer to peer games in the market. As soon as you have a peer to peer network game where each client broadcasts what command they are doing on what 'tick' number and rely on the client computer to figure out the simulation/physical details your going to rely on the determinism of the floating point processor.

So it's not trivial, but apparently it works across processor vendors and such.

…om other languages

"fast math" is widely perceived as an unsafe option to go faster by
sacrificing accuracy.
@Lokathor
Contributor

Lokathor commented Apr 25, 2019

well, perhaps we can just be real about what we're telling the compiler to allow?

#![fp(allow_fma)]

Or are there things besides just allowing for FMA usage that we're talking about here? (EDIT: in this first wave of optimizations at least)

@fenrus75

there most certainly are other things;
an example would be a system where converting from f64 to f32 is expensive (rounding, range checks, etc.); if a calculation is a mix of f32 and f64, this would allow the whole calculation to be done in f64, with the rounding down happening only at the final store to memory.

(f32->f64 tends to be cheap since it's mostly just padding 0 bits)

@programmerjake
Member

From what I understand, LLVM (and probably Rust) by default assumes that traps don't occur and the rounding mode is set to round-to-nearest

@joshtriplett
Member Author

joshtriplett commented Apr 25, 2019 via email

@joshtriplett
Member Author

joshtriplett commented Apr 25, 2019 via email

@gThorondorsen

On Wed, Apr 24, 2019 at 05:31:34PM -0700, Kornel wrote:
I don't mind that behavior. In fact, I'd like even more reckless-approx-math options. Is there a path to opting in to more fast fp math? Maybe #[extra_fp_precision(on)] could be #[fp(extra_precision)] and eventually become #[fp(extra_precision, associative, reciprocal_approx, no_signed_zero, no_traps)], etc.

I don't want to add those other flags in this RFC (I really want to avoid the implication of association with -ffast-math), but I have no objection to changing this to fp(extra_precision(off)) (or perhaps fp(no_extra_precision)), to allow for future fp(...) flags. That seems entirely reasonable.

In my opinion, this attribute should really have an option to disable all the optimisations that may change the exact results of the computations, present and future. So that people who care can write e.g. #[fp(strict)] and know their library will not break when new optimisations are introduced. And also allow fp(strict, contraction) or fp(strict, extra_precision) to enable optimisations selectively.

Also, I find fp to be too short and generic of a name. I would expect this attribute to also be able to control the rounding mode and trapping behaviour. It may or may not be a good idea to group all these features into a single attribute. I propose to use fp_optimize instead, and leave the other functionality to other names (probably fp-prefixed as well).

should explicitly discuss the issue of extra floating-point precision and how
to disable it. Furthermore, this change should not become part of a stable Rust
release until at least eight stable releases *after* it first becomes
implemented in the nightly compiler.


I'm not sure I understand the point of this last sentence. And particularly, why is the reference point the first availability in nightly? I think it would be more useful to guarantee that the optimisations will not be enabled by default on stable until the opt-out has been available as a no-op for a few stable releases.

loss.) However, with some additional care, applications desiring cross-platform
identical results can potentially achieve that on multiple target platforms. In
particular, applications prioritizing identical, portable results across two or
more target platforms can disable extra floating-point precision entirely.


As I mentioned in a previous comment, mere reproducibility is not always the reason to disable this behaviour. Some algorithms can actually take advantage of the weird special properties of floating-point arithmetic. Such algorithms should remain implementable as Rust libraries, and those should not break just because someone decided they wanted their unrelated floating-point code to be as fast as possible.

@RalfJung
Member

I have some concern with this approach, that I'd at least like to see listed in the "drawbacks".

  • It is kind of unclear how to make "you can use more precision" formal. After all, there are observably just 64 bits in an f64, so how can it store more precision than that? This comes up not just for a formalization but also when implementing this spec change in Miri. I do not see a reasonable way for Miri to actually give a higher-precision result for compound Rust expressions; after all, these are still sequences of MIR instructions that are individually executed and that only have 64 bits of state that can be carried from one step to the next.

    We could try to handle floating points similar to pointers, but that would mean that the compiler would have to be very careful about preserving that "provenance" on floating points.

  • And secondly, what this change does is add a whole lot of non-determinism. Basically, any floating-point operation can now non-deterministically be more precise. Given that unsafe code has to be safe under any legal execution of the program, this means that unsafe code working with floating points has to be extremely careful to be sure that it actually works with any allowed precision. Miri will not be able to help here as it seems unfeasible to actually try every permitted precision for every operation (even assuming we solved the problems raised in the first point). I don't know of any sanitizer or logic that is able to properly handle this kind of non-determinism, so I'd expect that for the foreseeable future, any tool we have to increase our confidence in unsafe code is just going to assume that "higher precision" is never used.

I understand the practical concerns leading here, but from a formal perspective and wanting to make the Rust spec precise, this RFC is a step backwards.

That is not at all a Rust-specific problem; -ffast-math has all the same issues. Some of my colleagues are working on a formal treatment of -ffast-math, but they are using basically symbolic semantics for floating-point expressions; it is entirely unclear how to combine this with things like mutable memory or how to build a program logic for it or really how to do anything except for compiling it.^^ But at least -ffast-math is off-by-default...

back to a lower-precision format.

In general, providing more precision than required will only bring a
calculation closer to the mathematically precise answer, never further away.
Member

@RalfJung RalfJung Oct 17, 2019

That sounds like an extremely strong statement to me that needs further justification. I see no reason to assume such monotonicity here. Different rounding errors happening during a computation might as well just happen to cancel each other such that removing some errors actually increases the error of the final result.

Extreme example: 1 / 10 has a rounding error, but 1.0/10.0 - 1.0/10.0 actually gives the right result. Providing more precision only on one side of the subtraction increases the error of the entire calculation.
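
A small sketch of that cancellation effect (the concrete values are only illustrative): giving one side of the subtraction extra precision moves the overall result away from the true answer.

fn main() {
    let tenth_f32 = 1.0_f32 / 10.0; // carries an f32 rounding error
    // Both sides at the same (low) precision: the errors cancel and the
    // difference is exactly 0.0, the mathematically correct value of 1/10 - 1/10.
    let both_low = tenth_f32 - tenth_f32;
    // Extra precision on only one side: the errors no longer cancel.
    let one_widened = tenth_f32 as f64 - 1.0_f64 / 10.0;
    println!("both low precision: {both_low}");
    println!("one side widened:   {one_widened:e}");
}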


@nestordemeure

I work on measuring the numerical error introduced by floating-point arithmetic, and I believe this RFC could be an all-around improvement.

It is a speed improvement.

It is an accuracy improvement.
While there are operations that might become less accurate, the vast majority of computations will benefit (which is why I believe it should be the default).

It could be a determinism improvement.
As has been said previously, things are already non-deterministic on some platforms (one thing that has not been said is that you will get different results from debug to release if you introduce vectorization).
But adding a flag to enforce strict floating-point behaviour, deactivating the proposed optimization, could also, in time, force all platforms to conform to the standard when the flag is set (improving on the current situation). In short, such a flag could remove a form of undefined behavior for users who care about binary reproducibility.

Finally, as others have suggested (and out of the scope of this RFC), I would love to have the ability to locally enforce strict floating-point behaviour. While I believe that binary reproducibility of floating-point results is often misguided, some operations do require absolute control from the user.

@RalfJung
Member

It could be a determinism improvement.

That's a stretch. We could certainly provide binary guarantees for all platforms without introducing non-determinism for all platforms.

As has been said previously, things are already non-deterministic on some platforms (one thing that has not been said is that you will get different results from debug to release if you introduce vectorization).

Some platforms being ill-behaved does not seem like a good argument for introducing ill-behavedness on sane platforms.^^ (Some good arguments have been made in this thread, but this isn't one.)

In short such a flag could remove a form of undefined behavior for user who care about binary reproducibility.

There's no UB here, right? Just non-determinism.

@nestordemeure

For me it is UB in the sense that the code's behavior is not specified and can thus vary on different platforms in a way that is not predictable by the user.

My argument is not that the current situation is bad and thus it does not matter if we worsen it, but that the current situation is unregulated, and that this could bring in a flag to improve on the current situation when it matters (by specifying the expected behavior) and let things be when it doesn't (where I believe contraction is a better default).

Comment on lines +45 to +49
A note for users of other languages: this is *not* the equivalent of the "fast
math" option provided by some compilers. Unlike such options, this behavior
will never make any floating-point operation *less* accurate, but it can make
floating-point operations *more* accurate, making the result closer to the
mathematically exact answer.
Member

Given #2686 (comment), I think this statement should be removed as it is incorrect -- or else there should be an argument for how we plan to guarantee that we never make things less accurate.

@RalfJung
Member

RalfJung commented Oct 25, 2019

For me it is UB in the sense that the code's behavior is not specified and can thus vary on different platforms in a way that is not predictable by the user.

UB is a technical term with a specific meaning, and this is not it. I get what you mean but please let's use terminology correctly, lest it become useless. :)

My argument is not that the current situation is bad and thus it does not matter if we worsen it, but that the current situation is unregulated, and that this could bring in a flag to improve on the current situation when it matters (by specifying the expected behavior) and let things be when it doesn't (where I believe contraction is a better default).

So I think the argument is that this reduces underspecification for platforms which currently do not faithfully implement IEEE semantics? I agree it does. It also makes those platforms not special exceptions any more. However, it does so by pulling all platforms (in their default config) down to the level of (what I consider to be) "misbehaving" platforms. The proposal is to use the lowest common denominator as the new default. (Please correct me if I misread.) Somehow I cannot see that as progress.

Ultimately this is a question of defaults: I would prefer the default to be IEEE with no exception, and then a way to opt-in to deviations from this strict baseline. These deviations would be in the style of -ffast-math: make stuff go faster, and maybe even become more precise, at the expense of predictability.

There seems to be a spectrum of "IEEE conformance", with full conformance on one end, "whatever C does per default" somewhere in the middle (where it will e.g. use x87 instructions), and full fast-math on the other end. If I read this proposal correctly, it proposes to make the Rust default the same as / close to the C default. But I do not see any good reason for picking this particular spot on the spectrum other than "C did it" (the claim that this never reduces accuracy has been refuted, from what I can tell). So if we ignore C, IMO the most obvious choices for the default are "fully conformant" or "fully fast-math", and the RFC does not do a good enough job arguing for why we should pick another default on some "random" spot in the middle.

@gnzlbg
Contributor

gnzlbg commented Oct 25, 2019

It could be a determinism improvement.
As has been said previously, things are already non-deterministic on some platforms (one thing that has not been said is that you will get different results from debug to release if you introduce vectorization).

Right now, for a particular Rust toolchain and for many particular target platforms, we are very close to having bit-for-bit deterministic results on a wide range of different hardware on that platform.

That is, if a user of your program hits a bug on some weird target on release mode, you can just pick the same toolchain and options, cross-compile to the target, and debug your program under QEMU and be able to reproduce, debug, and fix the issue.

While there are operations that might become less accurate, the vast majority of computations will benefit (which is why I believe it should be the default).

With this RFC, the results do not only depend on the optimization level, but also on the optimizations that actually get performed. Compiling in debug mode, changing the debug-info level, or debugging using print statements are all things that affect which optimizations get applied and end up altering the floating-point results.

The 32-bit x86 target without SSE is the only target mentioned for which debugging is already hard due to these issues. The only debugging tool one ends up having is "look at the assembly" and hoping that you can figure the bug out from there. That's a bad user experience even for rustc maintainers.

Having reported bugs for these targets and having seen people invest a lot of time into figuring them out, I don't see how making all targets equally hard to debug by default is a good value proposition. It'd be much simpler to instead use soft-floats on weird targets by default while adding an option that allows users to opt in to the x87 FPU (with a big "warning" that documents known issues). I have yet to run into an actual user who wants to do high-performance work on an x86 32-bit CPU without SSE in 2019, but if those users end up appearing, we could always invest time and effort into improving that opt-in option when that happens. That sounds much better to me than lowering the debuggability of all other targets to "32-bit x86 without SSE" standards.

@programmerjake
Member

I think it may be more useful to have IEEE 754 compliant (no FP traps, round-to-nearest-even, FP exception flags are ignored as an output -- basically what LLVM assumes by default on most platforms) be the default, and optimizations that change the results (such as fast-math and some forms of vectorization) be opt-in (at at least function, crate, and binary levels). This will improve reproducibility and debuggability such that results can be relied on cross-platform (excluding differences in NaN encodings) with a minor performance loss on unusual platforms (x86 without SSE). IEEE 754 compliance would not apply to SIMD types by default due to ARM (unfortunately) not supporting denormal numbers by default.

This is similar to how Rust has reproducible results for integer overflow/wrapping cross-platform even though C allows some forms of integer overflow to be undefined behavior.

@eaglgenes101

For me it is UB in the sense that the code's behavior is not specified and can thus vary on different platforms in a way that is not predictable by the user.

We call that unspecified behavior around here. Values which do not have a data dependency on the results of these computations are unaffected by the choice of semantics for floating point.

@Ixrec
Contributor

Ixrec commented Oct 25, 2019

For completeness: there's an ongoing discussion over exactly what terminology we should use for this sort of thing in Rust (rust-lang/unsafe-code-guidelines#201), though it'll probably be something similar to "unspecified" or "implementation-defined".

Back on-topic: it seems clear that we should be looking into fine-grained opt-in mechanisms for fast-math-y things before we seriously consider any changes to the global default behavior. In particular, #2686 (comment) is exactly what I think we should do.

@hanna-kruppe

@RalfJung and others who flirt with -ffast-math:

There seems to be a spectrum of "IEEE conformance", with full conformance on one end, "whatever C does per default" somewhere in the middle (where it will e.g. use x87 instructions), and full fast-math on the other end. If I read this proposal correctly, it proposes to make the Rust default the same as / close to the C default. But I do not see any good reason for picking this particular spot on the spectrum other than "C did it" (the claim that this never reduces accuracy has been refuted, from what I can tell). So if we ignore C, IMO the most obvious choices for the default are "fully conformant" or "fully fast-math", and the RFC does not do a good enough job arguing for why we should pick another default on some "random" spot in the middle.

Arguments for the default position on the spectrum are indeed needed, so let me try to supply some. I am still not in favor of this RFC, but I think it is much better than the equivalent of -ffast-math.

First off, -ffast-math allows the optimizer to assume certain values (NaNs, infinities, negative zeros, subnormals) can't exist even though they can actually occur at runtime. This is a (practically unavoidable) gateway to unsoundness, e.g. in LLVM an instruction with nnan flag that sees a NaN produces poison. So we certainly can't have that as our default behavior.

So if we exclude that, we still get the following on top of what's allowed in the RFC (probably a non-exhaustive list, but it should include everything clang -ffast-math does at least):

  1. rewriting ... / x to ... * (1 / x)
  2. approximating built-in functions like sin (less precision for higher performance)
  3. reassociation

IMO there is no strong reason to include or exclude (1) so whatever.

On the other hand, (2) is a very broad license to the compiler (there's no rules about how imprecise it can get) and one that is hard to make good use of in practice (because the compiler generally can't know what level of precision is acceptable for all of its users). Moreover, unless you're targeting a rather specialized chip that has hardware instructions for approximating transcendental functions, you can probably achieve the same effect by just using a different libm, which Rust does not yet support super well but could learn to do without touching the semantics of built-in types and operations.

As for (3), while any change to rounding can break the correctness of some numerical algorithms and snowball into an overall loss of accuracy, increasing precision of intermediate results is rather mild in this respect compared to freely performing reassociation, which can more easily and more drastically affect the results. It is also very important for enabling automatic vectorization of reductions, so it's still commonly enabled, but its benefits are much smaller for code that is not vectorizable.
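
(For illustration, the classic reassociation example, with values chosen so that the intermediate sum absorbs the 1.0:)

fn main() {
    let (a, b, c) = (1e20_f64, -1e20_f64, 1.0_f64);
    let as_written = (a + b) + c;   // a and b cancel first, so the 1.0 survives: 1.0
    let reassociated = a + (b + c); // 1.0 is far below one ulp of 1e20, so it vanishes: 0.0
    println!("{as_written} vs {reassociated}");
}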

For these reasons, I am quite sure something roughly like the RFC's position on the spectrum is a reasonable tradeoff between performance improvements and program reliability. Definitely not the only reasonable option, but clearly superior to full-on -ffast-math as default.

@RalfJung
Member

RalfJung commented Nov 2, 2019

@rkruppe thanks for pointing out that full fast-math can cause UB; I agree that that is indeed a qualitative "step" somewhere on the line of floating point conformance.

@joshtriplett
Member Author

I'd like to formally withdraw this RFC. I still think this is a good idea, and I think having this substantial optimization happen by default is important. But there are many concerns that need to be dealt with, and we'd likely need some better ways to opt out of or into this, at both a library-crate level and a project level. I don't have the bandwidth to do that design work at this time, so I'm going to close this.

If someone would be interested in working on the general issue of floating-point precision, FMA, and similar, I would be thrilled to serve as the liaison for it.

@jedbrown

Is there any way at present to enable floating point contractions and/or associative math without dropping to intrinsics? Seeming inability to write things like a good dot product (e.g., https://godbolt.org/z/Y35sda) without intrinsics is a critical issue for adoption in numerical/scientific computing.

I think attributes of the #[fp(contract = "fast", associative = "on")] variety have the lowest cognitive load for people transitioning from C or Fortran. These can be opt-in at crate/module/function/block granularity. Encoding via types seems more intrusive to me -- by far the most common situation is that numerical libraries/apps want moderate permissiveness enabled everywhere except in some critical places. Note that icc enables -fp-model fast=1 by default, which is nearly analogous to gcc/clang -ffast-math.
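
For reference, the usual workaround today is to spell the FMA out by hand (a sketch; whether this matches the performance of a contracted, vectorized loop depends on the backend):

// Dot product written with explicit mul_add: each step is fused, but the
// serial dependency chain may still block the vectorization that contraction
// plus associative math would allow.
fn dot(xs: &[f64], ys: &[f64]) -> f64 {
    xs.iter()
        .zip(ys)
        .fold(0.0, |acc, (&x, &y)| x.mul_add(y, acc))
}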

@bend-n

bend-n commented Nov 8, 2023

Is there any way at present to enable floating point contractions and/or associative math without dropping to intrinsics? Seeming inability to write things like a good dot product (e.g., godbolt.org/z/Y35sda) without intrinsics is a critical issue for adoption in numerical/scientific computing.

There isn't, but I've made a crate which allows you to use the faster floats without, y'know, great inconvenience.
