Add f16 and f128 float types #3453

Merged (6 commits) on Oct 18, 2023

137 changes: 137 additions & 0 deletions text/3453-f16-and-f128.md
@@ -0,0 +1,137 @@
- Feature Name: `f16_and_f128`
- Start Date: 2023-07-02
- RFC PR: [rust-lang/rfcs#3453](https://github.com/rust-lang/rfcs/pull/3453)
- Rust Issue: [rust-lang/rfcs#2629](https://github.com/rust-lang/rfcs/issues/2629)

# Summary
[summary]: #summary

This RFC proposes adding new IEEE-compliant floating point types `f16` and `f128` into the core language and standard library. We will provide a soft float implementation for all targets, and use hardware support where possible.

# Motivation
[motivation]: #motivation

The IEEE 754 standard defines many binary floating point formats. The most common of these types are the binary32 and binary64 formats, available in Rust as `f32` and `f64`. However, other formats are useful in various uncommon scenarios. The binary16 format is useful for situations where storage compactness is important and low precision is acceptable, such as HDR images, mesh quantization, and AI neural networks.[^1] The binary128 format is useful for situations where high precision is needed, such as scientific computing contexts.

Contributor:

Suggested change (original line, then proposed replacement):
The IEEE 754 standard defines many binary floating point formats. The most common of these types are the binary32 and binary64 formats, available in Rust as `f32` and `f64`. However, other formats are useful in various uncommon scenarios. The binary16 format is useful for situations where storage compactness is important and low precision is acceptable, such as HDR images, mesh quantization, and AI neural networks.[^1] The binary128 format is useful for situations where high precision is needed, such as scientific computing contexts.
The IEEE 754 standard defines many binary floating point formats. The most common of these types are the binary32 and binary64 formats, available in Rust as `f32` and `f64`. However, other formats are useful in various scenarios. The binary16 format is useful for situations where storage compactness is important and low precision is acceptable, such as HDR images, mesh quantization, and AI neural networks.[^1] The binary128 format is useful for situations where high precision is needed, such as scientific computing contexts.

I wouldn't call these scenarios uncommon since we are advocating for supporting them 🙂

Contributor (PR author):

I think it would be dishonest to not say they are uncommon. 99% of use cases don't need 16-bit or 128-bit floats (thus, the reason most languages have been fine without them so far). But also, most use cases don't need 32-bit floats either (thus, the reason many languages like JavaScript and Python have their only float type as 64-bit). Our argument advocating for supporting these formats is that the 1% of use cases greatly benefit from language support and that this deserves to be a core feature in the Rust language, just like 32-bit floats.

Contributor (@tgross35, Oct 7, 2023):

Maybe "less common" then? Doesn't really matter since it's passed RFC, uncommon just sounds a bit more like we maybe shouldn't be supporting it


This RFC proposes adding `f16` and `f128` primitive types in Rust to represent IEEE 754 binary16 and binary128, respectively. Having `f16` and `f128` types in the Rust language would allow Rust to better support the above mentioned use cases, allowing for optimizations and native support that may not be possible in a third party crate. Additionally, providing a single canonical data type for these floating point representations will make it easier to exchange data between libraries.

This RFC does not have the goal of covering the entire IEEE 754 standard, since it does not include `f256` and the decimal-float types. This RFC also does not have the goal of adding existing platform-specific float types such as x86's 80-bit double-extended-precision. This RFC does not make a judgement on whether those types should be added in the future; that discussion can be left to a future RFC.

Contributor:

One of the biggest things called out in the meeting was that motivation could use some enhancement. Some assorted things that you may want to add:

  • Somehow say that the downsides to adding these are small (since we are letting LLVM do the work), but that this can be extremely important for some niches
  • Point out what exactly can and can't be done via external crates. E.g. can provide an implementation, difficult to manage because it needs to rely on handwritten assembly and linking external libraries
  • Maybe enhance the "scientific computing contexts" statement with more specific examples. You could probably mention how Julia and NumPy support f128 operations
  • That there are numerous use cases for something more precise than f64 - in C this is typically long double, Rust doesn't have anything. So we both don't have a way to do this more precise math, nor do we have a way to interop with C
  • long double in C is already _Float128 on many targets, so a lot of C libraries are using f128. Not having something is blocking us from some use cases, e.g. cc long double targeting wasm32-wasi links incorrectly rust#74393
  • @pvdrz has mentioned making that easier for bindgen a few times, bindgen's job getting easier is probably worth mentioning specifically
  • Something from the C++ motivation might work for f16

Some of this is or could be in rationale and alternatives too, that section usually expands upon motivation


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

`f16` and `f128` are primitive floating-point types that can be used like `f32` or `f64`. They always conform to the binary16 and binary128 formats defined in the IEEE 754 standard, which means that `f16` is always 16 bits in size, `f128` is always 128 bits in size, the number of exponent and mantissa bits follows the standard, and all operations are IEEE 754-compliant. Float literals of these sizes have `f16` and `f128` suffixes respectively.

```rust
let val1 = 1.0; // Default type is still f64
let val2: f128 = 1.0; // Explicit f128 type
let val3: f16 = 1.0; // Explicit f16 type
let val4 = 1.0f128; // Suffix of f128 literal
let val5 = 1.0f16; // Suffix of f16 literal

println!("Size of f128 in bytes: {}", std::mem::size_of_val(&val2)); // 16
println!("Size of f16 in bytes: {}", std::mem::size_of_val(&val3)); // 2
```

Every target should support `f16` and `f128`, either in hardware or software. Most platforms do not have hardware support and therefore will need to use a software implementation.

All [operators](https://doc.rust-lang.org/stable/std/primitive.f64.html#trait-implementations), [constants](https://doc.rust-lang.org/stable/std/f64/consts/), and [math functions](https://doc.rust-lang.org/stable/std/primitive.f64.html#implementations) defined for `f32` and `f64` in `core` must also be defined for `f16` and `f128` in `core`. Similarly, all functionality defined for `f32` and `f64` in `std` must also be defined for `f16` and `f128`.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## `f16` type

`f16` consists of 1 sign bit, 5 exponent bits, and 10 mantissa bits. It is exactly equivalent to the 16-bit IEEE 754 binary16 [half-precision floating-point format](https://en.wikipedia.org/wiki/Half-precision_floating-point_format).
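
To make the bit layout concrete, the following is a minimal sketch that decodes a binary16 bit pattern by hand on stable Rust. The function name `binary16_bits_to_f32` is a hypothetical helper chosen for illustration; it is not part of this RFC or of the standard library. It relies on the fact that every binary16 value is exactly representable as an `f32`.

```rust
/// Decode an IEEE 754 binary16 bit pattern into an `f32`.
/// Hypothetical helper for illustration only; not an API proposed by the RFC.
fn binary16_bits_to_f32(bits: u16) -> f32 {
    // Bit 15 is the sign bit.
    let sign = if bits >> 15 == 1 { -1.0f32 } else { 1.0f32 };
    // Bits 10..=14 hold the 5-bit exponent, stored with a bias of 15.
    let exponent = i32::from((bits >> 10) & 0x1F);
    // Bits 0..=9 hold the 10 explicit mantissa bits.
    let mantissa = f32::from(bits & 0x3FF);

    match exponent {
        // Zero and subnormals: no implicit leading 1, fixed exponent of -14,
        // so the value is mantissa / 2^10 * 2^-14 = mantissa * 2^-24.
        0 => sign * mantissa * 2.0f32.powi(-24),
        // All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        0x1F => {
            if mantissa == 0.0 {
                sign * f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: implicit leading 1 in front of the mantissa.
        e => sign * (1.0 + mantissa / 1024.0) * 2.0f32.powi(e - 15),
    }
}
```

For example, `binary16_bits_to_f32(0x3C00)` yields `1.0`, and `binary16_bits_to_f32(0x7BFF)` yields `65504.0`, the largest finite binary16 value.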

The following traits will be implemented for conversion between `f16` and other types:

```rust
impl From<f16> for f32 { /* ... */ }
impl From<f16> for f64 { /* ... */ }
impl From<bool> for f16 { /* ... */ }
impl From<u8> for f16 { /* ... */ }
impl From<i8> for f16 { /* ... */ }
```

Conversions to `f16` will also be available with `as` casts, which allow for truncated conversions.
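
The RFC text does not spell out why only `bool`, `u8`, and `i8` get `From` impls for `f16`. One plausible reading is that `From` is reserved for lossless conversions: binary16 has 11 bits of precision (10 stored plus 1 implicit) and a largest finite value of 65504, so wider integers do not always fit. The sketch below checks this for unsigned integers; the function name is hypothetical and the check is an illustration, not part of the proposal.

```rust
/// Returns true if the unsigned integer `n` is exactly representable in
/// IEEE 754 binary16. Hypothetical helper for illustration only.
fn exactly_representable_in_binary16(n: u32) -> bool {
    if n == 0 {
        return true;
    }
    // Number of bits between the highest and lowest set bit, inclusive.
    let significant_bits = 32 - n.leading_zeros() - n.trailing_zeros();
    // binary16 keeps 11 significant bits, and 65504 is its largest finite value.
    significant_bits <= 11 && n <= 65504
}
```

Under this check every `u8` value passes, while `2049` (which needs 12 significant bits) and `u16::MAX` (which exceeds 65504) do not, so a `u16` to `f16` conversion would have to round.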

`f16` will generate the `half` type in LLVM IR. It is also equivalent to C++ `std::float16_t`, C `_Float16`, and GCC `__fp16`. `f16` is ABI-compatible with all of these. `f16` values must be aligned in memory on a multiple of 16 bits, or 2 bytes.

On the hardware level, `f16` can be accelerated on RISC-V via [the Zfh or Zfhmin extensions](https://five-embeddev.com/riscv-isa-manual/latest/zfh.html), on x86 with AVX-512 via [the FP16 instruction set](https://en.wikipedia.org/wiki/AVX-512#FP16), on [some Arm platforms](https://developer.arm.com/documentation/100067/0607/Other-Compiler-specific-Features/Half-precision-floating-point-number-format), and on PowerISA via [VSX on PowerISA v3.1B and later](https://files.openpower.foundation/s/dAYSdGzTfW4j2r2). Most platforms do not have hardware support and therefore will need to use a software implementation.

Contributor:

Suggested change (original line, then proposed replacement):
On the hardware level, `f16` can be accelerated on RISC-V via [the Zfh or Zfhmin extensions](https://five-embeddev.com/riscv-isa-manual/latest/zfh.html), on x86 with AVX-512 via [the FP16 instruction set](https://en.wikipedia.org/wiki/AVX-512#FP16), on [some Arm platforms](https://developer.arm.com/documentation/100067/0607/Other-Compiler-specific-Features/Half-precision-floating-point-number-format), and on PowerISA via [VSX on PowerISA v3.1B and later](https://files.openpower.foundation/s/dAYSdGzTfW4j2r2). Most platforms do not have hardware support and therefore will need to use a software implementation.
On the hardware level, `f16` can be accelerated on RISC-V via [the Zfh or Zfhmin extensions](https://five-embeddev.com/riscv-isa-manual/latest/zfh.html), on x86 with AVX-512 via [the FP16 instruction set](https://en.wikipedia.org/wiki/AVX-512#FP16), on [some ARM platforms](https://developer.arm.com/documentation/100067/0607/Other-Compiler-specific-Features/Half-precision-floating-point-number-format), and on PowerISA via [VSX on PowerISA v3.1B and later](https://files.openpower.foundation/s/dAYSdGzTfW4j2r2). Most platforms do not have hardware support and therefore will need to use a software implementation.

Contributor (PR author):

This suggestion is incorrect, Arm is the official capitalization as of a few years ago. https://www.arm.com/architecture


## `f128` type

`f128` consists of 1 sign bit, 15 exponent bits, and 112 mantissa bits. It is exactly equivalent to the 128-bit IEEE 754 binary128 [quadruple-precision floating-point format](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format).
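
As a sketch of that layout, the following splits a raw 128-bit pattern into its three fields using `u128` on stable Rust. The type `Binary128Fields` and the function `split_binary128` are hypothetical names used only for illustration; they are not proposed by this RFC.

```rust
/// The three fields of an IEEE 754 binary128 bit pattern.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Binary128Fields {
    sign: bool,     // 1 bit
    exponent: u16,  // 15 bits, stored with a bias of 16383 (0x3FFF)
    mantissa: u128, // 112 explicit bits
}

/// Split a raw 128-bit pattern into sign, exponent, and mantissa fields.
fn split_binary128(bits: u128) -> Binary128Fields {
    Binary128Fields {
        // Bit 127 is the sign bit.
        sign: (bits >> 127) == 1,
        // Bits 112..=126 hold the biased exponent.
        exponent: ((bits >> 112) & 0x7FFF) as u16,
        // Bits 0..=111 hold the mantissa.
        mantissa: bits & ((1u128 << 112) - 1),
    }
}
```

For instance, the value 1.0 has a biased exponent of `0x3FFF` and a zero mantissa, so its bit pattern is `0x3FFFu128 << 112`.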

The following traits will be implemented for conversion between `f128` and other types:

```rust
impl From<f16> for f128 { /* ... */ }
impl From<f32> for f128 { /* ... */ }
impl From<f64> for f128 { /* ... */ }
impl From<bool> for f128 { /* ... */ }
impl From<u8> for f128 { /* ... */ }
impl From<i8> for f128 { /* ... */ }
impl From<u16> for f128 { /* ... */ }
impl From<i16> for f128 { /* ... */ }
impl From<u32> for f128 { /* ... */ }
impl From<i32> for f128 { /* ... */ }
impl From<u64> for f128 { /* ... */ }
impl From<i64> for f128 { /* ... */ }
```

Conversions from `i128`/`u128` to `f128` will also be available with `as` casts, which allow for truncated conversions.

`f128` will generate the `fp128` type in LLVM IR. It is also equivalent to C++ `std::float128_t`, C `_Float128`, and GCC `__float128`. `f128` is ABI-compatible with all of these. `f128` values must be aligned in memory on a multiple of 128 bits, or 16 bytes. LLVM provides support for 128-bit float math operations.
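
The size and alignment requirements stated for `f16` and `f128` can be sketched on stable Rust with plain storage wrappers. The type names below are hypothetical and exist only for illustration. Note the assumption: these wrappers match the in-memory size and alignment only; as structs they are not expected to follow the same calling convention as `_Float16` or `_Float128`, which is one reason a primitive type is useful.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical storage wrapper with the size and alignment stated for `f16`.
#[repr(C, align(2))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Binary16Storage(pub [u8; 2]);

/// Hypothetical storage wrapper with the size and alignment stated for `f128`.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Binary128Storage(pub [u8; 16]);

// Compile-time checks: 2 bytes aligned to 2, and 16 bytes aligned to 16.
const _: () = assert!(size_of::<Binary16Storage>() == 2);
const _: () = assert!(align_of::<Binary16Storage>() == 2);
const _: () = assert!(size_of::<Binary128Storage>() == 16);
const _: () = assert!(align_of::<Binary128Storage>() == 16);
```

The explicit `align(16)` is deliberate: it avoids relying on the alignment of `u128`, which has differed between targets and compiler versions.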

On the hardware level, `f128` can be accelerated on RISC-V via [the Q extension](https://five-embeddev.com/riscv-isa-manual/latest/q.html), on IBM [S/390x G5 and later](https://doi.org/10.1147%2Frd.435.0707), and on PowerISA via [BFP128, an optional part of PowerISA v3.0C and later](https://files.openpower.foundation/s/XXFoRATEzSFtdG8). Most platforms do not have hardware support and therefore will need to use a software implementation.

# Drawbacks
[drawbacks]: #drawbacks

While `f32` and `f64` have very broad support in most hardware, hardware support for `f16` and `f128` is more niche. On most systems software emulation will be required. Therefore, the main drawback is implementation difficulty.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

There are some crates aiming for similar functionality:

- [f128](https://github.com/jkarns275/f128) provides binding to the `__float128` type in GCC.
- [half](https://crates.io/crates/half) provides an implementation of binary16 and bfloat16 types.

However, besides the disadvantage of usage inconsistency between primitive types and types from a crate, there are still issues around those bindings.

The ability to accelerate additional float types depends heavily on the CPU, OS, ABI, and features of each target. Evolution of LLVM may unlock possibilities of accelerating the types on new targets. Implementing them in the compiler allows the compiler to perform optimizations for hardware with native support for these types.

Crates may define their type on top of a C binding, but extended float type definition in C is complex and confusing. The meaning of C types may vary by target and/or compiler options. Implementing `f16` and `f128` in the Rust compiler helps to maintain a stable codegen interface and ensures that all users have one single canonical definition of 16-bit and 128-bit float types, making it easier to exchange data between crates and languages.

# Prior art
[prior-art]: #prior-art

As noted above, there are crates that provide these types, one for `f16` and one for `f128`. Another piece of prior art is [RFC 1504 for int128](https://rust-lang.github.io/rfcs/1504-int128.html).

Many other languages and compilers have support for these proposed float types. As mentioned above, C has `_Float16` and `_Float128` ([IEC 60559 WG 14 N2601](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2601.pdf)), and C++ has `std::float16_t` and `std::float128_t` ([P1467R9](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html)). Glibc supports 128-bit floats in software on [many architectures](https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;hb=81325b12b14c44887f1633a2c180a413afc2b504#l143). GCC also provides the `libquadmath` library for 128-bit float math operations.

This RFC was split from [RFC 3451], which proposed adding a variety of float types beyond what is in this RFC, including interoperability types like `c_longdouble`. The remaining portions of [RFC 3451] have since developed into [RFC 3456].

Both this RFC and RFC 3451 are built upon the discussion in [issue 2629](https://github.com/rust-lang/rfcs/issues/2629).

The main consensus of the discussion thus far is that more float types would be useful, especially the IEEE 754 types proposed in this RFC as `f16` and `f128`. Other types can be discussed in a future RFC.

Contributor:

Move this to "rationale and alternatives", say a bit more about why f80/doubledouble were rejected as part of this RFC (because we want to support the most standard type)

Contributor (PR author):

I don't know if it's worth restating the details of float types not a part of this RFC. It should be sufficient to say in this section that there are other types, and list a few in "Future possibilities".

Contributor:

Maybe a one sentence statement at the end of future possibilities "These types were not included as part of this RFC since their implementation is more open to interpretation, this RFC focuses on well-defined types". Just as a hint to anyone reading this in a few years why we didn't include those types (in the case that the other RFC doesn't go forward).

But, I'm indifferent


# Unresolved questions
[unresolved-questions]: #unresolved-questions

The main unresolved parts of this RFC are the implementation details in the context of the Rust compiler and standard library. The behavior of `f16` and `f128` is well-defined by the IEEE 754 standard, and is not up for debate. Whether these types should be included in the language is the main question of this RFC, which will be resolved when this RFC is accepted.

Several future questions are intentionally left unresolved, and should be handled by another RFC. This RFC does not have the goal of covering the entire IEEE 754 standard, since it does not include `f256` and the decimal-float types. This RFC also does not have the goal of adding existing platform-specific float types such as x86's 80-bit double-extended-precision.
Comment on lines +125 to +127
Contributor:

We have come to consensus that this RFC will only support f16 and f128, so all these unresolved questions can be removed.

Unresolved question to add: based on @scottmcm's comment #3453 (comment), I don't know if we will be able to support f128 parsing at first. It requires changes within our float parsing library that we wouldn't want to do if there is any notable impact, so just note this as to be determined

Contributor (PR author):

Well, it's resolved in that it's not the goal of this RFC to add those features, but it's still unresolved in terms of what will be done in the future.

As for f128 parsing, that seems like an implementation detail that needs to be figured out by the implementers to ensure we support f128 parsing. I don't think it's meaningful for this RFC to say "we don't need f128 parsing", if it's missing for now that's fine, but long-term I don't see a reason to not have it.

Contributor:

I think that since the other float types are covered in Future Possibilities they don't need to be here. Usually Unresolved Questions are the things that need to be figured out between the RFC being accepted and the feature being stabilized, Future Possibilities are out of scope things that could go in other RFCs. (I know the template has out of scope things in both, but I don't see any accepted RFCs that actually put them under unresolved questions)


# Future possibilities
[future-possibilities]: #future-possibilities

See [RFC 3456] for discussion about adding more float types including `f80`, `bf16`, and `c_longdouble`, which is an extension of the discussion in [RFC 3451].

[^1]: Existing AI neural networks often use the [16-bit brain float format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format), which is a truncated version of 32-bit single precision, instead of 16-bit half precision. This is done to allow performing operations with 32-bit floats and quickly converting to 16-bit for storage.

[RFC 3451]: https://github.com/rust-lang/rfcs/pull/3451
[RFC 3456]: https://github.com/rust-lang/rfcs/pull/3456