Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IEEE 754 compliant fmt/parse of -0, infinity, NaN #78618

Merged
merged 5 commits into from
Mar 27, 2021

Conversation

workingjubilee
Copy link
Member

This pull request improves the Rust float formatting/parsing libraries to comply with IEEE 754's formatting expectations around certain special values, namely signed zero, the infinities, and NaN. It also adds IEEE 754 compliance tests that, while less stringent in certain places than many of the existing flt2dec/dec2flt capability tests, are intended to serve as the beginning of a roadmap to future compliance with the standard. Some relevant documentation is also adjusted with clarifying remarks.

This PR follows from discussion in rust-lang/rfcs#1074, and closes #24623.

The most controversial change here is likely to be that -0 is now printed as -0. Allow me to explain: While there appears to be community support for an opt-in toggle of printing floats as if they exist in the naively expected domain of numbers, i.e. not the extended reals (where floats live), IEEE 754-2019 is clear that a float converted to a string should be capable of being transformed into the original floating point bit-pattern when it satisfies certain conditions (namely, when it is an actual numeric value i.e. not a NaN and the original and destination float width are the same). -0 is given special attention here as a value that should have its sign preserved. In addition, the vast majority of other programming languages not only output -0 but output -0.0 here.

While IEEE 754 offers a broad leeway in how to handle producing what it calls a "decimal character sequence", it is clear that the operations a language provides should be capable of round tripping, and it is confusing to advertise the f32 and f64 types as binary32 and binary64 yet have the most basic way of producing a string and then reading it back into a floating point number be non-conformant with the standard. Further, existing documentation suggested that e.g. -0 would be printed with -0 regardless of the presence of the + fmt character, but it prints "+0" instead if given such (which was what led to the opening of #24623).

There are other parsing and formatting issues for floating point numbers which prevent Rust from complying with the standard, as well as other well-documented challenges on the arithmetic level, but I hope that this can be the beginning of motion towards solving those challenges.

@rust-highfive
Copy link
Collaborator

r? @joshtriplett

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 1, 2020
@jyn514 jyn514 added A-floating-point Area: Floating point numbers and arithmetic T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Nov 1, 2020
@workingjubilee workingjubilee force-pushed the ieee754-fmt branch 3 times, most recently from c12c4b6 to d229327 Compare November 1, 2020 02:08
@KodrAus
Copy link
Contributor

KodrAus commented Nov 11, 2020

We briefly discussed this in the recent Libs meeting and felt that format!("{:+}", -0) producing +0 was definitely a bug that we should fix.

It turns out the question of always printing -0 as -0 instead of 0 in the Display impl is indeed controversial! The fundamental question is whether we should aim for strict IEEE 754 compliance for the default Display implementation on floats. There were some concerns that we're somewhat locked in to the current behavior. Debug should absolutely be compliant though.

Since this turned out to be a bit of a bigger topic than we really had synchronous time for, I think we should try revisit it in more detail with a picture of what a roadmap towards compliance would look like for float Display and how that changes assumptions made by consumers of it.

@crlf0710
Copy link
Member

Triage: So i created #79490 to tracking the controversal part, and i'll mark this as blocked on the decision of that issue.

@workingjubilee I think you can go ahead and split out PRs for the non-controversal part of this (Debug, fixing {:+}, etc).

@crlf0710 crlf0710 added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 28, 2020
@workingjubilee
Copy link
Member Author

To reply to the stability concerns:

  1. This is being proposed as fix in the standard library's implementation of the f32 and f64 types, so it is not automatically subject to Rust's stability guarantees as I understand them per RFC #1122.
  2. Further, Display impls do not have a consensus on their stability per Decide on stability of Display output for libstd/libcore/etc. types #72676.
  3. And floats already have other extant difficulties with round-tripping well, per Float parsing can fail on valid float literals #31407, which means that anyone relying on this behavior is likely already broken.
  4. Further, as Rust defaults to an IEEE754 noncompliant approach here, assuming that this is not already broken but in a silent manner for users is largely incorrect.

My apologies, I know this PR has been open for a while without much remark, but that is because I am not interested in pursuing this line of inquiry further if this is in fact a reason to turn this PR down. Accordingly, I will not be splitting the changeset. From my perspective, either Rust has IEEE754 floats which behave in the expected way, in accord with the standard, or it does not. If it does not, then what the standard says about parsing NaN and such is equally moot.

@programmerjake
Copy link
Member

programmerjake commented Dec 9, 2020

  1. And floats already have other extant difficulties with round-tripping well, per Float parsing can fail on valid float literals #31407, which means that anyone relying on this behavior is likely already broken.

Well, #31407 (comment) seems to imply that non-NaN floats are intended to round-trip on using Display and it is likely not broken for non-zero finite numbers.

@bors
Copy link
Contributor

bors commented Jan 14, 2021

☔ The latest upstream changes (presumably #78259) made this pull request unmergeable. Please resolve the merge conflicts.

@joshtriplett
Copy link
Member

We discussed this in the @rust-lang/libs meeting this week. We had consensus that changing floating-point printing to print the negative sign on -0 seems reasonable. We don't expect that to cause a problem in practice.

I'm going to go ahead and start a libs FCP. Please go ahead and update the PR to apply without conflicts.

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Mar 13, 2021

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Mar 13, 2021
@dtolnay
Copy link
Member

dtolnay commented Mar 13, 2021

One thing I don't see called out in the discussion above or any of the linked issues is the case lenient parsing in this PR, e.g. "inF".parse::<f64>() -- before: Err(ParseFloatError { kind: Invalid }), after: Ok(inf). Is there a reference that justifies that behavior change?

@rfcbot concern case insensitivity

@joshtriplett
Copy link
Member

@dtolnay From IEEE 754-2008 (I don't have 2019 available), "5.12.1 External character sequences representing zeros, infinities, and NaNs":

Conversion of external character sequences “inf” and “infinity” (regardless of case) with an optional preceding sign, to a supported floating-point format shall produce an infinity (with the same sign as the input).

Conversion of external character sequences “nan” (regardless of case) with an optional preceding sign, to a supported floating-point format shall produce a quiet NaN.

@dtolnay
Copy link
Member

dtolnay commented Mar 14, 2021

Thanks, that seems quite clear.
@rfcbot resolve case insensitivity
@rfcbot reviewed

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Mar 14, 2021
@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Mar 27, 2021
@m-ou-se m-ou-se added the relnotes Marks issues that should be documented in the release notes of the next release. label Mar 27, 2021
@bors
Copy link
Contributor

bors commented Mar 27, 2021

⌛ Testing commit e8dfbac with merge aef1140...

@bors
Copy link
Contributor

bors commented Mar 27, 2021

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing aef1140 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 27, 2021
@bors bors merged commit aef1140 into rust-lang:master Mar 27, 2021
@rustbot rustbot added this to the 1.53.0 milestone Mar 27, 2021
tspiteri added a commit to tspiteri/num-traits that referenced this pull request Apr 15, 2021
This copies the standard library behavior fix of
rust-lang/rust#78618
tspiteri added a commit to tspiteri/num-traits that referenced this pull request Apr 15, 2021
This copies the standard library behavior fix of
<rust-lang/rust#78618>.
tspiteri added a commit to tspiteri/num-traits that referenced this pull request Apr 15, 2021
This copies some of the standard library behavior fix of
<rust-lang/rust#78618>.
tspiteri added a commit to tspiteri/num-traits that referenced this pull request Apr 15, 2021
This copies some of the standard library fixes of
<rust-lang/rust#78618>.
bors bot added a commit to rust-num/num-traits that referenced this pull request Jun 16, 2021
214: Ignore case for float parsing of text special values r=cuviper a=tspiteri

This copies some of the standard library fixes of <rust-lang/rust#78618>.

Co-authored-by: Trevor Spiteri <[email protected]>
benesch added a commit to benesch/materialize that referenced this pull request Jun 18, 2021
In SQLite's SLT, floats are printed as integers by casting the float to
an integer then printing the integer. Our SLT runner was printing floats
rounding the float then printing the float with zero digits after the
decimal point. This is almost the same thing, except that "0" is printed
as "-0" since rust-lang/rust#78618. Future proof the code (and make it
work the same way in the nightly coverage build) by following the SQLite
approach.

Supersedes MaterializeInc#7096.
benesch added a commit to benesch/materialize that referenced this pull request Jun 18, 2021
In SQLite's SLT, floats are printed as integers by casting the float to
an integer then printing the integer. Our SLT runner was printing floats
rounding the float then printing the float with zero digits after the
decimal point. This is almost the same thing, except that "0" is printed
as "-0" since rust-lang/rust#78618. Future proof the code (and make it
work the same way in the nightly coverage build) by following the SQLite
approach.

Supersedes MaterializeInc#7096.
wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request Jun 20, 2021
Pkgsrc changes:
 * Bump bootstrap requirements to 1.52.1
 * Adjust patches, adapt to upstream changes, adjust cargo checksums
 * If using an external llvm, require >= 10.0

Upsteream changes:

Version 1.53.0 (2021-06-17)
============================

Language
-----------------------
- [You can now use unicode for identifiers.][83799] This allows
  multilingual identifiers but still doesn't allow glyphs that are
  not considered characters such as `#` (diamond) or `<U+1F980>`
  (crab). More specifically you can now use any identifier that
  matches the UAX #31 "Unicode Identifier and Pattern Syntax" standard. This
  is the same standard as languages like Python, however Rust uses NFC
  normalization which may be different from other languages.
- [You can now specify "or patterns" inside pattern matches.][79278]
  Previously you could only use `|` (OR) on complete patterns. E.g.
  ```rust
  let x = Some(2u8);
  // Before
  matches!(x, Some(1) | Some(2));
  // Now
  matches!(x, Some(1 | 2));
  ```
- [Added the `:pat_param` `macro_rules!` matcher.][83386] This matcher
  has the same semantics as the `:pat` matcher. This is to allow `:pat`
  to change semantics to being a pattern fragment in a future edition.

Compiler
-----------------------
- [Updated the minimum external LLVM version to LLVM 10.][83387]
- [Added Tier 3\* support for the `wasm64-unknown-unknown` target.][80525]
- [Improved debuginfo for closures and async functions on Windows MSVC.][83941]

\* Refer to Rust's [platform support page][platform-support-doc] for more
information on Rust's tiered platform support.

Libraries
-----------------------
- [Abort messages will now forward to `android_set_abort_message` on
  Android platforms when available.][81469]
- [`slice::IterMut<'_, T>` now implements `AsRef<[T]>`][82771]
- [Arrays of any length now implement `IntoIterator`.][84147]
  Currently calling `.into_iter()` as a method on an array will
  return `impl Iterator<Item=&T>`, but this may change in a
  future edition to change `Item` to `T`. Calling `IntoIterator::into_iter`
  directly on arrays will provide `impl Iterator<Item=T>` as expected.
- [`leading_zeros`, and `trailing_zeros` are now available on all
  `NonZero` integer types.][84082]
- [`{f32, f64}::from_str` now parse and print special values
  (`NaN`, `-0`) according to IEEE RFC 754.][78618]
- [You can now index into slices using `(Bound<usize>, Bound<usize>)`.][77704]
- [Add the `BITS` associated constant to all numeric types.][82565]

Stabilised APIs
---------------
- [`AtomicBool::fetch_update`]
- [`AtomicPtr::fetch_update`]
- [`BTreeMap::retain`]
- [`BTreeSet::retain`]
- [`BufReader::seek_relative`]
- [`DebugStruct::non_exhaustive`]
- [`Duration::MAX`]
- [`Duration::ZERO`]
- [`Duration::is_zero`]
- [`Duration::saturating_add`]
- [`Duration::saturating_mul`]
- [`Duration::saturating_sub`]
- [`ErrorKind::Unsupported`]
- [`Option::insert`]
- [`Ordering::is_eq`]
- [`Ordering::is_ge`]
- [`Ordering::is_gt`]
- [`Ordering::is_le`]
- [`Ordering::is_lt`]
- [`Ordering::is_ne`]
- [`OsStr::is_ascii`]
- [`OsStr::make_ascii_lowercase`]
- [`OsStr::make_ascii_uppercase`]
- [`OsStr::to_ascii_lowercase`]
- [`OsStr::to_ascii_uppercase`]
- [`Peekable::peek_mut`]
- [`Rc::decrement_strong_count`]
- [`Rc::increment_strong_count`]
- [`Vec::extend_from_within`]
- [`array::from_mut`]
- [`array::from_ref`]
- [`char::MAX`]
- [`char::REPLACEMENT_CHARACTER`]
- [`char::UNICODE_VERSION`]
- [`char::decode_utf16`]
- [`char::from_digit`]
- [`char::from_u32_unchecked`]
- [`char::from_u32`]
- [`cmp::max_by_key`]
- [`cmp::max_by`]
- [`cmp::min_by_key`]
- [`cmp::min_by`]
- [`f32::is_subnormal`]
- [`f64::is_subnormal`]

Cargo
-----------------------
- [Cargo now supports git repositories where the default `HEAD` branch is not
  "master".][cargo/9392] This also includes a switch to the version
  3 `Cargo.lock` format which can handle default branches correctly.
- [macOS targets now default to `unpacked` split-debuginfo.][cargo/9298]
- [The `authors` field is no longer included in `Cargo.toml` for new
  projects.][cargo/9282]

Rustdoc
-----------------------
- [Added the `rustdoc::bare_urls` lint that warns when you have URLs
  without hyperlinks.][81764]

Compatibility Notes
-------------------
- [Implement token-based handling of attributes during expansion][82608]
- [`Ipv4::from_str` will now reject octal format IP addresses in addition
  to rejecting hexadecimal IP addresses.][83652] The octal format can lead
  to confusion and potential security vulnerabilities and [is no
  longer recommended][ietf6943].

Internal Only
-------------
These changes provide no direct user facing benefits, but represent significant
improvements to the internals and overall performance of rustc and
related tools.

- [Rework the `std::sys::windows::alloc` implementation.][83065]
- [rustdoc: Don't enter an infer_ctxt in get_blanket_impls for
  impls that aren't blanket impls.][82864]
- [rustdoc: Only look at blanket impls in `get_blanket_impls`][83681]
- [Rework rustdoc const type][82873]

[83386]: rust-lang/rust#83386
[82771]: rust-lang/rust#82771
[84147]: rust-lang/rust#84147
[84082]: rust-lang/rust#84082
[83799]: rust-lang/rust#83799
[83681]: rust-lang/rust#83681
[83652]: rust-lang/rust#83652
[83387]: rust-lang/rust#83387
[82873]: rust-lang/rust#82873
[82864]: rust-lang/rust#82864
[82608]: rust-lang/rust#82608
[82565]: rust-lang/rust#82565
[80525]: rust-lang/rust#80525
[79278]: rust-lang/rust#79278
[78618]: rust-lang/rust#78618
[77704]: rust-lang/rust#77704
[83941]: rust-lang/rust#83941
[83065]: rust-lang/rust#83065
[81764]: rust-lang/rust#81764
[81469]: rust-lang/rust#81469
[cargo/9298]: rust-lang/cargo#9298
[cargo/9282]: rust-lang/cargo#9282
[cargo/9392]: rust-lang/cargo#9392
[`char::MAX`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.MAX
[`char::REPLACEMENT_CHARACTER`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.REPLACEMENT_CHARACTER
[`char::UNICODE_VERSION`]: https://doc.rust-lang.org/std/primitive.char.html#associatedconstant.UNICODE_VERSION
[`char::decode_utf16`]: https://doc.rust-lang.org/std/primitive.char.html#method.decode_utf16
[`char::from_u32`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32
[`char::from_u32_unchecked`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32_unchecked
[`char::from_digit`]: https://doc.rust-lang.org/std/primitive.char.html#method.from_digit
[`AtomicBool::fetch_update`]: https://doc.rust-lang.org/std/sync/atomic/struct.AtomicBool.html#method.fetch_update
[`AtomicPtr::fetch_update`]: https://doc.rust-lang.org/std/sync/atomic/struct.AtomicPtr.html#method.fetch_update
[`BTreeMap::retain`]: https://doc.rust-lang.org/std/collections/struct.BTreeMap.html#method.retain
[`BTreeSet::retain`]: https://doc.rust-lang.org/std/collections/struct.BTreeSet.html#method.retain
[`BufReader::seek_relative`]: https://doc.rust-lang.org/std/io/struct.BufReader.html#method.seek_relative
[`DebugStruct::non_exhaustive`]: https://doc.rust-lang.org/std/fmt/struct.DebugStruct.html#method.finish_non_exhaustive
[`Duration::MAX`]: https://doc.rust-lang.org/std/time/struct.Duration.html#associatedconstant.MAX
[`Duration::ZERO`]: https://doc.rust-lang.org/std/time/struct.Duration.html#associatedconstant.ZERO
[`Duration::is_zero`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.is_zero
[`Duration::saturating_add`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.saturating_add
[`Duration::saturating_mul`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.saturating_mul
[`Duration::saturating_sub`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.saturating_sub
[`ErrorKind::Unsupported`]: https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.Unsupported
[`Option::insert`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.insert
[`Ordering::is_eq`]: https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.is_eq
[`Ordering::is_ge`]: https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.is_ge
[`Ordering::is_gt`]: https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.is_gt
[`Ordering::is_le`]: https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.is_le
[`Ordering::is_lt`]: https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.is_lt
[`Ordering::is_ne`]: https://doc.rust-lang.org/std/cmp/enum.Ordering.html#method.is_ne
[`OsStr::eq_ignore_ascii_case`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.eq_ignore_ascii_case
[`OsStr::is_ascii`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.is_ascii
[`OsStr::make_ascii_lowercase`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.make_ascii_lowercase
[`OsStr::make_ascii_uppercase`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.make_ascii_uppercase
[`OsStr::to_ascii_lowercase`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.to_ascii_lowercase
[`OsStr::to_ascii_uppercase`]: https://doc.rust-lang.org/std/ffi/struct.OsStr.html#method.to_ascii_uppercase
[`Peekable::peek_mut`]: https://doc.rust-lang.org/std/iter/struct.Peekable.html#method.peek_mut
[`Rc::decrement_strong_count`]: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.increment_strong_count
[`Rc::increment_strong_count`]: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.increment_strong_count
[`Vec::extend_from_within`]: https://doc.rust-lang.org/beta/std/vec/struct.Vec.html#method.extend_from_within
[`array::from_mut`]: https://doc.rust-lang.org/beta/std/array/fn.from_mut.html
[`array::from_ref`]: https://doc.rust-lang.org/beta/std/array/fn.from_ref.html
[`cmp::max_by_key`]: https://doc.rust-lang.org/beta/std/cmp/fn.max_by_key.html
[`cmp::max_by`]: https://doc.rust-lang.org/beta/std/cmp/fn.max_by.html
[`cmp::min_by_key`]: https://doc.rust-lang.org/beta/std/cmp/fn.min_by_key.html
[`cmp::min_by`]: https://doc.rust-lang.org/beta/std/cmp/fn.min_by.html
[`f32::is_subnormal`]: https://doc.rust-lang.org/std/primitive.f64.html#method.is_subnormal
[`f64::is_subnormal`]: https://doc.rust-lang.org/std/primitive.f64.html#method.is_subnormal
[ietf6943]: https://datatracker.ietf.org/doc/html/rfc6943#section-3.1.1
ruchirK pushed a commit to ruchirK/materialize that referenced this pull request Jun 21, 2021
In SQLite's SLT, floats are printed as integers by casting the float to
an integer then printing the integer. Our SLT runner was printing floats
rounding the float then printing the float with zero digits after the
decimal point. This is almost the same thing, except that "0" is printed
as "-0" since rust-lang/rust#78618. Future proof the code (and make it
work the same way in the nightly coverage build) by following the SQLite
approach.

Supersedes MaterializeInc#7096.
aljoscha pushed a commit to aljoscha/materialize that referenced this pull request Jul 21, 2021
In SQLite's SLT, floats are printed as integers by casting the float to
an integer then printing the integer. Our SLT runner was printing floats
rounding the float then printing the float with zero digits after the
decimal point. This is almost the same thing, except that "0" is printed
as "-0" since rust-lang/rust#78618. Future proof the code (and make it
work the same way in the nightly coverage build) by following the SQLite
approach.

Supersedes MaterializeInc#7096.
@workingjubilee workingjubilee deleted the ieee754-fmt branch October 4, 2021 20:45
@namse
Copy link

namse commented Jul 15, 2022

Is there any fmt option to remove - sign for -0? I couldn't find it from (docs)[https://doc.rust-lang.org/std/fmt/index.html#sign0].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-floating-point Area: Floating point numbers and arithmetic disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong formatting of negative zero when sign is requested