Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document why NaN bits are not fully deterministic #619

Closed
sunfishcode opened this issue Mar 22, 2016 · 15 comments
Closed

Document why NaN bits are not fully deterministic #619

sunfishcode opened this issue Mar 22, 2016 · 15 comments

Comments

@sunfishcode
Copy link
Member

(I'm not currently advocating; this is so we make an informed decision and collect material for a rationale.)

Looking at Nondeterminism.md, the NaN bits parts stand out. Of course threads can race, resources can be exhausted, and features can be added; those are consequences of the overall design. But NaN bits? Could VMs be a little more clever and take care of this? There are two issues:

When an operation gets more than one NaN operand, which does it propagate?

IEEE 754 leaves this unspecified, but at least x86, ARM, and Power all pick the "first" operand, and it's a reasonable choice.

However, fixing the choice at the wasm level would mean that wasm implementations can't commute floating-point adds and multiplies, which is sometimes a useful optimization on pre-VEX x86 where the instructions clobber one of their inputs. Also, it would require anyone using an LLVM-based VM to teach it that add and multiply are not commutative on wasm.

One could reasonably argue these aren't show-stoppers, so this issue is theoretically fixable if there's a strong desire.

When an operation produces a NaN and has no NaN operands, what is the sign bit?

x86 uses 1, ARM uses 0.

The simplest way to fix this would be to canonicalize after every floating-point operation. This is doable, though a trickiness is that both 0 and 1 are possible valid values when there is a NaN operand to propagate, so it wouldn't be enough to just check for a NaN result and canonicalize; one would have to check for a NaN result and for a lack of NaN operands, and only then canonicalize.

Canonicalizing after every operation would be very expensive; another option is to just canonicalize at "escape" points of computations (the canonicalizing path would then have to essentially replay a whole stream of computation in order to determine the correct NaN output). This is an improvement, but likely still adds a significant amount of overhead.

Another possible implementation would be to unmask the invalid exception, take a trap whenever a NaN is generated, and then perform the canonicalization and return. Downsides include using a non-default CPU mode, and being very slow if there are a lot of invalids being generated.

Unfortunately, these approaches carry significant downsides. Unless other ideas surface, or there is a very strong desire, this seems difficult to fix.


One other thing to note is that NaN bits are difficult to accidentally observe, so this isn't a major portability concern in general.

Does anyone else have any thoughts to add?

@sunfishcode sunfishcode added this to the MVP milestone Mar 22, 2016
@wanderer
Copy link
Contributor

One other thing to note is that NaN bits are difficult to accidentally observe

Is there any way to make them impossible to observe or at least the sign bit?

@qwertie
Copy link

qwertie commented Mar 22, 2016

IMO the NaN nondeterminisms should be left alone, given that it is so rare for software to care about them. What's the case for sacrificing performance to make them deterministic?

@sunfishcode
Copy link
Member Author

Here's a list of the ways one can observe NaN bits:

  • reinterpret conversion
  • store to linear memory and load the bits with a different interpretation (or let the bits be observed externally)
  • pass an argument to a call of an imported function, or return a value from an exported function
  • copysign the sign bit onto a non-NaN

VMs could insert canonicalizing code before each of these; that's the "canonicalize at escape points" idea discussed above. store is very common on hot paths, so this would likely still be fairly expensive.

@wanderer
Copy link
Contributor

@qwertie I very much do care about them and I have a usecase that rests on them being deterministic. But as disccussed here there are ways to make it fully determisitic even if it doesn't get in the spec.

@sunfishcode
Copy link
Member Author

@qwertie Benefits might include slightly greater portability (there does exist code in the world that unwisely uses the IEEE 754 totalOrder function, for example), slightly greater reproducibility, and stronger invariants when doing symmetric computations across multiple nodes.

@wanderer
Copy link
Contributor

store is very common on hot paths, so this would likely still be fairly expensive.

For this case could we just always specify the sign value? so like just make it 1 if it gets placed in mem?

@sunfishcode
Copy link
Member Author

@wanderer That's essentially what canonicalization entails: check to see if the value is a NaN, and if so, apply some correction. It's typically an extra compare and branch on the hot path.

I should also add, I haven't benchmarked any of the options mentioned in this issue; any benchmarking that anyone can add here would be welcome.

@mbodart
Copy link

mbodart commented Mar 22, 2016

I would strongly lean towards the current level of non-determinism, which favors performance.
I was actually somewhat surprised that wasm rigorously defines the NaN's mantissa bits.
While this may suffice for today's processors, who knows what future processors may come along.

If the wasm is being generated from a high level language, that translator could provide an option to control the level of FP semantics, similar to -ffast-math for example. The translator could then insert any extra wasm fixup needed to coerce nans to a desired format. Towards that end we could provide isNan or normalizeNan operators, though I'm not advocating for this now.

In short, "fixing" this is best left to a higher level tool, IMO.

@qwertie
Copy link

qwertie commented Mar 22, 2016

+1 @mbodart. Edit: oh look there's actually a +1 thingie now :). Btw, I personally think Wasm = world domination and that therefore future processors won't antagonize it.

@jfbastien
Copy link
Member

I'd like to quantify the performance effects of this before making a decision. I think our toolchains are still too immature to make good performance measurements at the moment (there are longer poles in the way of this one). Put another way: I want to avoid "death by a thousand cuts".

@titzer
Copy link

titzer commented Mar 24, 2016

I agree with @jfbastien. Performance effects need to be quantified first before tinkering with the semantics.

@sunfishcode
Copy link
Member Author

To be clear, I'm not currently advocating for a change here; I'm collecting rationale material. NaN bit nondeterminism sticks out, and wants explanation. And as far as I'm aware, no one has quantified the performance effects of making these bits nondeterministic either.

@stoklund
Copy link

ARMv8 does not always propagate the first operand when both operands are NaN. The rule is:

  • If one operand is a quiet NaN and the other is a signaling NaN, propagate the signaling NaN.
  • Otherwise, propagate the first operand.

As I am reading the spec, this applies to both the Aarch32 and Aarch64 modes of ARMv8.

This behavior is different from SSE which propagates the first operand in both cases.

Both architectures will convert an sNaN to a qNaN by setting the quiet bit before propagating it.

@sunfishcode
Copy link
Member Author

@stoklund Good spot! I missed that ARM picking the first NaN doesn't happen in the case where the second NaN is signaling. That would complicate the strategy I layed out for the multiple NaN case above, so we can mention that in the rationale.

@sunfishcode sunfishcode changed the title Should NaN bits be fully deterministic? Document why NaN bits are not fully deterministic Jul 11, 2016
@sunfishcode sunfishcode self-assigned this Jul 11, 2016
sunfishcode added a commit that referenced this issue Jan 31, 2017
NaN bits are a surprising thing to see in Nondeterminism.md, given how
deterministic WebAssembly is otherwise. This PR adds some explanation to
Rationale.md.

This is meant to finish #619.
@sunfishcode
Copy link
Member Author

I've now created #973 to propose specific text summarizing the above.

sunfishcode added a commit that referenced this issue Feb 3, 2017
* Explain why NaN bitpatterns are nondeterministic.

NaN bits are a surprising thing to see in Nondeterminism.md, given how
deterministic WebAssembly is otherwise. This PR adds some explanation to
Rationale.md.

This is meant to finish #619.

* Clarify "should" by saying "(as opposed to shall)".

* Mention globals and linear memories as ways that NaN bitpatterns can be observed.

And clarify the wording in a few other places.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants