-
Notifications
You must be signed in to change notification settings - Fork 692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document why NaN bits are not fully deterministic #619
Comments
Is there any way to make them impossible to observe or at least the sign bit? |
IMO the NaN nondeterminisms should be left alone, given that it is so rare for software to care about them. What's the case for sacrificing performance to make them deterministic? |
Here's a list of the ways one can observe NaN bits:
VMs could insert canonicalizing code before each of these; that's the "canonicalize at escape points" idea discussed above. |
@qwertie Benefits might include slightly greater portability (there does exist code in the world that unwisely uses the IEEE 754 totalOrder function, for example), slightly greater reproducibility, and stronger invariants when doing symmetric computations across multiple nodes. |
For this case could we just always specify the sign value? so like just make it 1 if it gets placed in mem? |
@wanderer That's essentially what canonicalization entails: check to see if the value is a NaN, and if so, apply some correction. It's typically an extra compare and branch on the hot path. I should also add, I haven't benchmarked any of the options mentioned in this issue; any benchmarking that anyone can add here would be welcome. |
I would strongly lean towards the current level of non-determinism, which favors performance. If the wasm is being generated from a high level language, that translator could provide an option to control the level of FP semantics, similar to -ffast-math for example. The translator could then insert any extra wasm fixup needed to coerce nans to a desired format. Towards that end we could provide isNan or normalizeNan operators, though I'm not advocating for this now. In short, "fixing" this is best left to a higher level tool, IMO. |
+1 @mbodart. Edit: oh look there's actually a +1 thingie now :). Btw, I personally think Wasm = world domination and that therefore future processors won't antagonize it. |
I'd like to quantify the performance effects of this before making a decision. I think our toolchains are still too immature to make good performance measurements at the moment (there are longer poles in the way of this one). Put another way: I want to avoid "death by a thousand cuts". |
I agree with @jfbastien. Performance effects need to be quantified first before tinkering with the semantics. |
To be clear, I'm not currently advocating for a change here; I'm collecting rationale material. NaN bit nondeterminism sticks out, and wants explanation. And as far as I'm aware, no one has quantified the performance effects of making these bits nondeterministic either. |
ARMv8 does not always propagate the first operand when both operands are NaN. The rule is:
As I am reading the spec, this applies to both the Aarch32 and Aarch64 modes of ARMv8. This behavior is different from SSE which propagates the first operand in both cases. Both architectures will convert an sNaN to a qNaN by setting the quiet bit before propagating it. |
@stoklund Good spot! I missed that ARM picking the first NaN doesn't happen in the case where the second NaN is signaling. That would complicate the strategy I layed out for the multiple NaN case above, so we can mention that in the rationale. |
NaN bits are a surprising thing to see in Nondeterminism.md, given how deterministic WebAssembly is otherwise. This PR adds some explanation to Rationale.md. This is meant to finish #619.
I've now created #973 to propose specific text summarizing the above. |
* Explain why NaN bitpatterns are nondeterministic. NaN bits are a surprising thing to see in Nondeterminism.md, given how deterministic WebAssembly is otherwise. This PR adds some explanation to Rationale.md. This is meant to finish #619. * Clarify "should" by saying "(as opposed to shall)". * Mention globals and linear memories as ways that NaN bitpatterns can be observed. And clarify the wording in a few other places.
(I'm not currently advocating; this is so we make an informed decision and collect material for a rationale.)
Looking at Nondeterminism.md, the NaN bits parts stand out. Of course threads can race, resources can be exhausted, and features can be added; those are consequences of the overall design. But NaN bits? Could VMs be a little more clever and take care of this? There are two issues:
When an operation gets more than one NaN operand, which does it propagate?
IEEE 754 leaves this unspecified, but at least x86, ARM, and Power all pick the "first" operand, and it's a reasonable choice.
However, fixing the choice at the wasm level would mean that wasm implementations can't commute floating-point adds and multiplies, which is sometimes a useful optimization on pre-VEX x86 where the instructions clobber one of their inputs. Also, it would require anyone using an LLVM-based VM to teach it that add and multiply are not commutative on wasm.
One could reasonably argue these aren't show-stoppers, so this issue is theoretically fixable if there's a strong desire.
When an operation produces a NaN and has no NaN operands, what is the sign bit?
x86 uses 1, ARM uses 0.
The simplest way to fix this would be to canonicalize after every floating-point operation. This is doable, though a trickiness is that both 0 and 1 are possible valid values when there is a NaN operand to propagate, so it wouldn't be enough to just check for a NaN result and canonicalize; one would have to check for a NaN result and for a lack of NaN operands, and only then canonicalize.
Canonicalizing after every operation would be very expensive; another option is to just canonicalize at "escape" points of computations (the canonicalizing path would then have to essentially replay a whole stream of computation in order to determine the correct NaN output). This is an improvement, but likely still adds a significant amount of overhead.
Another possible implementation would be to unmask the invalid exception, take a trap whenever a NaN is generated, and then perform the canonicalization and return. Downsides include using a non-default CPU mode, and being very slow if there are a lot of invalids being generated.
Unfortunately, these approaches carry significant downsides. Unless other ideas surface, or there is a very strong desire, this seems difficult to fix.
One other thing to note is that NaN bits are difficult to accidentally observe, so this isn't a major portability concern in general.
Does anyone else have any thoughts to add?
The text was updated successfully, but these errors were encountered: