Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: ArrayBuffer Views #5

Merged
merged 6 commits into from
Jun 1, 2018
Merged

Conversation

dherman
Copy link
Contributor

@dherman dherman commented Nov 8, 2017

This RFC proposes modifying the JsArrayBuffer API to allow viewing the underlying buffer data with different binary formats, similar to how typed arrays work in JS. This gives safe Rust more expressive control to manipulate the data.

Rendered

@matklad
Copy link

matklad commented Nov 11, 2017

Is it safe to provide access to f64/f32? You can modify underling buffer via u64 such that it contains a signaling NaN bit pattern, whose safety properties are murky.

See how from_bits method deals the issue: https://doc.rust-lang.org/std/primitive.f64.html#method.from_bits (this workaround is not applicable here).

See rust-lang/rust#39271 (comment) for discussion.

I wonder how JavaScript itself deals with sNaNs here?

@dherman
Copy link
Contributor Author

dherman commented Nov 15, 2017

@matklad Right you are, and this is exactly the kind of worries I had about aliasing -- thank you so much for helping to zero in on the issues!

So one thing I think we could do here would be to create a custom Rust type that overloads Index and IndexMut so you can treat it like a slice, but with a well-specified and safe conversion semantics. The semantics of from_bits and to_bits seems underspecified, so maybe we couldn't even use that and would have to basically implement the serialization and deserialization ourselves?

@nixpulvis
Copy link

nixpulvis commented Nov 16, 2017

It's not immediately obvious to be why we can't do the same kind of trick Rust's from_bits does, and demote signaling NaNs to non-signaling NaNs. Adding a new trait for indexing an array with defined semantics seems like it'll need to do something very much like from_bits is doing.

Unless there's something about signaling NaNs we like and want to keep for some reason, though I've never dealt with that before.

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

why we can't do the same kind of trick Rust's from_bits does

Right, we can--that's what I'm suggesting about using a custom type that overloads indexing (instead of the native &mut [f64] type), so that we can define indexing to do NaN canonicalization with from_bits.

Unless there's something about signaling NaNs we like and want to keep for some reason, though I've never dealt with that before.

We definitely don't want them--signalling NaNs can raise a SIGFPE signal and abort the process.

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

Ha, and of course we already aren’t using native slice types, we’re using the CSlice and CMutSlice types! So I think what we can do is have them take an optional type parameter that defines read/write coercions, and have it default to a noop; then the exposed type for a float slice would be CMutSlice<f64, CanonicalizeNaNs>.

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

(I’ll revise the RFC with these ideas and see what people think.)

@matklad
Copy link

matklad commented Nov 17, 2017

We definitely don't want them--signalling NaNs can raise a SIGFPE signal and abort the process.

My understanding is that the problem with sNaN is a bit more nuanced. On x86/ARM in practice sNaNs don't cause SIGFPE (the relevant bit in the control register is off), so CPU is not the problem. But in LLVM sNaNs are not produced by any instruction except for transmute, so LLVM assumes that having an f64 value with sNaN bit pattern is UB. When LLVM does optimizations, it assumes that any code that observes sNaN is dead, etc, etc.

I'm suggesting about using a custom type that overloads indexing (instead of the native &mut [f64] type), so that we can define indexing to do NaN canonicalization with from_bits.

It would be great to see a prototype here: I am afraid that Rust Index/IndexMut are not flexible enough for this, because you have to return &'j T, you can't do MyRef<'j, T>.

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

CPU is not the problem. ... When LLVM does optimizations, it assumes that any code that observes sNaN is dead, etc, etc.

Ah, I see. Thank you for the explanation! So from_bits allows us to ensure we don't get an sNaN, which means we never need to write code to deal with one, which avoids LLVM doing unsafe optimizations based on UB assumptions. Did I understand that right?

I am afraid that Rust Index/IndexMut are not flexible enough for this

Yep, I just realized this. Argh! Back to brainstorming...

@matklad
Copy link

matklad commented Nov 17, 2017

So from_bits allows us to ensure we don't get an sNaN, which means we never need to write code to deal with one, which avoids LLVM doing unsafe optimizations based on UB assumptions. Did I understand that right?

Hm, I now think that I am confusing this with another IEEE754 issue: rust-lang/rust#10184. :)Better to read the thread on rust-lang/rust#39271 I guess :) The consensus seems that LLVM and not the CPU is the problem, and that we actually just don't know what LLVM guarantees are.

Yep, I just realized this. Argh! Back to brainstorming...

Just what JavaScript itself does with this problem? Is it possible to get an sNaN into JavaScripted variable because of this?

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

Just what JavaScript itself does with this problem? Is it possible to get an sNaN into JavaScripted variable because of this?

Since typed arrays give you the ability to write any bit patterns you like, you can generate the bit pattern of an sNaN in the buffer. But the only way for JavaScript to interpret those bits as a floating-point number is to use a Float32Array or Float64Array view over the data and read from that address in the buffer. And the JS semantics of using a floating-point view is that implementations canonicalize NaN values on both reading from and writing to a binary buffer.

But the spec says nothing specifically about sNaNs; it only says the implementation can choose any NaN bit pattern it wants. The spec says that a JS engine has to always produce the exact same NaN bit pattern whenever writing a NaN to a typed array. I'm not sure if JS engines actually obey that in the real world--there's been a lot of debate about what the spec should claim here, and the spec may not reflect reality (in which case the spec is probably wrong, not reality--JS engine vendors kind of get to call the shots on this issue). But if that rule is in fact legitimate, then it demonstrates that there's no such thing as an sNaN value in JS, since it basically means you can never observe differences between NaN values.

That's at the level of semantics, though. At the level of implementation, I don't know if any JS engines would allow an sNaN bit pattern to get passed around as a JS value. I would guess not, but I don't know for sure.

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

So maybe the best we can do here for floating point types is just to use an API that requires explicit methods for indexing into the slice, instead of being able to use [] syntax:

let f = slice.get(i);
slice.set(j, f);

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

I'm realizing there's a similar potential issue with integers, too, since it's at least theoretically possible that at some point we'll need to always canonicalize them to be little-endian, even on big-endian targets.

Years ago, I advocated for the spec mandating this but didn't succeed. Still, if the web actually comes to rely on little-endian, which I feel pretty confident it will, any big-endian architectures that ever come along and want to implement JavaScript engines are going to have to normalize integers in typed arrays to little-endian in order to be compatible with the ecosystem of JS content in the world.

It's also possible that big-endian is just dead and we don't have to worry about it.

The ergonomic impact of .get()/.set() is probably awful isn't it. :(

@matklad
Copy link

matklad commented Nov 17, 2017

The ergonomic impact of .get()/.set() is probably awful isn't it.

I think the biggest problem is not the ergonomics, but the fact that we must expose &[f64] representation for it to be useful, because pretty much any nifty super-fast crate would need a slice to work.

@BurntSushi, did you have a chance to dig into the question of LLVM handling of signaling NaNs? I think you were going to: rust-lang/rust#39271 (comment) :)

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

Good point. There’s no way to prevent a typed array from having sNaN bit patterns in it, so for this to have fully defined behavior either Rust has to be able to support sNaN or we can’t expose native slices. Definitely interested in the results of @BurntSushi’s research! :)

@matklad
Copy link

matklad commented Nov 17, 2017

either Rust has to be able to support sNaN or we can’t expose native slices.

We can, before giving access to [f64], change all sNaNs into qNaNs (sort of applying from_bits trick for the whole array), but it changes the cost from O(1) to O(N).

@matklad
Copy link

matklad commented Nov 17, 2017

Hm, this comment hints that maybe there are no problems after all here? servo/webrender#2027 (comment)

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

https://twitter.com/gankro/status/931535748628729856

@dherman
Copy link
Contributor Author

dherman commented Nov 17, 2017

change all sNaNs into qNaNs

Even if that weren’t expensive it wouldn’t be an option, unfortunately — typed arrays are used to hold heterogeneous information, so even if you’re looking at a section of it as float data other parts might be integers. So changing their bits would break programs.

But I think I’m starting to convince myself that we may be able to use actual slices for all types after all, with no canonicalization. I’m basing this on a few arguments:

  • Signalling NaN is safe so floats don’t need canonicalization to be safe in Rust.
  • There’s no reason Rust should have to be more portable than JS for endianness, and anyway in practice probably Neon will only be used on little-endian systems; anyone shipping JS typed array code on a big-endian system basically can never use code that relies on byte order because the ecosystem of JS code won’t be written correctly portably anyway. (For years I’ve asked around if people know of any modern JS engines with typed array support that ship on big-endian architectures and haven’t found any so far.)
  • The remaining piece is whether we can get away from the cslice library here, and I bet we can by passing the individual fields into C instead of the ABI-stable struct, but I need to study the code again. Alternatively maybe the ABI of Rust slices has stabilized since last I checked? If not maybe we can push for that to stabilize.

If we can get away with this, the benefit is getting to interop with all the Rust ecosystem that uses slices, as you say, and maybe also compiler optimizations around slices (I’m not sure).

@dherman
Copy link
Contributor Author

dherman commented Nov 19, 2017

OK, yeah, there's no reason why we needed cslice in the first place. We use its ABI in two (morally identical) functions, Neon_Buffer_Data and Neon_ArrayBuffer_Data:

https://github.com/neon-bindings/neon/blob/master/crates/neon-runtime/src/neon.cc#L202-L206

and instead of a single out-pointer for a pair, we can simply pass two out-pointers for the two fields, as stack-local variables:

https://github.com/neon-bindings/neon/blob/master/src/js/binary.rs#L42-L47

And then we can call std::slice::from_raw_parts to construct the Rust slice.

…iew" is the generic name used in the specs and docs for typed array types) in favor of `ArrayBufferData`, which is the name used in the actual JS spec
@dherman
Copy link
Contributor Author

dherman commented Dec 29, 2017

@matklad I've updated this RFC according to our discussion. I feel good about the state of it, so I'll probably tag it with "final comment period" soon.

@dherman dherman added the final comment period last opportunity to discuss the proposed conclusion label Mar 22, 2018

While it requires `unsafe` code, this design allows users to define their own `ViewTypes` for compound types such as tuples or structs.

## Convenience methods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typical pattern in Rust? I haven't seen it before. What's the benefit over super fish, just simpler syntax for new users?

let a = but.as_u32_slice()[0];
let b = but.as_slice::<u32>()[1];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I figured turbofish is nice for people who are used to it, but in my experience it's been confusing to teach to new Rust programmers. So I figured the conveniences would be nice for tutorials, teaching material, slide decks, etc.

@dherman
Copy link
Contributor Author

dherman commented May 31, 2018

OK, one more question for anyone who may have an opinion: ArrayBufferData is not the best name for the data structure that represents the backing data, because we should use the same type for the backing data of both JsArrayBuffer and JsBuffer. Names I can think of:

  • BinaryData
  • BufferData
  • BackingData -- too generic
  • BackingBuffer -- confusing because we already have two types that "are buffers"
  • BinaryBuffer -- ditto
  • ByteBuffer -- ditto
  • ByteStore -- confusing with "app store"
  • BinaryStore -- ditto

I think the first two are my favorites. The latter is nice but it could confuse people into thinking it's only for JsBuffer. BinaryData has slightly less of a hint that it's related to buffers, but being the associated type for borrowing the contents of JsArrayBuffer and JsBuffer provides the association.

So I'm leaning towards BinaryData. Anyone want to agree or disagree?

…wType` so they make sense for both `JsArrayBuffer` and `JsBuffer`
@dherman
Copy link
Contributor Author

dherman commented May 31, 2018

Pushed the name change.

@dherman dherman changed the title WIP: RFC: ArrayBuffer Views RFC: ArrayBuffer Views Jun 1, 2018
@dherman dherman merged commit b2f6eda into neon-bindings:master Jun 1, 2018
@dherman dherman deleted the array-buffer-views branch June 1, 2018 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final comment period last opportunity to discuss the proposed conclusion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants