RFC: ArrayBuffer Views #5

dherman · 2017-11-08T18:45:42Z

This RFC proposes modifying the JsArrayBuffer API to allow viewing the underlying buffer data with different binary formats, similar to how typed arrays work in JS. This gives safe Rust more expressive control to manipulate the data.

Rendered

… drawbacks and alternatives sections

matklad · 2017-11-11T19:19:52Z

Is it safe to provide access to f64/f32? You can modify underling buffer via u64 such that it contains a signaling NaN bit pattern, whose safety properties are murky.

See how from_bits method deals the issue: https://doc.rust-lang.org/std/primitive.f64.html#method.from_bits (this workaround is not applicable here).

See rust-lang/rust#39271 (comment) for discussion.

I wonder how JavaScript itself deals with sNaNs here?

dherman · 2017-11-15T18:39:03Z

@matklad Right you are, and this is exactly the kind of worries I had about aliasing -- thank you so much for helping to zero in on the issues!

So one thing I think we could do here would be to create a custom Rust type that overloads Index and IndexMut so you can treat it like a slice, but with a well-specified and safe conversion semantics. The semantics of from_bits and to_bits seems underspecified, so maybe we couldn't even use that and would have to basically implement the serialization and deserialization ourselves?

nixpulvis · 2017-11-16T21:30:32Z

It's not immediately obvious to be why we can't do the same kind of trick Rust's from_bits does, and demote signaling NaNs to non-signaling NaNs. Adding a new trait for indexing an array with defined semantics seems like it'll need to do something very much like from_bits is doing.

Unless there's something about signaling NaNs we like and want to keep for some reason, though I've never dealt with that before.

dherman · 2017-11-17T03:32:45Z

why we can't do the same kind of trick Rust's from_bits does

Right, we can--that's what I'm suggesting about using a custom type that overloads indexing (instead of the native &mut [f64] type), so that we can define indexing to do NaN canonicalization with from_bits.

Unless there's something about signaling NaNs we like and want to keep for some reason, though I've never dealt with that before.

We definitely don't want them--signalling NaNs can raise a SIGFPE signal and abort the process.

dherman · 2017-11-17T03:36:24Z

Ha, and of course we already aren’t using native slice types, we’re using the CSlice and CMutSlice types! So I think what we can do is have them take an optional type parameter that defines read/write coercions, and have it default to a noop; then the exposed type for a float slice would be CMutSlice<f64, CanonicalizeNaNs>.

dherman · 2017-11-17T03:37:03Z

(I’ll revise the RFC with these ideas and see what people think.)

matklad · 2017-11-17T07:13:14Z

We definitely don't want them--signalling NaNs can raise a SIGFPE signal and abort the process.

My understanding is that the problem with sNaN is a bit more nuanced. On x86/ARM in practice sNaNs don't cause SIGFPE (the relevant bit in the control register is off), so CPU is not the problem. But in LLVM sNaNs are not produced by any instruction except for transmute, so LLVM assumes that having an f64 value with sNaN bit pattern is UB. When LLVM does optimizations, it assumes that any code that observes sNaN is dead, etc, etc.

I'm suggesting about using a custom type that overloads indexing (instead of the native &mut [f64] type), so that we can define indexing to do NaN canonicalization with from_bits.

It would be great to see a prototype here: I am afraid that Rust Index/IndexMut are not flexible enough for this, because you have to return &'j T, you can't do MyRef<'j, T>.

dherman · 2017-11-17T07:54:14Z

CPU is not the problem. ... When LLVM does optimizations, it assumes that any code that observes sNaN is dead, etc, etc.

Ah, I see. Thank you for the explanation! So from_bits allows us to ensure we don't get an sNaN, which means we never need to write code to deal with one, which avoids LLVM doing unsafe optimizations based on UB assumptions. Did I understand that right?

I am afraid that Rust Index/IndexMut are not flexible enough for this

Yep, I just realized this. Argh! Back to brainstorming...

matklad · 2017-11-17T08:08:05Z

So from_bits allows us to ensure we don't get an sNaN, which means we never need to write code to deal with one, which avoids LLVM doing unsafe optimizations based on UB assumptions. Did I understand that right?

Hm, I now think that I am confusing this with another IEEE754 issue: rust-lang/rust#10184. :)Better to read the thread on rust-lang/rust#39271 I guess :) The consensus seems that LLVM and not the CPU is the problem, and that we actually just don't know what LLVM guarantees are.

Yep, I just realized this. Argh! Back to brainstorming...

Just what JavaScript itself does with this problem? Is it possible to get an sNaN into JavaScripted variable because of this?

dherman · 2017-11-17T08:25:31Z

Just what JavaScript itself does with this problem? Is it possible to get an sNaN into JavaScripted variable because of this?

Since typed arrays give you the ability to write any bit patterns you like, you can generate the bit pattern of an sNaN in the buffer. But the only way for JavaScript to interpret those bits as a floating-point number is to use a Float32Array or Float64Array view over the data and read from that address in the buffer. And the JS semantics of using a floating-point view is that implementations canonicalize NaN values on both reading from and writing to a binary buffer.

But the spec says nothing specifically about sNaNs; it only says the implementation can choose any NaN bit pattern it wants. The spec says that a JS engine has to always produce the exact same NaN bit pattern whenever writing a NaN to a typed array. I'm not sure if JS engines actually obey that in the real world--there's been a lot of debate about what the spec should claim here, and the spec may not reflect reality (in which case the spec is probably wrong, not reality--JS engine vendors kind of get to call the shots on this issue). But if that rule is in fact legitimate, then it demonstrates that there's no such thing as an sNaN value in JS, since it basically means you can never observe differences between NaN values.

That's at the level of semantics, though. At the level of implementation, I don't know if any JS engines would allow an sNaN bit pattern to get passed around as a JS value. I would guess not, but I don't know for sure.

dherman · 2017-11-17T08:34:37Z

So maybe the best we can do here for floating point types is just to use an API that requires explicit methods for indexing into the slice, instead of being able to use [] syntax:

let f = slice.get(i);
slice.set(j, f);

dherman · 2017-11-17T09:13:46Z

I'm realizing there's a similar potential issue with integers, too, since it's at least theoretically possible that at some point we'll need to always canonicalize them to be little-endian, even on big-endian targets.

Years ago, I advocated for the spec mandating this but didn't succeed. Still, if the web actually comes to rely on little-endian, which I feel pretty confident it will, any big-endian architectures that ever come along and want to implement JavaScript engines are going to have to normalize integers in typed arrays to little-endian in order to be compatible with the ecosystem of JS content in the world.

It's also possible that big-endian is just dead and we don't have to worry about it.

The ergonomic impact of .get()/.set() is probably awful isn't it. :(

matklad · 2017-11-17T09:20:25Z

The ergonomic impact of .get()/.set() is probably awful isn't it.

I think the biggest problem is not the ergonomics, but the fact that we must expose &[f64] representation for it to be useful, because pretty much any nifty super-fast crate would need a slice to work.

@BurntSushi, did you have a chance to dig into the question of LLVM handling of signaling NaNs? I think you were going to: rust-lang/rust#39271 (comment) :)

dherman · 2017-11-17T09:27:35Z

Good point. There’s no way to prevent a typed array from having sNaN bit patterns in it, so for this to have fully defined behavior either Rust has to be able to support sNaN or we can’t expose native slices. Definitely interested in the results of @BurntSushi’s research! :)

matklad · 2017-11-17T09:30:40Z

either Rust has to be able to support sNaN or we can’t expose native slices.

We can, before giving access to [f64], change all sNaNs into qNaNs (sort of applying from_bits trick for the whole array), but it changes the cost from O(1) to O(N).

matklad · 2017-11-17T09:49:29Z

Hm, this comment hints that maybe there are no problems after all here? servo/webrender#2027 (comment)

dherman · 2017-11-17T15:18:50Z

https://twitter.com/gankro/status/931535748628729856

dherman · 2017-11-17T16:01:08Z

change all sNaNs into qNaNs

Even if that weren’t expensive it wouldn’t be an option, unfortunately — typed arrays are used to hold heterogeneous information, so even if you’re looking at a section of it as float data other parts might be integers. So changing their bits would break programs.

But I think I’m starting to convince myself that we may be able to use actual slices for all types after all, with no canonicalization. I’m basing this on a few arguments:

Signalling NaN is safe so floats don’t need canonicalization to be safe in Rust.
There’s no reason Rust should have to be more portable than JS for endianness, and anyway in practice probably Neon will only be used on little-endian systems; anyone shipping JS typed array code on a big-endian system basically can never use code that relies on byte order because the ecosystem of JS code won’t be written correctly portably anyway. (For years I’ve asked around if people know of any modern JS engines with typed array support that ship on big-endian architectures and haven’t found any so far.)
The remaining piece is whether we can get away from the cslice library here, and I bet we can by passing the individual fields into C instead of the ABI-stable struct, but I need to study the code again. Alternatively maybe the ABI of Rust slices has stabilized since last I checked? If not maybe we can push for that to stabilize.

If we can get away with this, the benefit is getting to interop with all the Rust ecosystem that uses slices, as you say, and maybe also compiler optimizations around slices (I’m not sure).

dherman · 2017-11-19T18:16:39Z

OK, yeah, there's no reason why we needed cslice in the first place. We use its ABI in two (morally identical) functions, Neon_Buffer_Data and Neon_ArrayBuffer_Data:

https://github.com/neon-bindings/neon/blob/master/crates/neon-runtime/src/neon.cc#L202-L206

and instead of a single out-pointer for a pair, we can simply pass two out-pointers for the two fields, as stack-local variables:

https://github.com/neon-bindings/neon/blob/master/src/js/binary.rs#L42-L47

And then we can call std::slice::from_raw_parts to construct the Rust slice.

…ce` and `CMutSlice`

…iew" is the generic name used in the specs and docs for typed array types) in favor of `ArrayBufferData`, which is the name used in the actual JS spec

dherman · 2017-12-29T07:40:14Z

@matklad I've updated this RFC according to our discussion. I feel good about the state of it, so I'll probably tag it with "final comment period" soon.

kjvalencik · 2018-05-18T11:23:46Z

text/0000-array-buffer-views.md

+
+While it requires `unsafe` code, this design allows users to define their own `ViewTypes` for compound types such as tuples or structs.
+
+## Convenience methods


Is this a typical pattern in Rust? I haven't seen it before. What's the benefit over super fish, just simpler syntax for new users?

let a = but.as_u32_slice()[0]; let b = but.as_slice::<u32>()[1];

Yeah I figured turbofish is nice for people who are used to it, but in my experience it's been confusing to teach to new Rust programmers. So I figured the conveniences would be nice for tutorials, teaching material, slide decks, etc.

dherman · 2018-05-31T06:36:07Z

OK, one more question for anyone who may have an opinion: ArrayBufferData is not the best name for the data structure that represents the backing data, because we should use the same type for the backing data of both JsArrayBuffer and JsBuffer. Names I can think of:

BinaryData
BufferData
BackingData -- too generic
BackingBuffer -- confusing because we already have two types that "are buffers"
BinaryBuffer -- ditto
ByteBuffer -- ditto
ByteStore -- confusing with "app store"
BinaryStore -- ditto

I think the first two are my favorites. The latter is nice but it could confuse people into thinking it's only for JsBuffer. BinaryData has slightly less of a hint that it's related to buffers, but being the associated type for borrowing the contents of JsArrayBuffer and JsBuffer provides the association.

So I'm leaning towards BinaryData. Anyone want to agree or disagree?

…wType` so they make sense for both `JsArrayBuffer` and `JsBuffer`

dherman · 2018-05-31T06:51:29Z

Pushed the name change.

dherman added 3 commits November 7, 2017 22:25

WIP: RFC: ArrayBuffer Views

237c8d7

separate as_slice vs as_mut_slice; lots more documentation; added…

3ce0a7c

… drawbacks and alternatives sections

label byte offset 16 as well in the binary layout diagram

adbcdaf

dherman added 2 commits December 28, 2017 23:33

ArrayBufferView methods produce native Rust slices instead of `CSli…

2ae7ca7

…ce` and `CMutSlice`

rename ArrayBufferView type (which is confusing since "ArrayBufferV…

0a255bd

…iew" is the generic name used in the specs and docs for typed array types) in favor of `ArrayBufferData`, which is the name used in the actual JS spec

dherman added the final comment period last opportunity to discuss the proposed conclusion label Mar 22, 2018

kjvalencik reviewed May 18, 2018

View reviewed changes

rename ArrayBufferData to BinaryData and ViewType to `BinaryVie…

ce847c6

…wType` so they make sense for both `JsArrayBuffer` and `JsBuffer`

dherman changed the title ~~WIP: RFC: ArrayBuffer Views~~ RFC: ArrayBuffer Views Jun 1, 2018

dherman merged commit b2f6eda into neon-bindings:master Jun 1, 2018

dherman deleted the array-buffer-views branch June 1, 2018 00:05

dherman mentioned this pull request Jun 1, 2018

Tracking issue for ArrayBuffer Views (RFC 5) neon-bindings/neon#313

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: ArrayBuffer Views #5

RFC: ArrayBuffer Views #5

dherman commented Nov 8, 2017 •

edited

Loading

matklad commented Nov 11, 2017

dherman commented Nov 15, 2017 •

edited

Loading

nixpulvis commented Nov 16, 2017 •

edited

Loading

dherman commented Nov 17, 2017

dherman commented Nov 17, 2017

dherman commented Nov 17, 2017

matklad commented Nov 17, 2017 •

edited

Loading

dherman commented Nov 17, 2017

matklad commented Nov 17, 2017

dherman commented Nov 17, 2017 •

edited

Loading

dherman commented Nov 17, 2017

dherman commented Nov 17, 2017

matklad commented Nov 17, 2017

dherman commented Nov 17, 2017

matklad commented Nov 17, 2017

matklad commented Nov 17, 2017

dherman commented Nov 17, 2017

dherman commented Nov 17, 2017 •

edited

Loading

dherman commented Nov 19, 2017

dherman commented Dec 29, 2017

kjvalencik May 18, 2018

dherman May 19, 2018

dherman commented May 31, 2018 •

edited

Loading

dherman commented May 31, 2018


		While it requires `unsafe` code, this design allows users to define their own `ViewTypes` for compound types such as tuples or structs.

		## Convenience methods

RFC: ArrayBuffer Views #5

RFC: ArrayBuffer Views #5

Conversation

dherman commented Nov 8, 2017 • edited Loading

matklad commented Nov 11, 2017

dherman commented Nov 15, 2017 • edited Loading

nixpulvis commented Nov 16, 2017 • edited Loading

dherman commented Nov 17, 2017

dherman commented Nov 17, 2017

dherman commented Nov 17, 2017

matklad commented Nov 17, 2017 • edited Loading

dherman commented Nov 17, 2017

matklad commented Nov 17, 2017

dherman commented Nov 17, 2017 • edited Loading

dherman commented Nov 17, 2017

dherman commented Nov 17, 2017

matklad commented Nov 17, 2017

dherman commented Nov 17, 2017

matklad commented Nov 17, 2017

matklad commented Nov 17, 2017

dherman commented Nov 17, 2017

dherman commented Nov 17, 2017 • edited Loading

dherman commented Nov 19, 2017

dherman commented Dec 29, 2017

kjvalencik May 18, 2018

Choose a reason for hiding this comment

dherman May 19, 2018

Choose a reason for hiding this comment

dherman commented May 31, 2018 • edited Loading

dherman commented May 31, 2018

dherman commented Nov 8, 2017 •

edited

Loading

dherman commented Nov 15, 2017 •

edited

Loading

nixpulvis commented Nov 16, 2017 •

edited

Loading

matklad commented Nov 17, 2017 •

edited

Loading

dherman commented Nov 17, 2017 •

edited

Loading

dherman commented Nov 17, 2017 •

edited

Loading

dherman commented May 31, 2018 •

edited

Loading