Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow all ArrayBufferView types as Buffers #12223

Closed
wants to merge 8 commits into from

Conversation

TimothyGu
Copy link
Member

Right now, many modules allow the use of plain Uint8Arrays in addition to Buffers. This PR begins the work to expand such support to all ArrayBufferView types (including all TypedArray types and DataView).

This work was inspired by #1826, which requests support for ArrayBuffer as well. But ArrayBuffer is treated separately from its views in V8's API and it is easier to get started with the work here.

The first two commits add the necessary infrastructure for consuming and testing all ArrayBufferView types. The last two demonstrate how to update the JS layer for it. (Specifically, using the new isArrayBufferView utility function and update usage of buf.length to buf.byteLength).

I don't intend this PR to make its way into 8.0.0.

Relevant discussions on IRC with @addaleax
2017-03-23 20:20:54 TimothyGu  So I was looking at https://github.com/nodejs/node/issues/1826
2017-03-23 20:21:23 TimothyGu  As we are trying to support Uint8Array in C++ binding it doesn't
                               look much more work to support all the other ArrayBufferView types
2017-03-23 20:21:40 TimothyGu  including all TypedArray and DataView
2017-03-23 20:21:54 addaleax   right … that was also pointed out to me in the last streams wg
                               meeting
2017-03-23 20:22:32 addaleax   I understand where it’s coming from, but so far I have wanted to
                               keep the changes pretty minimal
2017-03-23 20:23:06 addaleax   bc going from Buffer support to Uint8Array support is, for the most
                               part, a tiny change
2017-03-23 20:23:27 addaleax   if somebody (you?) wants to work on this, yay
2017-03-23 20:23:40 TimothyGu  TBH it doesn't take much of a change on the C++ side to get it
                               working
2017-03-23 20:23:46 TimothyGu  http://sprunge.us/SgET
2017-03-23 20:23:54 TimothyGu  or http://sprunge.us/SgET?diff
2017-03-23 20:25:12 TimothyGu  so I thought maybe we could just support all of them at the same
                               time
2017-03-23 20:25:25 addaleax   that’s nice, yes
...
2017-03-23 20:28:20 addaleax   re: generic ArrayBufferView support … the one reason I’ve not been
                               doing that is that my personal goal is to give users a more
                               standard way of passing in binary data, and Uint8Arrays work just
                               fine for that purpose. the other reason is that there is at least a
                               small bit of ambiguity, because interpreting e.g. a Uint16Array as
                               a Uint8Array can be done in two ways (entry-for-entry or via the...
2017-03-23 20:28:21 addaleax   ...underlying storage)
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. crypto Issues and PRs related to the crypto subsystem. util Issues and PRs related to the built-in util module. zlib Issues and PRs related to the zlib subsystem. labels Apr 5, 2017
@mscdex
Copy link
Contributor

mscdex commented Apr 5, 2017

I agree with @addaleax about the possible ambiguity with how to retrieve the byte values (by index or by byte of the underlying storage). That makes me lean towards -1 on this change.

Do we know if there are other libraries (node or browser) who allow these other non-8-bit typed arrays to be used as 8-bit arrays? It might be good to see if there is a precedent for how we might handle it in node?

@TimothyGu
Copy link
Member Author

TimothyGu commented Apr 5, 2017

@mscdex

Web standards usually use Web IDL as an abstraction for type conversions and other common tasks potentially used by many standards. In addition to ArrayBuffer, all TypedArray types, and DataView, Web IDL specifies two additional types for abstraction of buffer sources:

Additionally, Web IDL defines the abstract operations getting a reference to the bytes held by a buffer source and getting a copy of the bytes held by a buffer source, which are what the Web Standards actually call upon to perform the task of getting the bytes. They both retrieve the byte values by underlying storage, not by index.

This effectively means that all Web APIs using ArrayBufferView or BufferSource Web IDL type, i.e. almost all Web APIs that take a generic byte buffer, support all TypedArray types.

Examples include:

At the same time, not all Web APIs accept all TypedArray types, but this always happens for a good reason. Two notable exceptions I found:

On the other hand, I'm not familiar with any examples in the Node.js ecosystem, considering Buffer has been the standard for byte buffers since Node.js' inception.

/cc @domenic

@domenic
Copy link
Contributor

domenic commented Apr 5, 2017

Two notable exceptions I found:

Another exception is in readable byte streams (e.g. fetch body streams). This is because we need not only bytes, but also an offset and a length (for reasons I'll leave out here but can explain if desired). And look, we already have a type for that: Uint8Array = { buffer: ArrayBuffer, byteOffset, byteLength }. So we are using Uint8Array for streams.

But this is basically a supporting argument: we only restrict to Uint8Array when we also find the byteOffset and byteLength into the larger ArrayBuffer to be meaningful. Otherwise, any form of binary data is acceptable. And indeed, as @TimothyGu pointed out, when accepting data in stream's BYOB-read() method, we accept any type.

On the other hand, I'm not familiar with any examples in the Node.js ecosystem, considering Buffer has been the standard for byte buffers since Node.js' inception.

Well, jsdom's upcoming v10 will accept any of the BufferSource types, or Buffer.

@mscdex mscdex added the semver-major PRs that contain breaking changes and should be released in the next major version. label Apr 5, 2017
@mscdex
Copy link
Contributor

mscdex commented Apr 5, 2017

@TimothyGu Have you compared at least the Buffer benchmarks to see what (if any) performance difference there is before and after these changes?

@TimothyGu
Copy link
Member Author

@mscdex there are no differences in zlib. I did not test crypto, but I don't expect there to be any performance differences since the code is practically identical (except for changing Uint8Array to ArrayBufferView), both in JS and in C++.

lib/crypto.js Outdated
@@ -42,7 +42,7 @@ const timingSafeEqual = binding.timingSafeEqual;
const Buffer = require('buffer').Buffer;
const stream = require('stream');
const util = require('util');
const { isUint8Array } = process.binding('util');
const { isArrayBufferView } = process.binding('util');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this equivalent to ArrayBuffer.isView()?

Copy link
Member

@addaleax addaleax Apr 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this equivalent to ArrayBuffer.isView()?

@lpinca yep, those do the same thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the standard version should be better in this case.

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this WIP? There are a few more places in the API where currently only Uint8Arrays are supported (or at least it’s documented that way).

Anyway, I think I’m +1 on this change… @nodejs/collaborators Thoughts?

lib/crypto.js Outdated
@@ -42,7 +42,7 @@ const timingSafeEqual = binding.timingSafeEqual;
const Buffer = require('buffer').Buffer;
const stream = require('stream');
const util = require('util');
const { isUint8Array } = process.binding('util');
const { isArrayBufferView } = process.binding('util');
Copy link
Member

@addaleax addaleax Apr 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this equivalent to ArrayBuffer.isView()?

@lpinca yep, those do the same thing.

@@ -99,7 +99,7 @@ function zlibBuffer(engine, buffer, callback) {
var chunk;
while (null !== (chunk = engine.read())) {
buffers.push(chunk);
nread += chunk.length;
nread += chunk.byteLength;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunk is engine output, so it’s always a Buffer, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but I'd rather err on the side of replacing too many Buffer.lengths rather than missing an ArrayBufferView.byteLength, since Buffer.length === Buffer.byteLength.

test/common.js Outdated
const out = [];
for (const type of arrayBufferViews) {
const { BYTES_PER_ELEMENT = 1 } = type;
if (Number.isInteger(byteLength % BYTES_PER_ELEMENT)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean byteLength % BYTES_PER_ELEMENT === 0, or Number.isInteger(byteLength / BYTES_PER_ELEMENT)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhh… that's embarrassing. Good catch.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as it is. I think it make sense to support JS types for binary data wherever possible.

@TimothyGu
Copy link
Member Author

@addaleax

Is this WIP? There are a few more places in the API where currently only Uint8Arrays are supported (or at least it’s documented that way).

Somewhat. My intention was to incrementally update individual modules, instead of changing all the modules at the same time – sort of like how the support for Uint8Arrays was incrementally added. The crypto and zlib modules in this PR serve as examples for how it's done.

@TimothyGu
Copy link
Member Author

@domenic
Copy link
Contributor

domenic commented Apr 11, 2017

It looks to me like this PR does not take into account byteOffset. Is that correct? If so that seems pretty bad.

I think you need adapter code like https://github.com/tmpvar/jsdom/blob/28d08f58b82cc2100ad36b99cf5e8b4bbb3fc291/lib/api.js#L329-L333

@addaleax
Copy link
Member

It looks to me like this PR does not take into account byteOffset. Is that correct? If so that seems pretty bad.

@domenic Can you point to the code that you think doesn’t account for the offset? It does look like that’s handled correctly everywhere.

@domenic
Copy link
Contributor

domenic commented Apr 11, 2017

Well, for example, https://github.com/TimothyGu/node/blob/13152405dd8f248192fea558a9bc206201f884f1/lib/crypto.js does not contain the string "byteOffset" anywhere.

@addaleax
Copy link
Member

@domenic For most of the APIs that use C++, including the crypto ones, the ABV is unwrapped in https://github.com/nodejs/node/pull/12223/files#diff-772f489c7d0a32de3badbfbcb5fd200dR441. That does add the byteOffset to the pointer that actually ends up used (both with and without this PR), so I see no reason for concern here.

@domenic
Copy link
Contributor

domenic commented Apr 11, 2017

Great, that is what I was missing. Thank you!

@TimothyGu
Copy link
Member Author

Landed in ec53921...2ced07c.

@TimothyGu TimothyGu closed this Apr 12, 2017
@TimothyGu TimothyGu deleted the typedarray branch April 12, 2017 17:09
TimothyGu added a commit that referenced this pull request Apr 12, 2017
PR-URL: #12223
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
TimothyGu added a commit that referenced this pull request Apr 12, 2017
PR-URL: #12223
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
TimothyGu added a commit that referenced this pull request Apr 12, 2017
PR-URL: #12223
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
TimothyGu added a commit that referenced this pull request Apr 12, 2017
PR-URL: #12223
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
@jasnell jasnell mentioned this pull request May 11, 2017
tniessen added a commit to tniessen/node that referenced this pull request Feb 8, 2018
@tniessen tniessen mentioned this pull request Feb 8, 2018
2 tasks
tniessen added a commit that referenced this pull request Feb 9, 2018
PR-URL: #18651
Refs: #12223
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 21, 2018
PR-URL: #18651
Refs: #12223
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 21, 2018
PR-URL: #18651
Refs: #12223
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 21, 2018
PR-URL: #18651
Refs: #12223
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
MylesBorins pushed a commit that referenced this pull request Feb 21, 2018
PR-URL: #18651
Refs: #12223
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
gibfahn pushed a commit that referenced this pull request Apr 13, 2018
PR-URL: #18651
Refs: #12223
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
gibfahn pushed a commit that referenced this pull request Apr 13, 2018
PR-URL: #18651
Refs: #12223
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
PR-URL: nodejs#18651
Refs: nodejs#12223
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. crypto Issues and PRs related to the crypto subsystem. semver-major PRs that contain breaking changes and should be released in the next major version. util Issues and PRs related to the built-in util module. zlib Issues and PRs related to the zlib subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants