src: implement DataQueue and non-memory resident Blob #45258
@mcollina ... this is ready for review! |
cc @KhafraDev |
lgtm
9b037f1
to
18ba98e
Compare
Moving this back to Draft status. Trying to work through @addaleax's feedback and wanted to make it so that the |
Would you mind backporting this to v18.x?
See documentation in dataqueue/queue.h for details Co-authored-by: flakey5 <[email protected]> PR-URL: #45258 Reviewed-By: Matteo Collina <[email protected]>
Notable changes: buffer: * (SEMVER-MINOR) add Buffer.copyBytesFrom(...) (James M Snell) #46500 doc: * add marco-ippolito to collaborators (Marco Ippolito) #46816 events: * (SEMVER-MINOR) add listener argument to listenerCount (Paolo Insogna) #46523 lib: * (SEMVER-MINOR) add AsyncLocalStorage.bind() and .snapshot() (flakey5) #46387 src: * (SEMVER-MINOR) add `fs.openAsBlob` to support File-backed Blobs (James M Snell) #45258 tls: * (SEMVER-MINOR) support automatic DHE (Tobias Nießen) #46978 url: * (SEMVER-MINOR) implement URLSearchParams size getter (James M Snell) #46308 wasi: * (SEMVER-MINOR) add support for version when creating WASI (Michael Dawson) #46469 worker: * (SEMVER-MINOR) add support for worker name in inspector and trace_events (Debadree Chatterjee) #46832 PR-URL: #47086
Notable changes: buffer: * (SEMVER-MINOR) add Buffer.copyBytesFrom(...) (James M Snell) #46500 doc: * add marco-ippolito to collaborators (Marco Ippolito) #46816 events: * (SEMVER-MINOR) add listener argument to listenerCount (Paolo Insogna) #46523 lib: * (SEMVER-MINOR) add AsyncLocalStorage.bind() and .snapshot() (flakey5) #46387 src: * (SEMVER-MINOR) add `fs.openAsBlob` to support File-backed Blobs (James M Snell) #45258 tls: * (SEMVER-MINOR) support automatic DHE (Tobias Nießen) #46978 url: * (SEMVER-MINOR) implement URLSearchParams size getter (James M Snell) #46308 wasi: * (SEMVER-MINOR) add support for version when creating WASI (Michael Dawson) #46469 worker: * (SEMVER-MINOR) add support for worker name in inspector and trace_events (Debadree Chatterjee) #46832 PR-URL: #47087
I notice that blobs transferred over workers aren't readable at all.
mini repro:
import { Worker } from 'worker_threads'
new Worker('./worker.js').once('message', blob => blob.text())

// worker.js
import { parentPort } from 'node:worker_threads'
import { openAsBlob } from 'node:fs'
const blob = await openAsBlob(import.meta.url.slice(7)) // file:// doesn't work
parentPort.postMessage(blob)

I would assume that if you also closed the worker it would still work just fine to read the blob, even if it came from another thread...
Error stacktrace
Oops, yeah just a bug. Will take a look in the next day or so. |
I also noticed that the new method is on |
* chore: upgrade to Node.js v20 * src: allow embedders to override NODE_MODULE_VERSION nodejs/node#49279 * src: fix missing trailing , nodejs/node#46909 * src,tools: initialize cppgc nodejs/node#45704 * tools: allow passing absolute path of config.gypi in js2c nodejs/node#49162 * tools: port js2c.py to C++ nodejs/node#46997 * doc,lib: disambiguate the old term, NativeModule nodejs/node#45673 * chore: fixup Node.js BSSL tests * nodejs/node#49492 * nodejs/node#44498 * deps: upgrade to libuv 1.45.0 nodejs/node#48078 * deps: update V8 to 10.7 nodejs/node#44741 * test: use gcUntil() in test-v8-serialize-leak nodejs/node#49168 * module: make CJS load from ESM loader nodejs/node#47999 * src: make BuiltinLoader threadsafe and non-global nodejs/node#45942 * chore: address changes to CJS/ESM loading * module: make CJS load from ESM loader (nodejs/node#47999) * lib: improve esm resolve performance (nodejs/node#46652) * bootstrap: optimize modules loaded in the built-in snapshot nodejs/node#45849 * test: mark test-runner-output as flaky nodejs/node#49854 * lib: lazy-load deps in modules/run_main.js nodejs/node#45849 * url: use private properties for brand check nodejs/node#46904 * test: refactor `test-node-output-errors` nodejs/node#48992 * assert: deprecate callTracker nodejs/node#47740 * src: cast v8::Object::GetInternalField() return value to v8::Value nodejs/node#48943 * test: adapt test-v8-stats for V8 update nodejs/node#45230 * tls: ensure TLS Sockets are closed if the underlying wrap closes nodejs/node#49327 * test: deflake test-tls-socket-close nodejs/node#49575 * net: fix crash due to simultaneous close/shutdown on JS Stream Sockets nodejs/node#49400 * net: use asserts in JS Socket Stream to catch races in future nodejs/node#49400 * lib: fix BroadcastChannel initialization location nodejs/node#46864 * src: create BaseObject with node::Realm nodejs/node#44348 * src: implement DataQueue and non-memory resident Blob nodejs/node#45258 * sea: add support for V8 
bytecode-only caching nodejs/node#48191 * chore: fixup patch indices * gyp: put filenames in variables nodejs/node#46965 * build: modify js2c.py into GN executable * fix: (WIP) handle string replacement of fs -> original-fs * [v20.x] backport vm-related memory fixes nodejs/node#49874 * src: make BuiltinLoader threadsafe and non-global nodejs/node#45942 * src: avoid copying string in fs_permission nodejs/node#47746 * look upon my works ye mighty and dispair * chore: patch cleanup * [api] Remove AllCan Read/Write https://chromium-review.googlesource.com/c/v8/v8/+/5006387 * fix: missing include for NODE_EXTERN * chore: fixup patch indices * fix: fail properly when js2c fails in Node.js * build: fix js2c root_gen_dir * fix: lib/fs.js -> lib/original-fs.js * build: fix original-fs file xforms * fixup! module: make CJS load from ESM loader * build: get rid of CppHeap for now * build: add patch to prevent extra fs lookup on esm load * build: greatly simplify js2c modifications Moves our original-fs modifications back into a super simple python script action, wires up the output of that action into our call to js2c * chore: update to handle moved internal/modules/helpers file * test: update @types/node test * feat: enable preventing cppgc heap creation * feat: optionally prevent calling V8::EnableWebAssemblyTrapHandler * fix: no cppgc initialization in the renderer * gyp: put filenames in variables nodejs/node#46965 * test: disable single executable tests * fix: nan tests failing on node headers missing file * tls,http2: send fatal alert on ALPN mismatch nodejs/node#44031 * test: disable snapshot tests * nodejs/node#47887 * nodejs/node#49684 * nodejs/node#44193 * build: use deps/v8 for v8/tools Node.js hard depends on these in their builtins * test: fix edge snapshot stack traces nodejs/node#49659 * build: remove js2c //base dep * build: use electron_js2c_toolchain to build node_js2c * fix: don't create SafeSet outside packageResolve Fixes failure in 
parallel/test-require-delete-array-iterator: === release test-require-delete-array-iterator === Path: parallel/test-require-delete-array-iterator node:internal/per_context/primordials:426 constructor(i) { super(i); } // eslint-disable-line no-useless-constructor ^ TypeError: object is not iterable (cannot read property Symbol(Symbol.iterator)) at new Set (<anonymous>) at new SafeSet (node:internal/per_context/primordials:426:22) * fix: failing crashReporter tests on Linux These were failing because our change from node::InitializeNodeWithArgs to node::InitializeOncePerProcess meant that we now inadvertently called PlatformInit, which reset signal handling. This meant that our intentional crash function ElectronBindings::Crash no longer worked and the renderer process no longer crashed when process.crash() was called. We don't want to use Node.js' default signal handling in the renderer process, so we disable it by passing kNoDefaultSignalHandling to node::InitializeOncePerProcess. * build: only create cppgc heap on non-32 bit platforms * chore: clean up util:CompileAndCall * src: fix compatility with upcoming V8 12.1 APIs nodejs/node#50709 * fix: use thread_local BuiltinLoader * chore: fixup v8 patch indices --------- Co-authored-by: Keeley Hammond <[email protected]> Co-authored-by: Samuel Attard <[email protected]>
This is the start of work being collaboratively done by myself and @flakey5 to update the implementation of `Blob` to support non-memory resident data sources such as files, as well as enable proper streaming support.
It is a work in progress and will remain in draft status until it is ready to go. Opening it for transparency.
Some background:
Currently, `Blob` objects consist solely of memory-resident data, backed by a collection of `v8::BackingStore` instances. While this is correct, spec-defined behavior, it is extremely inefficient. To support other use cases of `Blob`, we want to be able to efficiently and effectively support non-memory resident data sources such as files. Doing so will require a number of key changes to the internals of the `Blob` class.
Further, the current implementation of `Blob` suffers from an overly simplistic streaming model where the entire `Blob` is first read into a single `ArrayBuffer` before passing the data on to the stream API, effectively meaning `Blob`s are never actually streamed.
While going through this evaluation, we came to the realization that many of the underlying requirements for `Blob` are (surprisingly) shared by the internal data management requirements for the QUIC implementation that is in progress, and can enable a range of other use cases also, so we have decided to take an approach that addresses those multiple use cases.
This PR does three things:
* Implements `DataQueue`, which acts as a sequence of memory-resident and non-memory-resident data sources that can be consumed as a single logical stream of data. This is designed to meet the needs of both `Blob` and QUIC initially (the bits relevant to QUIC will come later).
* Updates the `Blob` implementation to use `DataQueue` internally in support of the existing all-memory-resident model.
* Extends `Blob` so that it will support efficient, non-memory resident use cases.
This PR is far from complete. I would strongly recommend that you hold off performing any review on it at all until after we flip the bit to mark it ready for review. We are opening this draft PR now purely for the sake of transparency so that folks can be aware of what we are working on.
I also want to avoid any and all bikeshedding on the design until we at least have the updated `Blob` implementation ready to go, and many of the design decisions made will likely not be clear until that concrete case is implemented.
Specifically for `Blob`, this PR enables:
* Proper streaming. No longer will `Blob` data be collected into a single `ArrayBuffer` and then passed on to the stream; data will flow into the stream as one would actually expect.
* File-backed `Blob`. The ability to acquire a `Blob` that is backed by an on-disk file, as well as the ability to use such `Blob` instances efficiently within other `Blob` instances, e.g. `new Blob(['string', fileBackedBlob])`.
For QUIC, eventually the `DataQueue` implemented here will replace the `Stream::Queue` mechanism implemented in that PR. `DataQueue` will not completely replace that mechanism but covers a sizable chunk of it, which should further simplify and reduce the size of that rather complex bit of work.
As for other use cases of `DataQueue`... we'll address those later, but we do have a range of cases in mind (including the recent discussions around Node.js providing an efficient HTTP static file server).
/cc @mcollina (who prompted this work due to Undici and Fetch use case requirements)
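The `DataQueue` idea from the description (a sequence of memory-resident and non-memory-resident sources consumed as a single logical stream) can be illustrated with a small conceptual sketch. This is not the C++ `DataQueue` from this PR, whose actual interface is documented in `dataqueue/queue.h`. The names `SketchQueue`, `memorySource`, `lazySource`, and `collect` are all hypothetical.

```javascript
// Conceptual sketch only; not the C++ DataQueue API from this PR.
// All names here (SketchQueue, memorySource, lazySource, collect) are hypothetical.

// A memory-resident source: the bytes are already available.
function memorySource(bytes) {
  return {
    size: bytes.byteLength,
    async *read() { yield bytes; },
  };
}

// A non-memory-resident source: bytes are produced only when read.
// `produce` is an async generator function supplied by the caller.
function lazySource(size, produce) {
  return {
    size,
    async *read() { yield* produce(); },
  };
}

class SketchQueue {
  #sources = [];

  append(source) {
    this.#sources.push(source);
    return this;
  }

  // Total size is the sum of the sizes declared by each source.
  get size() {
    return this.#sources.reduce((total, s) => total + s.size, 0);
  }

  // Read every source in order, as one logical stream of chunks.
  async *read() {
    for (const source of this.#sources) {
      yield* source.read();
    }
  }
}

// Drain a queue into a single string.
async function collect(queue) {
  const parts = [];
  for await (const chunk of queue.read()) {
    parts.push(Buffer.from(chunk).toString());
  }
  return parts.join('');
}
```

The point of the sketch is only that a consumer sees one ordered stream and one total size, regardless of which sources already hold their bytes in memory and which produce them on demand.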