Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplex stream support for separate readableHighWaterMark and writableHighWaterMark #14555

Closed
guymguym opened this issue Jul 31, 2017 · 8 comments
Labels
feature request Issues that request new features to be added to Node.js. stream Issues and PRs related to the stream subsystem.

Comments

@guymguym
Copy link
Contributor

  • Version: 8.2.1
  • Platform: Darwin
  • Subsystem: stream

This need comes from the practice to use transform streams to perform pipeline processing with backpressure, in order to process incoming binary data, by splitting it to intermediate chunk objects. In such cases the transform stream will accept buffers as input (readableObjectMode: true) and push out objects (writableObjectMode: true), or vice versa.

While the objectMode flag supports separation between readable and writable, the highWaterMark option is unified between the stream roles, which doesn't allow to set the internal buffer size units with respect to the stream type. It seems that optional support for readableHighWaterMark and writableHighWaterMark is a natural complementary option to the separated readableObjectMode and writableObjectMode options.

If PR's are welcome I can code the same handling for in the ctors of Readable and Writable and add to docs.

LMK what you think,
Thanks!

References:
https://nodejs.org/en/docs/guides/backpressuring-in-streams/
https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L69

@mscdex mscdex added feature request Issues that request new features to be added to Node.js. stream Issues and PRs related to the stream subsystem. labels Jul 31, 2017
@Fishrock123
Copy link
Contributor

@guymguym could you describe a bit why it may be desirable to have these separately?

cc @nodejs/streams

@addaleax
Copy link
Member

I’m actually a bit surprised this didn’t already exist, so 👍 from me.

Also, just for clarity, this should be implemented on the Duplex level rather than on Transform.

@Fishrock123 One problem I could imagine is that high/low water marks can’t really ever have an identical and meaningful value if one side is in object mode and the other isn’t.

@guymguym
Copy link
Contributor Author

@Fishrock123 this occurs anytime you mix between object modes on the same transform stream, in that case the highWaterMark units is not suitable to measure the buffer size for one of the modes.

Here's an example to illustrate the issue:

fs.createReadStream('large-file-with-json-per-line')
  .pipe(new stream.Transform({
    readableObjectMode: false, // reading in buffers
    writableObjectMode: true, // writing out json decoded objects
    highWaterMark: ??? // bytes or objects???
    transform(buf, enc, callback) {
      // split pending data to lines ...
      lines.forEach(line => this.push(JSON.parse(line)));
      callback();
    }
  })
  .pipe(new stream.Transform({
    objectMode: true,
    transform(obj, enc, callback) {
      rest_api(obj).then(reply => callback(null, reply), err => callback(err));
    }
  })
  .pipe(new stream.Transform({
    readableObjectMode: true, // reading in objects
    writableObjectMode: false, // writing out buffers
    highWaterMark: ??? // bytes or objects???
    transform(obj, enc, callback) {
      callback(null, JSON.stringify(obj) + '\n');
    }
  })
  .pipe(fs.createWriteStream('output-file'));

@guymguym
Copy link
Contributor Author

@addaleax You are right of course, this behavior is inherited from Duplex.

However it makes less sense for Duplex streams compared to Transform, so I just chose to describe it from the usage point of view rather from the impl pov.

This is currently implemented inside the ReadableState and WritableState ctors:
https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L69
https://github.com/nodejs/node/blob/master/lib/_stream_writable.js#L49

@guymguym
Copy link
Contributor Author

When I think about it, this is also relevant for transform streams of the same objectMode, as the input and output might still have different proportions. For example think of a transformer that pushes two lines for every input line...

@Fishrock123 Fishrock123 changed the title Transform stream support for separate readableHighWaterMark and writableHighWaterMark Duplex stream support for separate readableHighWaterMark and writableHighWaterMark Jul 31, 2017
@mcollina
Copy link
Member

mcollina commented Aug 1, 2017

I'm definitely 👍 on this. Would you like to assemble a PR?

@guymguym
Copy link
Contributor Author

guymguym commented Aug 2, 2017

sure. will do.

@guymguym
Copy link
Contributor Author

guymguym commented Aug 4, 2017

PR is up. It's my first PR for nodejs so let me know if I missed any guidelines, or you prefer different code for any reason, and I would be happy to make changes.

guymguym added a commit to guymguym/node that referenced this issue Aug 7, 2017
addaleax pushed a commit that referenced this issue Aug 10, 2017
This commits adds support for readableHighWaterMark and
writableHighWaterMark in Duplex stream, so that they can be set without
accessing the internal state.

Fixes: #14555
PR-URL: #14636
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants