
[ WIP ] pinole integration, next gen transports #937

Closed

Conversation

@davidmarkclements (Member) commented Dec 13, 2020

For background see #614 and https://github.com/pinojs/pinole

  • transports that run in worker threads
  • supports ESM and CJS next gen transports
  • supports "legacy" transports
  • supports flushSync

Rollout plan

  • Part of Pino v7
  • Supports Node 12 and Node 14
  • I suggest we release v7 before Node 10 is at end-of-life - Node 10 users can stay on Pino 6.

Pino transport API

Package support:

  const logger = pino(pino.transport('my-transport', {some: 'options'}))

File path support:

  const logger = pino(pino.transport('./path/to/my/local/transport.js', {some: 'options'}))

The options object must be clonable because it's passed to the worker thread - e.g. it can't contain functions.
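
As a rough illustration of what "clonable" means here (the option names below are made up for the example), anything that survives the structured clone algorithm used by worker_threads is fine; functions are not:

import pino from 'pino'

// Hypothetical options object - plain data only, because it crosses the
// worker boundary via the structured clone algorithm.
const goodOpts = { destination: '/var/log/app.log', colorize: true }
const logger = pino(pino.transport('my-transport', goodOpts))

// This would fail to clone - functions can't cross the thread boundary:
// const badOpts = { format: (line) => line.toUpperCase() }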

Next Gen Transports

Transports may be ESM or CJS. I'm encouraging ESM for transports.

A canonical next gen transport runs in a worker thread and looks as follows:

export default (opts) => {
  return (data, sync) => {
     // do something with data
     // useful sync boolean tells us if the main thread is trying to sync flush
  }
}

For asynchronous initialisation, ESM gives you top-level await, but for any async work based on opts you can also make the exported function async:

export default async (opts) => {
  // do something awaiting with values from opts
  return (data, sync) => {
     // do something with data
     // useful sync boolean tells us if the main thread is trying to sync flush
  }
}

For asynchronous writes, make the returned writing function async:

export default async (opts) => {
  return async (data, sync) => {
     // do something with data and await
  }
}
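
Purely as an illustration of the shape above (not part of this PR), here is a minimal sketch of a next gen transport that appends each log line to a file; opts.destination and the fs.promises usage are assumptions made for the example:

import { promises as fs } from 'fs'

// Hypothetical file transport - runs inside the worker thread.
export default async (opts) => {
  const destination = opts.destination || './app.log'
  return async (data, sync) => {
    // data is the encoded log line handed over from the main thread;
    // sync signals that the main thread is performing a flushSync
    await fs.appendFile(destination, data)
  }
}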

Making the write function async is acceptable performance-wise because this is happening off the main thread - the ergonomics of async/await can win here.

Legacy Support

Support for existing transports works like so:

  • IF the transport is a module
  • AND IF its package.json has a "bin" field
  • AND IF its package.json does NOT have a "pino" field with a value other than "legacy"

Then it's treated as legacy. This means existing transports should work automatically, without modification, but they can disambiguate explicitly by adding "pino": "legacy" to their package.json.
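
A rough sketch of that detection logic (illustrative only - not Pinole's actual internals):

// Illustrative only. pkg is the parsed package.json of the resolved
// transport, or undefined when the target isn't a module.
function isLegacyTransport (pkg) {
  if (!pkg) return false        // not a module: must be next gen
  if (!pkg.bin) return false    // no "bin" field: next gen
  if (!pkg.pino) return true    // has "bin" but no "pino" field: assume legacy
  return pkg.pino === 'legacy'  // explicit disambiguation
}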

Pinole wraps legacy transports in a next gen transport wrapper and feeds data into process.stdin, while working around the stdout/stderr disconnect race scenario you get with worker threads during the final tick of process exit: see https://github.com/pinojs/pinole/blob/master/lib/legacy.js

This means flushSync works with legacy as well.

Anything that isn't a module must be a next gen transport.

  • Modules that have a package.json without a "bin" field are interpreted as next gen transports.
  • Modules that have a package.json with a "bin" field but with "pino": "v7" (or any value other than "legacy") are interpreted as next gen transports.

Here's pino-colada getting the legacy treatment from Pinole - this is pino-colada running inside a worker thread, with no modifications to pino-colada itself:

[screenshot: pino-colada output produced via the legacy wrapper in a worker thread]

Approach: Performance & Safety

This approach supports flushSync by blocking the main thread using Atomics, which unblocks conditionally when the worker thread updates an element in a SharedArrayBuffer. This means that in the final tick (e.g. process.on('exit', ...)) we can still do asynchronous work in the worker thread while keeping the main thread alive.

During normal operation no Atomics are used, and we purposefully avoid blocking the main thread at all costs. The worker thread polls the SharedArrayBuffer; the only work the main thread does is write into the SharedArrayBuffer (in the same way that sonic-boom writes to a buffer, using TextEncoder) and then update a metadata section of the SharedArrayBuffer so that the worker thread knows to pass the log line(s) to the transport.
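
A rough sketch of the flushSync handshake described above, assuming a single Int32 flag at the start of the shared buffer (the layout, values and timeout are illustrative, not Pinole's actual protocol):

// Illustrative only - not Pinole's actual buffer layout.
const shared = new SharedArrayBuffer(4 + 64 * 1024) // flag + log data region
const flushFlag = new Int32Array(shared, 0, 1)

// Main thread, inside flushSync(): request a flush, then block until the
// worker clears the flag (bounded by a timeout as a safety valve).
function flushSync () {
  Atomics.store(flushFlag, 0, 1)
  Atomics.wait(flushFlag, 0, 1, 10 * 1000) // blocks while the value is still 1
}

// Worker thread, once it has drained the buffer into the transport:
function ackFlush () {
  Atomics.store(flushFlag, 0, 0)
  Atomics.notify(flushFlag, 0) // wakes the blocked main thread
}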

Todo

Lots more to do, but I wanted to get eyes on this sooner rather than later.

  • docs
  • benchmarking (thinking req/s of an http server)
  • there are some no-op methods in pinole (flush, end, etc.) that could probably benefit from some implementation details, @mcollina advice appreciated
  • 100% testing for pinole (not part of this PR but blocking on it)
  • probably a lot of edge cases and bug fixing in pinole @mcollina it could really use your eyes as well
  • wondering if we should default to using a worker thread for normal stdout writes (depending on benchmarks) UNLESS os.cpus().length < 1 (or <2?)
  • Node 12 support (tests seem not to work, at least on CI)
  • Windows support (tests seem not to work, at least on CI)

@mcollina (Member)

I would err for a more incremental approach and keep the current behavior as default - do not send data to a worker thread unless asked to.

+1 for the great work.

@jsumners (Member)

> I would err for a more incremental approach and keep the current behavior as default - do not send data to a worker thread unless asked to.

Agreed.

But other than that...

[reaction image]

@davidmarkclements (Member, Author)

> I would err for a more incremental approach and keep the current behavior as default - do not send data to a worker thread unless asked to.
>
> +1 for the great work.

This is the current approach - if no transport option is passed to Pinole it returns a SonicBoom instance.

Transports always go in a worker thread - destinations are always written to in the main thread.

This has me thinking though - if there’s only 1 CPU it’s probably faster to run the transport in the main thread, which we can also do.

I’m also thinking of supporting multiple transports/destinations in pino next - just allow an array of transports and/or destinations to be passed as the stream option. Thoughts on that?

@jsumners (Member)

> I’m also thinking of supporting multiple transports/destinations in pino next - just allow an array of transports and/or destinations to be passed as the stream option. Thoughts on that?

😬

Not sure. I'd love to deprecate pino-multi-stream. But I think perpetuating it with this will be an even bigger headache. Still, we know people are going to ask for it.

@mcollina (Member)

We should really embed pino-multi-stream if we went that route.

@davidmarkclements (Member, Author)

> We should really embed pino-multi-stream if we went that route.

ok let's get this done and look at that separately

@mcollina (Member)

This still needs:

  • GitHub Actions update to remove Node 10 and 13
  • fixup for failing tests
  • docs

@jmealo commented Mar 20, 2021

@davidmarkclements this looks awesome! Where we at with this?

@mcollina (Member)

We are reworking the worker thread data transport as there were a few cases of data loss. You can follow our progress on that at https://www.npmjs.com/package/thread-stream.

@mcollina (Member) commented Apr 6, 2021

@davidmarkclements I'm not sure we can safely implement this concept of transports as pure functions. We need streams.
I consider https://github.com/pinojs/pino-elasticsearch/blob/25b400ccc847ef1bbb2336cd8c05efa216d692dd/lib.js#L8 a good example of how to implement a transport; ideally that should work well with the next gen transports as well.
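
For context, a minimal sketch of a stream-shaped transport along those lines - split2 plus a Writable, as in the pino-elasticsearch example; the exact shape of the exported function is an assumption here, not a settled API:

// Illustrative sketch, not a settled transport API.
import split from 'split2'
import { Writable, pipeline } from 'stream'

export default function build () {
  // Parse the newline-delimited JSON that pino writes in.
  const parser = split((line) => {
    try {
      return JSON.parse(line)
    } catch (err) {
      return undefined // skip malformed lines rather than crashing the transport
    }
  })

  // Do something with each parsed log object - here we just print it.
  const sink = new Writable({
    objectMode: true,
    write (log, enc, cb) {
      console.log(log.level, log.msg)
      cb()
    }
  })

  pipeline(parser, sink, () => {})
  return parser // pino writes into the head of the pipeline
}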

@davidmarkclements (Member, Author)

@mcollina and I had a call about this.

  1. Pinole is deprecated; @mcollina reimplemented the concept as thread-stream
  2. the pure function debate is still on the table if we can implement it on top of the streams concept Matteo has put together - the benefit is line splitting and parsing in core, in the worker thread, so that it doesn't have to be reimplemented in every transport over and over again
  3. the legacy transport wrapping concept isn't worth the hassle - the ways we would do it would touch parts of Node core that aren't really meant to be touched. The delta for migrating most of our supported transports is small: we just have to export a stream from each transport and then it's compatible

I'm going to close this PR now, in favour of #1003 and #1004

@jsumners deleted the next-gen-transports branch on November 20, 2021
@github-actions (bot)

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions bot locked as resolved and limited the conversation to collaborators on Nov 21, 2022