Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace globalPreload #147

Closed
JakobJingleheimer opened this issue Jun 20, 2023 · 30 comments
Closed

Replace globalPreload #147

JakobJingleheimer opened this issue Jun 20, 2023 · 30 comments

Comments

@JakobJingleheimer
Copy link
Member

JakobJingleheimer commented Jun 20, 2023

This hook is extremely confusing, even for maintainers.

The replacement needs to address:

Assuming there is 1 loader worker.

Spit-balling ideas:

  • expose a message port on node:module
  • return a message port from register()

Will a message port need to be refreshed?

@giltayar,I would be very interested in your thoughts (or even a design proposal) for this, especially because you recently handled this in testdouble, so the pain is fresh 😅

@giltayar
Copy link
Contributor

The most intuitive way for me is that the register returns a port that is used to communicate with this loader (how is this done today? It can't be the same port for all the loaders, and yet they all run in the same worker). In this option, we would have a function in the loader (initialize) which will accept the port to be used to communicate.

Another variant of this option is for the register option to accept a MessageChannel (instead of creating it and returning it), which the user of register can use to communicate with the loader. The same initialize function can be used (we can even call globalPreload with the port for backward compatibility! 😁). The nice thing about this option is that if the user doesn't pass the MessageChannel to the register, then it signals that there is no need to call initialize and no need for MessagePort support.

In addition to passing a MessageChannel, the register should allow passing any data whatsoever to the initialize, e.g. the userland code can now pass process.argv to the loader to satisfy the requirement in nodejs/help#4190. And actually, the user of register can just create a MessageChannel, pass the port2 via this data (it is a transferable object) and that's it: no need for any Node.js code for ports.

To summarize the above and define what is for me the best option: register can pass any data it wants to the loader, which will be passed to the exported initialize function of the loader. Additionally, if the user of register wants to communicate with the loader, it can just create a MessageChannel and pass the port to the loader as data.

@targos
Copy link
Member

targos commented Jun 21, 2023

I like that idea! I made a quick PoC here: targos/node@c4f5e91

$ ./node --no-warnings --loader "data:text/javascript," poc.mjs
initialize loader { entryPoint: '/Users/mzasso/git/nodejs/node/poc.mjs' }
loader initialized

@mcollina
Copy link
Member

that'd be amazing! PR?

@JakobJingleheimer
Copy link
Member Author

Sweeeet!

Lemme get the existing register PR stable (I think there are 1–2 small issues with it remaining), and then let's get that in 🎉

PS haven't had time to thoroughly review Gil's ideas, but I liked where they're going.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Jun 21, 2023

One thing to consider is that the point of the communications port isn’t just to pass data at initialization time. There are plenty of use cases where loaders need to communicate back and forth to the main thread after initialization.

Now maybe all this means is that we should rename initialize to something like onMessage, if it runs on every message, or initialize could return a function that runs on every message. But we should handle this use case in the design somehow.

Edit: I just looked at the POC and I see the intent, that you’re passing in the port and can register callbacks to it independently from us needing to provide infrastructure for them. I think it can work.

Before converting this into a full-fledged PR, can someone update the docs in the POC so that I can see what the intended UX is supposed to be? I think it would be good to workshop that before someone goes and implements it. What I think is the proposed UX seems fine to me at first glance, but having it spelled out in the instructions we would write for loader authors would help me fully understand it.

@mcollina
Copy link
Member

@targos @GeoffreyBooth how is this going? I think this is blocking a few use cases to fully support Node.js v20.

Happy to help in shipping this.

@JakobJingleheimer
Copy link
Member Author

JakobJingleheimer commented Jul 10, 2023

@targos @GeoffreyBooth how is this going? I think this is blocking a few use cases to fully support Node.js v20.

Happy to help in shipping this.

I just got back from vacation.

Before I left, there appeared to be 1 issue in nodejs/node#48439 related to sequence that I didn't have time to resolve. Picking back up this week (you're welcome to take a gander at it too).

PS It looks like a duplicate PR attempting to do the same thing was started whilst I was away. I haven't had a chance yet to look at it.

@GeoffreyBooth
Copy link
Member

PS It looks like a duplicate PR attempting to do the same thing was started whilst I was away. I haven't had a chance yet to look at it.

nodejs/node#48559

izaakschroeder added a commit to izaakschroeder/node that referenced this issue Jul 10, 2023
@izaakschroeder
Copy link

izaakschroeder commented Jul 10, 2023

@GeoffreyBooth @JakobJingleheimer I updated my PR to address this issue 😄 I followed the proposal from @giltayar and the signature for register is updated to be able to pass arbitrary data to a loader's initialize hook, including two additional tests demonstrating the capability. This change exists in its own commit in order to be able to be reviewed separately.

Note, however, that I opted to add an additional specific transferList parameter instead of attempting to automatically deeply inspect the sent data, so the signature for register is now:

register(loader: string, parentUrl?: string, data?: any, transferList?: any[]): unknown;

The return value from the initialize hook is sent back, but there is currently no affordance for this to also provide a transferList and thus ports or other shared buffers cannot currently be returned from initialize.

I did not remove globalPreload as I am not yet comfortable enough knowing which bits and pieces are safe to pull out or if a deprecation notice is preferred over simply deleting the whole thing.

@GeoffreyBooth
Copy link
Member

This change exists in its own commit in order to be able to be reviewed separately.

Thank you so much! Do you mind opening this as its own PR? And it can include/depend on the earlier one. And the earlier PR can exclude the new commit so that it can be reviewed on its own.

@izaakschroeder
Copy link

izaakschroeder commented Jul 11, 2023

Done!

PR is here: izaakschroeder/node#1 I'm not sure how I can put the PR on node's repo without the base branch also living in node's repo 🤔 Would you prefer I just target node:main instead and include the previous commit? Or?

@GeoffreyBooth
Copy link
Member

I’m not sure how I can put the PR on node’s repo

So your first PR is based on your support-nested-loader-chains branch. I suggest that you:

  1. Branch off of that branch in your repo, something like support-nested-loader-chains-and-register-args, so that the two branches both point to the same commit.
  2. Rebase the support-nested-loader-chains branch to drop the newest, register-related commit.
  3. Force-push support-nested-loader-chains to your repo. This will reset the PR to exclude the newest commit.
  4. Open a new PR with support-nested-loader-chains-and-register-args against node:main and mention in the PR description that it builds off esm: unflag Module.register and allow nested loader import() node#48559.

That’s it. This is the general procedure for splitting a large PR into smaller ones.

@GeoffreyBooth
Copy link
Member

Note, however, that I opted to add an additional specific transferList parameter instead of attempting to automatically deeply inspect the sent data, so the signature for register is now:

Why do we have both data and transferList as separate arguments? What’s the difference between them?

@izaakschroeder did you see/use @targos’ POC in #147 (comment)?

@bizob2828
Copy link

bizob2828 commented Jul 27, 2023

I'm having trouble tracking down the future of the loader hooks. So I'm not sure the right place for this feedback. Please redirect me if this is not correct.

I work at New Relic on the Node.js agent team and I'm working through an update to our loader to support ESM instrumentation. I'm finding the isolating loader to their own thread is causing a few headaches. If globalPreload is getting removed how will I be able to initialize something in the loader that is in the main thread? Is this with the new initialize or register hooks?

Also if you take a look at what import-in-the-middle had to do to solve their instrumentation needs, is this really what you want to push onto vendors like us?(New Relic, Datadog, Elastic, etc)? In pre 20 we used to be able to wrap the ESM source and proxy props now we have to parse the source with an AST parser to get export names and proxy this way? I don't know if I'm too late to the party but as someone that's not directly involved in this, the idea of running loaders in separate threads is causing more headaches for vendors instead of an easy to use API to do it. I'd rather go back to monkey patching a built in like Module and getting access to packages like this. I worry that this will soon be stable and the hook points available look great on paper but when you see what has to be done within, it is creating a different problem and risk.

@GeoffreyBooth
Copy link
Member

I'm having trouble tracking down the future of the loader hooks. So I'm not sure the right place for this feedback. Please redirect me if this is not correct.

Hi @bizob2828 and thanks for your feedback. Yes this is a good place.

If globalPreload is getting removed how will I be able to initialize something in the loader that is in the main thread?

It’s the other way around: the main thread will initialize the loader. Instead of your users running --loader new-relic/loader or whatever they do for ESM now, they would instead run --import new-relic/register. And within your register module would be code that calls module.register() to register your loader hooks and get a communications port to send messages between the main and loader threads; and then do whatever other initialization you want to do on the main thread. This approach provides easy access to the main thread, which is why we’re going this way (and may phase out --loader, as it’s not really needed anymore).

if you take a look at what import-in-the-middle had to do to solve their instrumentation needs, is this really what you want to push onto vendors like us?

Please read the whole thread at nodejs/node#47888, this was discussed in detail. The short version is that we’re not trying to make your jobs difficult, but we’re constrained by both the ESM spec and by V8’s capabilities. However we can make things easier within our power, that doesn’t break something else, we’re happy to do—or are happy to code review PRs for. As you might have noticed, there is a very small group working on these features and we have very limited time, so you might need to volunteer some of your own time to implement some of your requests.

I’m finding the isolating loader to their own thread is causing a few headaches.

I understand, but it’s necessary for the loader hooks to eventually support CommonJS, as the only way to “syncify” our hooks to allow customizing the sync require calls is by using a separate thread. Ditto for supporting the sync import.meta.resolve API. There were many other reasons as well; the short version is that this was years in the making and it’s not some minor change that we’re open to reverting. I understand it’s annoying from your perspective, but it’s necessary to achieve all the goals we’re aiming to satisfy with the Loaders API. We’re only two PRs away from getting to the end of our road map for what we need to land in order to call the API stable; so pretty soon the churn should end and you’ll be able to rely on this without so much effort.

@cspotcode
Copy link
Contributor

cspotcode commented Jul 27, 2023

In pre 20 we used to be able to wrap the ESM source and proxy props now we have to parse the source with an AST parser to get export names and proxy this way?

This is a different but related request to my ask that we get hook access into node's CommonJS static analysis to generate named exports from CJS modules. It was somewhere on that list of missing hooking APIs I shared a while back, wherever that went.

I found one place that I shared it: nodejs/node#43818 (comment) I feel like it was captured into markdown somewhere, just can't remember where.

@GeoffreyBooth
Copy link
Member

This is a different but related request to my ask that we get hook access into node’s CommonJS static analysis to generate named exports from CJS modules. It was somewhere on that list of missing hooking APIs I shared a while back, wherever that went.

That list got migrated into the Milestone 3 list: https://github.com/nodejs/loaders#milestone-3-usability-improvements. The CommonJS static analysis is on there, the one about cjs-module-lexer.

FYI while it’s not currently available from within Node, like import lexer from 'node:cjs-module-lexer' or something like that, it is available as a package that you can add as a dependency to your projects: https://github.com/nodejs/cjs-module-lexer.

izaakschroeder added a commit to izaakschroeder/node that referenced this issue Jul 31, 2023
izaakschroeder added a commit to izaakschroeder/node that referenced this issue Jul 31, 2023
@ghost
Copy link

ghost commented Jul 31, 2023

Hello Loaders team. Will this eventually end up in all version lines, or will old versions still use globalPreload? Trying to decide whether I should keep the preload logic in my loader, or plan to just drop it completely when register shows up. My idea now is to target LTS, v16+.

izaakschroeder added a commit to izaakschroeder/node that referenced this issue Aug 3, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
@giltayar
Copy link
Contributor

giltayar commented Aug 3, 2023

Question:

I'm responsible for the ESM implementation of module mocking in testdouble. In that implementation we have a file testdouble.js that implements the mocking API by communicating with the loader.

What I'd like to do, so that users won't need to --import testdouble-loader.js, is to register the loader in testdouble.js. The question (or is it a thinly-veiled ask? 😁) is this: how can I know whether I already registered the loader or not?

You could say that given that the registration happens when the testdouble.js module is loaded, and that doesn't happen twice, then calling register shouldn't happen and there's no need to check this.

True! But what happens when testdouble.js is loaded twice because it is used in two workers?

Hmm... this is actually an important question: if I have two workers that want to communicate with the loader. Is calling register allowed once per worker? And should the loader store an array of ports that it uses to communicate with all the workers that register-ed it? I believe it should.

(Sorry, a bit confused. But I think my train of thought here is important to understand the needs of API-s that need to communicate with loaders)

nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Aug 3, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: #48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
pluris pushed a commit to pluris/node that referenced this issue Aug 6, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs#48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
pluris pushed a commit to pluris/node that referenced this issue Aug 7, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs#48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs#48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs#48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
UlisesGascon pushed a commit to UlisesGascon/node that referenced this issue Aug 14, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs#48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
RafaelGSS pushed a commit to nodejs/node that referenced this issue Aug 15, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: #48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
RafaelGSS pushed a commit to RafaelGSS/node that referenced this issue Aug 15, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs#48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
rluvaton pushed a commit to rluvaton/node that referenced this issue Aug 15, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs#48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
RafaelGSS pushed a commit to nodejs/node that referenced this issue Aug 17, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: #48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
@JakobJingleheimer
Copy link
Member Author

JakobJingleheimer commented Aug 21, 2023

The question (or is it a thinly-veiled ask? 😁) is this: how can I know whether I already registered the loader or not?

As far as what node will tell you, there's no straightforward way to determine that (ex you can't do registered.has(myLoader)). I can think of a couple heinous ways to figure it out. Where do you need to know–in import's module, in your loader, or from the app/entry-point?

True! But what happens when testdouble.js is loaded twice because it is used in two workers?

Do you mean node's ESM worker? I think there may currently be unintended/undesired behaviour making that possible, but we'll squash that.

if I have two workers that want to communicate with the loader.

What are these workers / where did they come from? Are they spawned by the entry point?

@izaakschroeder
Copy link

izaakschroeder commented Aug 21, 2023

@giltayar maybe? I can help… I built much of the new register machinery so perhaps I could try to build out an example that fits here?

testdouble-super-register.js

import {MessageChannel, Worker, isMainThread, workerData} from 'node:worker_threads';
import {register} from 'node:module';

const TestDoubleLock = Symbol();

// > True! But what happens when testdouble.js is loaded twice because it is used in two workers?
// This will prevent testdouble.js from executing `register` multiple times from other workers
// Node _should_ generally only execute a single module once but I people could get likely
// around that if they tried. You could add an additional guard on `globalThis` which is shared
// between every module on a single V8 context (~thread). 
if (isMainThread && !globalThis[TestDoubleLock]) {
  globalThis[TestDoubleLock] = true;

  const controlChannel = new MessageChannel();

  const result = register('testdouble-super-loader.js', {
    parentURL: import.meta.url,
    data: {port: controlChannel.port2},
    transferList: [controlChannel.port2], 
  });

  if (result.error) {
    throw new Error('Somehow registered testdouble twice!');
  }

  const spawnWorker = () => {
    const workerChannel = new MessageChannel(); 
    // > if I have _two_ workers that want to communicate with the loader.
    // You can pass ports around – one that goes to the loader and one to the worker.
    new Worker(import.meta.url, {
       workerData: {port: workerChannel.port1},
       transferList: [workerChannel.port1],
    });
    controlChannel.port1.postMessage(
      {type: 'NEW_WORKER', port: workerChannel.port2}, 
      [workerChannel.port2]
    );
  };
  
  // Do what you need to do.
  spawnWorker();
  spawnWorker();
  spawnWorker();
} else {
  workerData.port.postMessage({
    type: 'THIS_GOES_TO_LOADER'
  });
}

testdouble-super-loader.js

const handleMessage = (msg) => {
  switch(msg.type) {
  case 'NEW_WORKER':
    msg.port.on('message', handleMessage);
    break;
  case 'THIS_GOES_TO_LOADER':
    // YOUR LOGIC HERE FROM YOUR WORKERS
    break;
  }
}

// > The question (or is it a thinly-veiled ask? 😁) is this: how can I 
// > know whether I already registered the loader or not?
// You _CAN_ know this here; `initialize` is called each time the 
// loader is registered.
//
// *IMPORTANT*: There is currently no way for this value to be
// mucked up, but if/when `deregister` lands to remove a loader
// then we may need a `deinitialize` hook or similar to keep the
// value here correct.

let lock = 0;

const initialize = (data) => {
  if (lock > 0) {
    // Loader has already been registered!
    return {error: true};
  }
  lock++;
  data.port.on('message', handleMessage);
  return {error: false};
}

@izaakschroeder
Copy link

For what it's worth I'm also not against exposing a getLoaders() type method on node:module. I don't think it would be hard to implement and we could have it store the results of initialize across the thread bridge.

register(originalSpecifier, parentURL, data, transferList) {
  const id = ''; // TBD: Determine how to pull this out.
  this.loaders[id] = hooksProxy.makeSyncRequest('register', transferList, originalSpecifier, parentURL, data);
  return this.loaders[id];
}

deregister(id) {
  const result = hooksProxy.makeSyncRequest('deregister', undefined, id);
  delete this.loaders[id];
  return result;
}

getLoaders() {
  return Object.values(this.loaders);
}

@JakobJingleheimer
Copy link
Member Author

Would we want to expose the whole registry or would just name and position do the job? Would someone want to use the loader or just know about it?

@izaakschroeder
Copy link

Would we want to expose the whole registry or would just name and position do the job? Would someone want to use the loader or just know about it?

I'm… not sure… I guess we would probably want to define an interface for what it returns… I assume at least the id… but maybe more.

@ljharb
Copy link
Member

ljharb commented Aug 21, 2023

has() would be safer than getLoaders(), since programs might be relying on the function they pass in not being reachable by userland code (same reason browsers don't have a "get all event listeners" function)

@GeoffreyBooth
Copy link
Member

I don't want to create APIs without a clear use case. If loader authors can solve this problem themselves with a trivial amount of code, I'd leave it at that until it becomes clear that the existing solution is insufficient somehow.

targos pushed a commit to targos/node that referenced this issue Nov 11, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs#48842
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
targos pushed a commit to nodejs/node that referenced this issue Nov 23, 2023
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: #48842
Backport-PR-URL: #50669
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
sercher added a commit to sercher/graaljs that referenced this issue Apr 25, 2024
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs/node#48842
Backport-PR-URL: nodejs/node#50669
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
sercher added a commit to sercher/graaljs that referenced this issue Apr 25, 2024
Follows @giltayar's proposed API:

> `register` can pass any data it wants to the loader, which will be
passed to the exported `initialize` function of the loader.
Additionally, if the user of `register` wants to communicate with the
loader, it can just create a `MessageChannel` and pass the port to the
loader as data.

The `register` API is now:

```ts
interface Options {
  parentUrl?: string;
  data?: any;
  transferList?: any[];
}

function register(loader: string, parentUrl?: string): any;
function register(loader: string, options?: Options): any;
```

This API is backwards compatible with the old one (new arguments are
optional and at the end) and allows for passing data into the new
`initialize` hook. If this hook returns data it is passed back to
`register`:

```ts
function initialize(data: any): Promise<any>;
```

**NOTE**: Currently there is no mechanism for a loader to exchange
ownership of something back to the caller.

Refs: nodejs/loaders#147
PR-URL: nodejs/node#48842
Backport-PR-URL: nodejs/node#50669
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Geoffrey Booth <[email protected]>
@JakobJingleheimer
Copy link
Member Author

I think this is super obsolete, so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants