-
Notifications
You must be signed in to change notification settings - Fork 66
proposal: Staged require() with lifecycle hooks #56
Conversation
I wonder if a simpler approach of only a single hook would be better, but with Module offering APIs that can be used to implement that hook - APIs like _resolveLookupPaths() (but doced). The 3-part API reminds me a bit of the mid-layer anti-pattern: https://lwn.net/Articles/336262/ Its very modelled on current behaviour, but if I wanted to implement a resolve like that used in https://github.com/zeit/pkg, I might want to completely replace all 3 stages with a different implementation, and would want just one hook, and then have a library of useful methods (resolve, load, parse, rewrite, ...) that I can use if I want, but no have the assumption that those 3 stages always occur baked into the extension hook API. ^--- not a rejection, just raising the possibility of a different approach that might be worth considering cc: @watson @nodejs/diagnostics |
### Safety | ||
|
||
Monkey-patching can be unsafe for many reasons. Sometimes code the behaviour | ||
of code changes based on parameter length of input functions--express is a good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what you are trying to say about n-arity of functions, but the grammar is a bit garbled here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll work on it. Thanks for the input. 😅
|
||
A big issue on the horizon is that, if ES Modules are adopted, modules become | ||
immutable at build time. This means that the current monkey-patching approach | ||
will not work anymore. TC39 has already suggested using AST transformation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have a link to this discussion, or other ref?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It came up in one-on-one discussion with @bmeck at one point. Not sure if that is doc'd somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
### `require.resolvers` | ||
|
||
It is the responsibility of `require.resolvers` extension handlers to receive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
``` | ||
|
||
By defining named behaviour specific to `require()`, we open the door for | ||
supporting the different behaviour of `import` in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe should be an array of resolvers, where if a resolver fails to resolve, the next one is tried?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, the API only makes sense if in the future there will be a key that is not "require", which is a bit YAGNI, unless you actually demonstrate that the API will work for a future use-case ("import" is what I assume you are leaving the door open for)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea on the failover design. 👍
Import is indeed what I'm thinking of with the named list design. That one is much more of a "maybe" to me than the other bits right now. I feel like there's something in here that ES Modules can benefit from in regards to the loader, but I'm not quite sure what yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go for simpler and more generic unless we are certain this will be helpful for imports, and actually, as much as es mods are a WIP, its probably not worth doing this unless we are certain it works for them, or its guaranteed to be "legacy" in a couple years.
And isn't it 2am in Vancouver? ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost 3AM. 😅
Also, my thinking was to put the import stuff in there just to spark some discussion around that. I'm definitely not tied to that part being in the final proposal. I'd really love to hear from other people more deeply involved in the ES Modules discussion than I, since both things are deeply tied to the module loading systems and I'd really like to get some coordination going on here.
'exports = module.exports =' | ||
) | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will c++ addons go through the loader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. My thinking was the single line extensions handler currently used would become a compiler and it'd skip the loader step. Not sure if that makes sense though.
require.compilers['.yaml'] = function yamlCompiler(request) { | ||
request.module.exports = yaml.parse(request.contents) | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also seems like it should be a stack. If the code was .ts, it might need compilation to .js, but after it's compiled to js, an APM that instruments js would want to get it so it can be rewritten for instrumentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my original PR went with a stacked design. I've been going back and forth and if that's much better than just wrapping the existing function repeatedly. It's a trade off of making all requires a bit slower with stack checks/iteration or just making patched environments slightly slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be possible to optimize inside module. If the array has only one member, the function used to iterate the array can be replaced directly with the only function in the array. For arrays longer than one, we might even be able to pre-compose a function out of the members of the array (basically, cacheing them to avoid repeated iteration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely feel like the stacked interface is friendlier. Maybe I'll put that in tomorrow and have a couple alternative suggestions to discuss and narrow down on.
Yep, I had similar thinking. Hadn't thought yet about how to encode it into the proposal, but basically my thinking was just that the three-stage thing would be what the internals does, but higher levels could be wrapped too, if that's more suitable. |
It might be clearer to remove refs to I am generally warming to the idea of exposing hooks into the module require/import system, but also worried a bit about tying the hooks too closely to what we do today, which may be counter productive if the intention is to enable extensibility into future innovative use cases. |
@igorklopov would a proposal such as this help with |
Ok, I rewrote most of the doc and added an alternative implementation. (I kind of prefer the alternative, to be honest) |
Still looks to me like the mid-layer anti-pattern: it assumes a 4-stage life-cycle, so makes this inflexible when it could be flexible. What about something more like:
So there is only one hook, its given what require always knows (name and parent), there are some library functions it can use to do its job (essentially the internal functions that are used by require now), but it can do something completely different if it wants to, as long as the end result is "here is a Module for that name". |
The problem with that approach is that you can't reasonably intercept the data between the stages. If, for example, I wanted to apply another transform to that code after the ts transpile, I'd have to monkey-patch the ts module itself along with the register function to ensure it only applies my extra transform when using ts through register handlers. |
@Qard Right, I see your point, and take back my suggestion. |
I definitely get where you're coming from on wanting a more composable interface though. Where the proposal is at now tries to balance enough power to cover the particular set of needs we know we have now without sacrificing too much in complexity and potentially performance burden. I'm definitely open to suggestions on how to improve the proposal. It is an EPS after all. |
At this point, I think we need more feedback from prospective users. |
Nice write-up with a good overview of the rational. @sam-github do you have suggestions on who specifically we should ask for feedback ? |
If I get time, I would like to review it. That is unlikely prior to June though :( |
@mhdawson people who are working in the target use-cases, to quote the proposal:
So, APM implementors (new relic, appdynamics, opsbeat, appmetrics, etc), someone from Istanbul (are there other coverage implementations?), mocking... no idea, babel/typescript/... |
AVA and (nominally) nyc/Istanbul maintainer here. Thank you @Qard for raising this proposal, we've ended up using packages like
Loader objects should make this easier, since it means there's only one list who's order has to be controlled. Ideally the |
One note about the API: In JS module land, it will also be valuable to control linking. That's the mechanism that would still allow "monkey-patching" (since source transforms have quite a few downsides re: access to dependencies). |
@jkrems you need to be careful here, as interrupting linking generally removes exports from being live. There is no notification when exports update and no getter like mechanism upon accessing an import. |
@bmeck Interesting! The whole "when is a binding live live" thing still trips me up. What I was thinking was something along the lines of: // original.js
export function f() {}
export function g() {} // monkey-patch.js
export { g } from '::magic'; // or whatever that syntax is
import { f as originalF } from '::magic';
export function f() {
// Do typical monkey stuff
return originalF.apply(this, arguments);
} I definitely have to do more experimentation/reading and it fails terribly if you don't have a full list of the original module's exports (and there currently is no API exposing that info). |
@jkrems also, your example is using a fn, which is safe-ish since it evaluates on each invocation, but things like: export let now = Date.now();
setInterval(() => now = Date.now()); where you can't execute code upon access are the real problems. Note: v8 has rejected implementing getters for variable access due to a number of reasons. |
- For the object-based style, it might also be a good idea to, rather than have | ||
the `transform` method, have `preCompile` and `postCompile` transformers. | ||
|
||
[1]: https://github.com/nodejs/node/blob/master/lib/_tls_wrap.js#L829 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably wanna future proof this link by tying it to a specific git ref (it's already broken) - tip: press y
while on the github page and it will update the URL for you with a ref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I'll fix that soon.
### `require.resolvers` | ||
|
||
It is the responsibility of `require.resolvers` extension handlers to receive | ||
the in-process `Module` object and use the metadata on it to figure out how to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When referencing the Module
object, did you mean ModuleRequest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I'll make some revisions soon. Pretty busy for the next few days! 😱
|
||
```js | ||
require.transformers.push(({ contents, resolvedPath }) => { | ||
if (!/\.js$/.test(resolvedPath)) return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it return contents
? Or is returning undefined
the same as not modifying the contents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, returning undefined means no changes. Could work either way though.
How will these lifecycle events work with the module cache? Would you expect to do the module cache lookup after resolution as is done currently and if so would only the resolution event be emitted on cache hits? |
@matthewloring Personally I'd expect the cache to be populated with the output of the pipeline. So the cache would then continue to function just as today. I can't currently see a use-case where you'd like any of the hooks to fire if the module is being loaded from the cache... but I might have over looked something |
@watson The cache is currently keyed on the result of the resolution step so I think the user provided resolver would still be run before the cache lookup could occur. The use case that comes to mind for this would be module wrapping where you want the wrapper module to be given when the parent is not the wrapper module itself (which needs a handle to the module being wrapped). The cache does not currently take the parent into account to my knowledge which could interfere with this behavior. |
I made a few revisions. The @matthewloring That should be possible by looking at the |
Ok, that sound good. It does exclude applying different source transformations based on the parent module but I'm not sure how compelling that use case is. |
I do not think we should expose this API through the standard I think we should state that this new API will not have any slowdown in load time if it is not enabled. Node.js fast boot time is one of its main features, and we should keep that. Moreover, it should be fast even it is enabled, as there should be little overhead in passing the data. Given that ES6 modules are loaded at a different time compared to the standard |
There was no progress for a long while here and I do not see any conclusion. Is there any progress here? |
I'm planning on revisiting this soon and making some updates related to supporting es modules. :) |
Closing, but feel free to re-open or move to a more appropriate/higher-visibility repository. |
I'm still working on this, but I think it's time to get some more eyes on it.
This is the EPS continuation of my work in nodejs/node#12349 to attempt to enable an AST transformation pipeline in Node.js core. There's definitely some stuff missing still, and I'll add more over the weekend, but this is where I'm at today. 😸
@nodejs/diagnostics