Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Pluggable Loaders to support multiple use cases #82

Closed
MylesBorins opened this issue May 14, 2018 · 88 comments
Closed

Pluggable Loaders to support multiple use cases #82

MylesBorins opened this issue May 14, 2018 · 88 comments

Comments

@MylesBorins
Copy link
Contributor

As #81 is getting into implementation details of the loader I thought it might be a good idea to start discussing an idea I have been kicking around for a robust solution. "Pluggable Loaders". This is heavily inspired by nodejs/node#18392

TLDR; (bluesky)

  • Loader Plugins can be specified via an out of band mechanism (a.k.a. package.json or command line argument).
  • Loader Plugins can included within a project as an es module or installed via a package manager
  • Loader Plugins can specify the behavior of the loader including
    • behavior for specific extensions or mimetypes
    • behavior for lack of extensions (e.g. path searching)
  • Loader Plugins can cascade
    • plugins can override default behavior

Example loaders include

  • std/esm
  • TypeScript
  • wasm
  • Babel

Questions to be answered

  • Is this approach of interest?
  • Should plugins be specified per application or per package
  • What should node ship as default loaders?
    • Proposal
      • Package name map as default loader
      • "Legacy" loader shipped in core as well

Thoughts?

@devsnek
Copy link
Member

devsnek commented May 14, 2018

this is pretty much possible with the current system's hooks. using a combo of resolve and dynamic instantiate you can do everything from hot reloading to babel-style cjs named exports.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented May 14, 2018

Like I wrote in #70, if developers need to still use Babel or esm or the like in order to achieve a common use case, then many of them will just keep using that tool as they do now and have it transpile everything, and it won’t matter that Node has added native module support. So asking developers to configure such a tool as a direct dependency of their apps doesn’t seem like much of a gain to me; it’s essentially where we are now.

I can see the benefit of maybe the package to be loaded having a flag in its package.json saying “use std/esm to load me” as its way of being compatible with both ESM and CommonJS, and therefore the app developer doesn’t need to install/load esm or Babel as a direct app dependency (as opposed to a dependency of that package). But that only works as long as all of an app’s dependencies have such a flag, and it will be many years before that happens unless we intervene somehow. Perhaps the flag could be added automatically for packages that lack it, but then we’re anointing a pseudo-official package loader which essentially needs to be supported as part of Node. Which might be okay! NPM is distributed with Node and more or less supported as part of Node core, so if esm say becomes the “official” loader dependency for CommonJS modules that don’t specify an alternative, that might be a workable solution.

@bmeck
Copy link
Member

bmeck commented May 16, 2018

@GeoffreyBooth I'm of the opinion that standardizing on ESM is of great value as it has the potential of sharing code without a build step between a variety of environments (not even just Node/ Browsers). If people are wishing to perform transformations that are not supported in all environments it makes some sense to me that it requires some configuration. I think continuing as we are with build tooling and required configuration should be seen as a red flag for your code working in all environments since some things like web browsers are not seeking to add such hooks at this time.

I firmly believe getting unification of hosting environments and developer ecosystems is of higher value than anything else that ESM can provide. We should seek to provide solutions to use cases, but some use cases are served in simple ways like how importing bindings by name may not be necessary if you can read properties of an object similar to how the default based import design works.

We can solve use cases in multiple ways, and that leads to the potential of multiple loaders which are tailored to specific use cases or environments in specific ways. I don't think shipping a specific loader is the best way to unify developers and actually encourages using tooling to continue to be relied on to assist in managing differences between environments.

@GeoffreyBooth
Copy link
Member

@bmeck The issue is, ESM doesn’t offer many benefits for Node users over CommonJS. There’s no reason the many thousands of Node developers will rush to use it. Some will, sure, but many others won’t see the need, just as many packages on NPM are still unapologetically using require statements and never switched to import with transpilation. We have 600,000 CommonJS modules on NPM, that will need to still support CommonJS environments for several years. For all practical purposes, every Node developer will need to import CommonJS packages for several years to come; and the more that using ESM is an incompatible breaking change, the less people will be able to start using it or be inclined to.

CommonJS is part of Node. It’s not just some other loader, like some userland thing that maybe Node tries not to break. If you want people to start using ESM, you need to provide them a way to use it in the world we live in now, where almost every dependency they import is a CommonJS module. And if that way is to push users off on userland solutions like Babel or esm, users will keep using those tools to transpile down to CommonJS the way they’re doing now. Adoption of true ESM will be even slower.

Look, I’m all for getting to the nirvana of an ESM-only future. But you need to provide a path for people to get there, because they’re not going to just throw out every app they’ve ever written and every library they’ve ever used to start using some new spec that has no obvious advantages for them.

@bmeck
Copy link
Member

bmeck commented May 16, 2018

@GeoffreyBooth I'm not suggesting they throw out applications they have written, and they can continue to use whatever compile to JS system they already do. I'm stating that the satisfaction of use cases might not match a specific compile to JS system and custom tailoring those loaders is probably going to continue. Node providing a compatible system with other environments is a higher priority to me than matching any given compile to JS system of today since people can continue to use those as they migrate. I don't think standardizing on a compile to JS system is a good idea. We have many tools using the ESM syntax for different semantics and need to admit that adopting and/or excluding loaders will have the same effect that you are seeking to avoid. You will break some amount of application, and also are breaking compatibility with other environments. I see unity in allowing loaders and creating a safe system for usability in other environments, not in encouraging semantics that enforce a specific behavior at odds with other environments.

@GeoffreyBooth
Copy link
Member

What’s the definition of “loader” here? Can a loader be something that converts import x from 'commonjs-lib' into const x = require('commonjs-lib') at runtime? And then can this loader be included by Node automatically when there are CommonJS dependencies in a package.json? Or if that’s a bridge too far, maybe when there’s something in package.json telling Node to load it, and NPM can put that configuration there when it knows it’s needed because of the dependencies?

Because that would be fine. That would solve the “import from CommonJS” use case.

@devsnek
Copy link
Member

devsnek commented May 16, 2018

that could be something that a loader could in theory do, however it causes me great stress that node transpiling code is somehow something that people would be okay with.

@bmeck
Copy link
Member

bmeck commented May 16, 2018

@GeoffreyBooth there are a ton of topics covered in that paragraph. Lets go with, it could be? it might not be? to all those questions. In general a loader is something that is given control of HostResolveImportedModule in some way. "Pluggable Loaders" as described above do something that isn't the default behavior. Having Node do something automatically would mean it is default so probably wouldn't be called a loader in the context of "Pluggable Loaders" as described at the top of this issue.

@GeoffreyBooth
Copy link
Member

@bmeck So work with me here. Describe to me a solution that you would find acceptable. My criteria for success are:

  • Code like import x from 'commonjs-lib' that people have been writing for years needs to work like it does now with Babel/esm.
  • Users shouldn’t need to install userland solutions like Babel or esm that transpile an entire app into CommonJS.
  • Whatever the interoperability solution is, it needs to be either part of Node proper or officially supported by Node, like how NPM is.

I’m not a fan of transpilation either, that’s just meant as a shorthand of explaining what I want the loader to achieve.

Let’s say we create a loader called commonjs-loaderthat can make import statements work for CommonJS dependencies. When a user does npm install, NPM can see that their app is in ESM mode (because of something in package.json) and also that they have CommonJS dependencies. Then NPM can throw a warning like:

This project has CommonJS dependencies. Install a loader to allow importing them:

  npm install commonjs-loader

I don’t see the point of NPM not just adding it automatically unless there’s some other loader already configured, but if that makes a difference, fine. The point is that it’s not Babel or esm—it’s not transpiling the entire app into CommonJS. It’s just enabling interoperability, while keeping Node in ESM mode (aside from the CommonJS interoperability).

It does seem odd that this wouldn’t be part of Node, though, as CommonJS is part of Node. This feels like something Node should solve or have a solution for built in, like it has core modules like fs and path.

@bmeck
Copy link
Member

bmeck commented May 16, 2018

  • Code like import x from 'commonjs-lib' that people have been writing for years needs to work like it does now with Babel/esm.
  • Users shouldn’t need to install userland solutions like Babel or esm that transpile an entire app into CommonJS.

These seem in conflict as you need to use tools to get their behavior if it has problems being converted to ESM. I would say to keep using tooling if you need the behavior of tooling and don't want ESM.

  • Whatever the interoperability solution is, it needs to be either part of Node proper or officially supported by Node, like how NPM is.

As with the first two points you are mandating the behavior of existing tooling, so keep using that tooling.

Given those 3 points the only solution is to completely adopt one of the compiler chains and doing it at runtime rather than implementing ESM. I don't think mandating the behavior of tools and then saying not to use the tools makes sense. The solution of just always using a runtime compiler similar to how Meteor does things satisfies your points but doesn't seem desirable to me.

You could change the specification to comply to some specific tooling, but I'm not going to go into that since this was about what can be done today I presume.


The point is that it’s not Babel or esm—it’s not transpiling the entire app into CommonJS. It’s just enabling interoperability, while keeping Node in ESM mode (aside from the CommonJS interoperability).

This is done through some level compilation/manual hooking even with loaders since it has to manipulate how code functions. I think there might be confusion on how loading code is affected by loaders. Loaders just instrument the various parts of ESM. Breaking ESM constraints requires manipulation of behaviors in some way and not using ESM and/or doing code transforms.

@GeoffreyBooth
Copy link
Member

@bmeck Is there a solution that you would find acceptable where import x from 'commonjs-module' works in ESM code? As opposed to a user transpiling that source into CommonJS before Node ever sees it, and then Node runs the CommonJS like it does now.

@bmeck
Copy link
Member

bmeck commented May 16, 2018

@GeoffreyBooth We can load the module namespace still, whatever that means. --experimental-modules works today and doesn't go outside of any specification behavior. The default behavior could expand in various ways for named imports either by altering the specification or preprocessing. Those behaviors for providing named exports are better described in #81

@benjamingr
Copy link
Member

benjamingr commented May 16, 2018

@bmeck @GeoffreyBooth

What if we ship a command line flag (like we do for esm) today that supports named exports of commonjs modules so users get the familiar user experience but we show a warning when it is used and ship Node with a tool that users can run on their project and converts ""cjs named imports"" to destructuring assignments?

That would let users starts with running their transpiled code natively as a start with a flag as well as give them an automatic tool to transition to more compliant ESM in the future. We can give babel and TypeScript users transforms that do this automatically too.

Named exports would work between ESM modules anyway so the UX on those isn't hurt.

@bmeck
Copy link
Member

bmeck commented May 16, 2018

@benjamingr the problem is the nature of "supports named exports" with that. If we can do it, it should be on by default and stay supported. It requires code transformation and I would be against it by default since it currently breaks how the specification requires things to act by doing behaviors like late binding which are not supported by the VM, or synthetic imports. Those breakages lead me to not want the idea to ever land if it is not on standards track for support by ESM.

Shipping a tool that does this transform ahead of runtime for you seems fine to me. The tool needs to not do destructuring assignment as that loses liveness and doesn't have a clear way to determine if a module namespace being imported is CJS or ESM though. The biggest problem is setting up the transformation without knowing if the import is CJS or ESM. You have to determine the mode of your dependencies in order to transform your module. That requires all dependencies be available and is not suitable for localized isolation in a package manager if that changes over time (people move to/from one format to another). So, it probably wouldn't work at the library level, but it probably would work as an application tool.

@benjamingr
Copy link
Member

that loses liveness and doesn't have a clear way to determine if a module namespace being imported is CJS or ESM though.

Can you elaborate on why a preprocessor tool would not know that?

@ljharb
Copy link
Member

ljharb commented May 16, 2018

@benjamingr not if it's import() from a URL.

@GeoffreyBooth
Copy link
Member

We can load the module namespace still, whatever that means.

So . . . what does that mean? 😄

It sounds like we have two options here regarding getting import x from 'commonjs-module' to work:

  1. Somehow find a way to make this happen within the context of the ESM spec, however this might be, even if it means going back to the committee and requesting spec changes. This sounds like a lot of what was discussed in Feature: Named exports when importing CJS #81.
  2. Use a loader or some other plugin to change Node’s behavior regarding CommonJS, even if that means the spec is broken, because it’s something explicitly added by the user and doesn’t mean that Node proper is violating spec.

Either or both options can be pursued. @bmeck you’re an expert on the spec, so I would encourage you to propose solutions for either option. Maybe the group doesn’t decide to pursue those solutions for one reason or another, or maybe you think they’re bad ideas, but if you were forced to come up with them, what would they be?

--experimental-modules doesn’t provide any interoperability with CommonJS, and that’s the problem I’m trying to solve.

@MylesBorins
Copy link
Contributor Author

@GeoffreyBooth I think we need to also accept that there is another possibility... we don't support transparent interoperability, and to use common-js one must use import.meta.require

@benjamingr
Copy link
Member

@ljharb

@benjamingr not if it's import() from a URL.

I have never considered that live bindings need to work with dynamic imports... somehow that makes them even more magical 😮

That said - wouldn't the overhead of wrapping it "in case" be negligible if it's only done for dynamic imports?

@bmeck
Copy link
Member

bmeck commented May 16, 2018

Can you elaborate on why a preprocessor tool would not know that?

Given a library mylibrary that does import 'foo';, it needs foo in order to determine what format the entry point is. When preprocessing like most libraries the output would be in isolation and not have the foo dependency pinned to a specific version. That means that mylibrary doesn't know what format foo is in if you preprocess it in one source tree but run it in another (like by downloading it off a package registry into a different source tree).

Applications don't have this problem as they generally have dependencies pinned / the entire source tree when they are run. They are a great time to run tools that require pinned versions and the source tree to not change.

@bmeck
Copy link
Member

bmeck commented May 16, 2018

@GeoffreyBooth

--experimental-modules doesn’t provide any interoperability with CommonJS, and that’s the problem I’m trying to solve.

What do you mean? You can import CJS with it.

// main.mjs
import dep from './dep.js';
console.log(dep.x);
// dep.js
module.exports = {x: 12345};

@benjamingr
Copy link
Member

@bmeck

That means that mylibrary doesn't know what format foo is in if you preprocess it in one source tree but run it in another (like by downloading it off a package registry into a different source tree).

Thanks - that explains things, but couldn't this be alleviated by requiring running the tool on npm install? (or even doing so automatically once the tool is run?)

@bmeck
Copy link
Member

bmeck commented May 16, 2018

@benjamingr even if you do it on install that kind of workflows don't work with times you run with symlinked dependencies, are manually updating things in your source tree, and/or don't use npm. I had similar talking about problems of doing things at install time only in package-community/discussions#2 (comment) which might help explain the situation here as well.

@GeoffreyBooth
Copy link
Member

I think we need to also accept that there is another possibility… we don’t support transparent interoperability, and to use common-js one must use import.meta.require

@MylesBorins That’s always an option, of course, but I’d prefer that be a last resort. That means users need to keep track of which of their dependencies are CommonJS and which are ESM, and refactor their code whenever a dependency switches from one format to another. When projects usually have dozens or hundreds of dependencies, that’s a burden; though I’m sure someone will write a tool to automatically convert import and require statements back and forth as necessary depending on the dependency. At the very least, I think such a tool should be built and released before Node’s module support is publicized, so that the release notes can say “use this tool to rewrite your code for you”.

But obviously it’s more user friendly if existing code works as is without needing refactoring, automatic or otherwise.

@MylesBorins
Copy link
Contributor Author

MylesBorins commented May 16, 2018 via email

@ljharb
Copy link
Member

ljharb commented May 31, 2018

You’d have to set up hot reloading prior to a module being imported, just like with CJS.

@bmeck
Copy link
Member

bmeck commented May 31, 2018

@naturalethic you can't completely invalidate ESM due to spec idempotentcy requirements. It is just as @ljharb said that you have to make a module that does delegation to the currently "hot" module instead of replacing module records.

Edit:

A crude example of a module that does delegation that only works for new calls to import() and does not work for static import / rebind existing values:

export function then (r) {
  r(import(await getHotModuleURL(...)));
}

@naturalethic
Copy link

@bmeck & @ljharb Thanks!

@TheLarkInn
Copy link

One thing I'd like to offer is leveraging an already vetted, tried, and completely extensible resolver that is async and leverages the default node algorithm oob.

https://github.com/webpack/enhanced-resolve

@bmeck
Copy link
Member

bmeck commented Jun 14, 2018

@TheLarkInn it would be good to support such a loader, but I think the discussion is more around how to enable essentially what the plugins field does in your example. enchanced-resolve is very complex and not something I'd like to take on from a support point of view. I think it is sane for us to have a more minimal API and allo people to call enhanced-resolve for their own purposes within any given loader.

@arcanis
Copy link

arcanis commented Sep 18, 2018

Most of the discussions here have been focused on the specific use case of using the loaders to transpile the code in some capacity. I have a quite different use case in mind, so please feel free to ask me to post this message in a separate thread if you think it'd keep discussions more manageable.

We recently unveiled Yarn Plug'n'Play, whose goal is to generate static maps that Node would then be able to consume. Since Yarn knows everything about the dependency tree, including the location of the packages on the disk, it makes sense to fetch information directly from it - and it makes it possible to avoid creating the node_modules folders, amongst other benefits (whitepaper here).

In order to achieve this, we currently require our users to use the --require option to inject the Yarn resolver into Node. To make this a bit easier and more environment-agnostic we've also introduced yarn node that simply wraps Node and automatically injects this flag if needed. That's not great for a variety of reasons (the main one being that we don't want to substitute to Node), and as such we'd really like to find a way to tell Node that a loader must be used in order for a project to work. Using a per-project settings rather than a per-process one, so that users wouldn't have to change their workflows one bit.

All this to say: the loaders field described by Myles in #82#389761269 would be extremely useful in this regard. We could simply add the Plug'n'Play hook at the beginning of the array, and everything else would Just Work™. While important, transpilers aren't the only kind of project that would benefit from this API.

@bmeck
Copy link
Member

bmeck commented Sep 18, 2018

@arcanis would nodejs/node#18233 be sufficient? We are currently unable to land it if we were to PR it, but just check the design for now and see if anything is missing. With resolve(module, specifier) in place you could have a static mapping still.

@guybedford
Copy link
Contributor

guybedford commented Sep 18, 2018

@arcanis the difficulty with this model is that it becomes very tricky to distinguish between a loader that is necessary for a package to work and a loader that is necessary for a specific project to work. For example, I might in development use Yarn Plugn'n'Play, then publish that package to npm, with the same plugin and play loader declaration in the package.json. Now anyone installing this package would get the Yarn loader applied, even if they were consuming the package via node_modules with npm.

So this is the current snag here on loaders, making this distinction clear and simple. Suggestions very welcome here!

@arcanis
Copy link

arcanis commented Sep 18, 2018

@bmeck I don't think it would be enough unfortunately, because of the distinction @guybedford mentioned - your PR adds support for per-package hooks, but in the case of PnP the hook must be registered globally (since all packages will need to use it in order to resolve their own dependencies).

The use case would be covered by the "Global Composition" section, except that it would be a production loader, not development only. Using the environment to set the global hooks is nice for debugging, more impractical for production (it doesn't remove the need for yarn node, since the environment has to he set in some way; it also poisons child processes environments).

@guybedford What about a globalLoaders option that Node would use from the closest package.json in the filesystem hierarchy (relative to the script being executed, or the cwd in the case of -e / -p), and would ignore after the program has started (using the loader field from @bmeck's PR instead)? 🤔

@bmeck
Copy link
Member

bmeck commented Sep 18, 2018

@arcanis I would be against adding non-package (application level) data to package.json

What is exactly the reason that this cannot be done on a per package level?

NODE_OPTIONS="--loader=@yarn/pnp" is not sufficient I take it then as well? Couldn't you turn off the propagation to children by removing the part of process.env.NODE_OPTIONS that is setting --loader?

@guybedford
Copy link
Contributor

globalLoaders sounds like a very useful concept to me. And it would also directly benefit the way jspm wants to apply loaders as well.

@devsnek
Copy link
Member

devsnek commented Sep 18, 2018

if your individual package depends on yarn pnp it should still be package level. i don't want my deps being handled by some other package randomly.

@arcanis
Copy link

arcanis commented Sep 18, 2018

@bmeck Some reasons why I think NODE_OPTIONS might not be the right solution:

  • Setting NODE_OPTIONS is an explicit action. The user has to either set it themselves, or let a tool do it for them.

  • In the first case, they have to either set it system-wide (which is not acceptable, since some projects might require the hook and others might not), or temporarily either in their shell session (meaning they have to remember to remove it later) or the single command line (which makes the command lines much more verbose and hard to write/grasp).

  • In the second case, they have to setup something that will create the NODE_OPTIONS as needed. This something may be a shell alias, or yarn node, but whatever it is it will put itself between Node and the user, adding an extra layer of complexity (for example, yarn node requires at least one extra process to spawn, including a full Node context that would run the yarn binary simply for eventually shelling out to the real Node - and it can be worse depending on the setup). It also doesn't compose very well with other loaders (will each global loader have its own wrapper).

  • It will poison the environment for child processes. While it could be somewhat fixed by removing the part of NODE_OPTIONS that sets --loader, as you mentioned, it's not a working solution, because there's no way to know what added this flag in the first place. The intent is completely lost: is the loader meant to been preserved? or removed? (as a side note, it also requires to parse the NODE_OPTIONS string, which comes with its own problems)

  • What happens if you use fork to spawn a new Node process in a directory installed through Plug'n'Play? If Node doesn't have an integration to find out by itself the global loaders that need to be loaded, it means that the code doing the fork will have to implement it. Will Plug'n'Play (or esm, or babel/register) have to monkey-patch fork to support the use case? Same thing for execSync.

if your individual package depends on yarn pnp it should still be package level. i don't want my deps being handled by some other package randomly.

Plug'n'Play is enabled on an application-basis, not package-basis. You cannot have an individual package depend on PnP (hence why globalLoaders would be ignored for anything else than the running script).

@bmeck
Copy link
Member

bmeck commented Sep 18, 2018

@arcanis all of those are reasons why I believe that it should be done on a per package level. You state that it is enabled on an application-basis, but why can it not be enabled on a per package basis? Most packages when you install them don't come with node_modules populated and loaders let you actively opt-out of node_modules

@zenparsing
Copy link

@arcanis For your use case, are there multiple entry points into the application that need the same loader-hook treatment, or is there generally only one entry point?

@arcanis
Copy link

arcanis commented Sep 18, 2018

You state that it is enabled on an application-basis, but why can it not be enabled on a per package basis?

@bmeck There are a few reasons:

  • Something that might not be clear from my previous comments is that Plug'n'Play causes a single Javascript file called .pnp.js to be generated where would typically be found the node_modules folder (which isn't created at all). This file is the loader that needs to be injected. This is the one and only source of truth to know where are located the packages on the disk - it contains everything needed for Node to say "package X is requiring file Y - then it needs to be loaded at location /path/to/Y" - and this anywhere on the dependency tree.

  • Packages (most of them downloaded directly from the npm registry) have no idea whether they're used under PnP or regular installs. Nor should they have to: it wouldn't be feasible to ask for all package authors to add the PnP loader to their package.json - and it would break for people not using PnP. No, the packages should be generic, and it's to the loader to work whatever is the package configuration.

  • Even if they were aware of whether they run in a PnP or non-PnP environment, the loader path cannot be known on their side (and writing it inside their respective package.json files at install-time isn't feasible, since the same package folder will be used by multiple .pnp.js for multiple projects).

In case the reason why per-package configuration wouldn't work, I recommend taking a look at the whitepaper - I think it might help clarify some points that can still be confusing, especially the Detailed Design section.

For your use case, are there multiple entry points into the application that need the same loader-hook treatment, or is there generally only one entry point?

@zenparsing It's up to the user, they can have as many as they want, and it cannot be statically determined since they can new ones after the install. Basically, each time they would usually run a script using node ./my-script.js, they'll need to register the loader present in this folder.

@bmeck
Copy link
Member

bmeck commented Sep 18, 2018

@arcanis I did see the design, but I'm still unclear on why it needs to be that way. Maybe we should setup a call. I'm not sure but it seems like there is some confusion or I'm missing something. I agree that the current whitepaper/RFC would not be an ideal fit for either global or per package loaders. I'm interested in solving the use case, but if we must perfectly match that RFC it seems unlikely to be pleasant.

Something that might not be clear from my previous comments is that Plug'n'Play causes a single Javascript file called .pnp.js to be generated where would typically be found the node_modules folder (which isn't created at all). This file is the loader that needs to be injected. This is the one and only source of truth to know where are located the packages on the disk - it contains everything needed for Node to say "package X is requiring file Y - then it needs to be loaded at location /path/to/Y" - and this anywhere on the dependency tree.

So can't a Loader just find that file/generate it as needed? A package manager could even make a big single shared .pnp.js that works across all packages. I don't see how this really relates to needing it to be a application level option. I must be missing something.

Packages (most of them downloaded directly from the npm registry) have no idea whether they're used under PnP or regular installs. Nor should they have to: it wouldn't be feasible to ask for all package authors to add the PnP loader to their package.json - and it would break for people not using PnP. No, the packages should be generic, and it's to the loader to work whatever is the package configuration.

That sounds like you want a global hook, but as described in both the comments and a few notes in the RFC this would be a breaking change to use this resolution algorithm. Wouldn't that mean that users should opt into this behavior? And if they should opt into this behavior how would they do so to state they are using non-standard behavior if not on a per package level.

Even if they were aware of whether they run in a PnP or non-PnP environment, the loader path cannot be known on their side (and writing it inside their respective package.json files at install-time isn't feasible, since the same package folder will be used by multiple .pnp.js for multiple projects).

I don't understand this comment, why can't you have either linkage via symlinks like pnpm or use the default resolution algorithm to get to your loader?

@arcanis
Copy link

arcanis commented Sep 18, 2018

if we must perfectly match that RFC it seems unlikely to be pleasant.

My goal in being here is to help make this pleasant to everyone. If the RFC has to consider new facts, so be it. I want to stress that we're really open to feedback and don't want to push this on anyone 🙂

So can't a Loader just find that file/generate it as needed? A package manager could even make a big single shared .pnp.js that works across all packages.

I'm not entirely sure what you mean - a loader cannot generate that file, since it is meant to be generated by the package manager. Yarn already does make a big single single shared .pnp.js file that works across all packages, so I'm not sure either I understand correctly.

If you mean "accross all projects", this isn't possible - multiple projects have different resolutions (different lockfiles, in sort) that cannot be merged, and the .pnp.js is meant to be checked-in anyway.

this would be a breaking change to use this resolution algorithm. Wouldn't that mean that users should opt into this behavior? And if they should opt into this behavior how would they do so to state they are using non-standard behavior if not on a per package level.

There's a few points here that can be discussed (maybe it'd be better to mention it on the rfc thread, since people here might not be interested about Plug'n'Play's details?):

  • First, something to note is that Plug'n'Play aims to be compatible with the existing ecosystem.
  • Which means that whether Plug'n'Play is enabled or not should not affect the packages.
  • Which in turn means that packages shouldn't have to be aware of whether PnP is enabled or not.

How do we achieve this compatibility? By strictly following the rules of dependencies / devDependencies. It covers almost all needs. The only want it doesn't cover is obviously packages directly accessing the node_modules folder, but the fix is usually quite simple, thanks to require.resolve. Anyway, the main point is: operating under PnP or not is a setting that is done at the installation level, not the package level.

I don't understand this comment, why can't you have either linkage via symlinks like pnpm or use the default resolution algorithm to get to your loader?

Neither symlinks nor the node resolution would solve the problem. Consider the following hierarchy:

/path/to/cache/lodash-1.2.3/package.json -> {"loader": "???", "dependencies": {"left-pad": "*"}}
/path/to/cache/left-pad-1.0/package.json -> {"loader": "???"}
/path/to/cache/left-pad-2.0/package.json -> {"loader": "???"}

/path/to/project-1/my-script.js -> require(`lodash`)
/path/to/project-1/.pnp.js -> lodash=lodash-1.2.3, left-pad=left-pad-1.0

/path/to/project-2/my-script.js -> require(`lodash`)
/path/to/project-2/.pnp.js -> lodash=lodash-1.2.3, left-pad=left-pad-2.0

What would you put in loader that would simultaneously target both project-1/.pnp.js and project-2/.pnp.js, depending on the environment? Note that one of the goal of Plug'n'Play is to avoid I/O, meaning that creating symlinks in project-1 and project-2 isn't allowed (and it would require --preserve-symlinks anyway).

@bmeck
Copy link
Member

bmeck commented Sep 19, 2018

@arcanis I'm getting quite confused with a lot of implementation of how your system works right now being thrown around. I'm going to just state how I would expect things to look given what I'm understanding:

/path/to/cache/[email protected]/package.json -> {"loader":"./.pnp.js"}
/path/to/cache/[email protected]/package.json -> {"loader":"./.pnp.js"}
/path/to/cache/[email protected]/package.json -> {"loader":"./.pnp.js"}
/path/to/cache/[email protected]/package.json -> {"loader":"./.pnp.js", "dependencies": {"foo":"2.0.0"}}
/path/to/cache/[email protected]/package.json -> {"loader":"./.pnp.js", "dependencies": {"foo":"^2"}}

/path/to/project-1/app.js -> require(`foo`) require(`bar`)
/path/to/project-1/package.json -> {"loader": "./.pnp.js"}
# unclear how this handles the `foo@2` nesting?
/path/to/project-1/.pnp.js -> [email protected], [email protected]

/path/to/project-2/app.js -> require(`foo`) require(`bar`)
/path/to/project-2/package.json -> {"loader": "./.pnp.js"}
# no nesting, simple
/path/to/project-2/.pnp.js => [email protected], [email protected]

/path/to/project-3/app.js -> require(`foo`) require(`bar`)
/path/to/project-3/package.json -> {"loader": "./.pnp.js"}
# no nesting if bar using [email protected]
# nesting if bar using [email protected]
/path/to/project-3/.pnp.js => [email protected], [email protected]

Given that .pnp.js can intercept all incoming requests for dependencies within a "project", we can ensure that it loads to a location that properly does the requirements in the RFC.

  1. receive a resolve(moduleId (generally a URL), specifier) request.
  2. create body of /path/to/cache/... and assigned id.
  3. map assigned id to /path/to/cache/... resolutions so that resolve(...) can handle it.
  4. respond with body

This could be done in a ton of other ways, the pnpm style symlinkscould be used, and a loader would prevent the need for using--preserve-symlinks` if it handled that itself.

We also face some problems from taint if we use methods of finding /path/to/cache/... like os.homedir() if someone is using things like child_process.fork, cluster, etc. if the flags are removed / if they set a different user id. Having /path/to/cache be local and resolvable without outside data would be ideal. This is part of why pnpm style symlinks are attractive. These problems can also be problematic if we use CWD and try to run /path/to/project-1/ in a different working directory from /path/to/project-1/ such as using a CWD of /tmp/pid1000.


We also face some problems of creating a distinction of application and package here. If I were to require('/path/to/project-1') and it acted differently from node /path/to/project-1 we face some problems regarding how to determine if something is running as an application or as a package. Using module.parent style detectiong is insufficient since it could be loaded in odd ways:

$ # using the repl
$ node
> require('/path/to/project-1')
$ # as a preload
$ node -r /path/to/project-1 sidecar
$ # bad APIs
$ node -e '...;require("module").runMain(...)'

I have serious concerns in creating systems that create this divergence since it makes behavior very dependent on how things are used, which we don't currently have well defined paths for a variety of things in ESM. Things like module.parent don't even make sense in ESM.

It seems like if you really need that distinction we should clarify what an application is far beyond having it mean that something is running via a specific CLI command. I feel like such a distinction might need to ensure that an application cannot be used as a package and vice versa.

@arcanis
Copy link

arcanis commented Sep 19, 2018

I must be missing something: how come the loader fields in the cache packages reference ./.pnp.js, but there is no .pnp.js file in the cache packages? Wouldn't those relative paths resolve to /path/to/cache/[email protected]/.pnp.js, which doesn't exist (and cannot exist since it contains project-specific data that cannot be stored in the cache)?

@bmeck
Copy link
Member

bmeck commented Sep 19, 2018

@arcanis i would assume they exist in the cache as well, i am still unclear on what is project specific hence wanting a meeting.

@arcanis
Copy link

arcanis commented Sep 20, 2018

Some highlights from my understanding of what we discussed (@bmeck, @MylesBorins, please correct me if I'm mistaken somewhere):

  • The resolution process should be split in two steps:

    • First a pass would resolve the bare imports into a value,
    • Then during the second pass the loader would take this value, finish resolving it if needed, and finally load a module using it.
    • Taking a JSX loader as example: react-calendar would resolve to the unqualified resolution /path/to/cache/react-calendar@1 during the first step, then the second pass would kick in and would internally turn this path into a fully qualified resolution, turning it into /path/to/cache/react-calendar@1/lib/calendar.jsx, which it would finally load.
  • Since the PnP has a global knowledge of all packages of the dependency tree, it needs to affect all packages being loaded

    • Still, this is complex to achieve by pure application-level loaders, because it's not clear what is an application.
    • This can be achieved by a package-level loader that would wrap the loaders from all imported modules.
    • Through this pattern a loader would effectively act as an application-level loader, but would require some boilerplate.
  • The value returned during the first pass might not be an actual filesystem path

    • An Asset API (similar in concept to FS, but read-only) would then allow consumers (both userland code and loaders) to use these values to read the files they point to

A pseudo-implementation for what such a PnP loader would look like this (@bmeck, let me know if I made a conceptual error here). Note that I made PnP more complex than it actually is (it currently returns actual folder paths, not hashes) to illustrate in the next snippet the use of the asset api.

import {resolveToUnqualified} from './.pnp.js';

export default function pnpLoader(parentLoader) {
  return {
    // request=react-calendar/component
    loaderStepOne: (request, issuer) => {
      // return=TZhsV3bGQV2KZIjFIObr/component
      return resolveToUnqualified(request, issuer);
    },
    // request=TZhsV3bGQV2KZIjFIObr/component
    loaderStepTwo: request => {
      const {loader, ... rest} = parentLoader.loaderStepTwo(request);
      // Wrap the loader to substitute it by our own
      return {loader: pnpLoader(loader), ... rest};
    }
  };
}

And a pseudo-implementation for the default loader would be something like this (keep in mind this is SUPER pseudo-code, we haven't discussed code samples and this is just based out of my rough understanding of how the asset api could work):

import * as fs from 'fs';

export default function defaultNodeLoader(parentLoader) {
  return {
    // Note that the PnP loader would entirely shortcut this step, since
    // it implements its own step one.
    loaderStepOne: (request, issuer) => {
      // We need to run the full resolution since the extension and index.js
      // must be resolved in order to select the right node_modules
      return runNodeResolution(request, issuer, fs);
    },
    //
    loaderStepTwo: request => {
      // If there's a parent resolver, use it to resolve the assets (since it's
      // the only one that'll know how to use the special identifier
      // TZhsV3bGQV2KZIjFIObr/component that's been returned by
      // the PnP loader)
      const selectedFs = parentLoader ? parentLoader.assets : fs;
      
      // Then use the FS we've selected to run the resolution; we need to run
      // it again (even if we do it in the step one of this loader), because the
      // step one is not guaranteed to return a fully qualified path (the PnP
      // override wouldn't, for example)
      const qualifiedPath = runNodeResolution(request, issuer, selectedFs);
      // without PnP = /.../node_modules/react-calendar/component/index.js
      // with PnP    = TZhsV3bGQV2KZIjFIObr/component/index.js

      // And then, once it has finished resolving the fully qualified path, it
      // can load it (still using the parent loader FS if needed)
      const body = await selectedFs.readFile(qualifiedPath);
      const loader = /*not sure how it will be loaded?*/;

      return {body, loader};
    }
  };
}

@MylesBorins
Copy link
Contributor Author

Closing this as there has been no movement in a while, please feel free to re-open or ask me to do so if you are unable to.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests