Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yarn Plug'n'Play: Getting rid of node_modules #101

Closed
wants to merge 4 commits into from

Conversation

arcanis
Copy link
Member

@arcanis arcanis commented Sep 13, 2018

This PR is locked to prevent sending notifications to people who are only interested in the high-level design. For any technical issue, please direct your questions to the Yarn repository.

Note: This PR's discussions should be focused on the high-level design. A separate PR has been opened on the code repository to discuss the implementation details.

A pdf has been generated and made available here for easier reading. Note that this PR remains the reference location for the most up-to-date information regarding this proposal.

Hi folks!

We propose in this RFC a new alternative and entirely optional way to resolve dependencies installed on the disk, in order to solve issues caused by the incomplete knowledge Node has regarding the dependency tree. We also detail the actual implementation we went with, describing the rational behind the design choice we made.

I'll keep it short here since there's much to discuss in the document itself, but here are some highlights:

  • Installs ran using Plug'n'Play are up to 70% faster than regular ones (sample app)
  • Starting from this PR, Yarn will now be on the path to make yarn install a no-op on CI
  • Yarn will now be able to tell you precisely when you forgot to list packages in your dependencies
  • Your applications will boot faster through a hybrid approach of static resolutions

This is but a high-level description of some of the benefits unlocked by Plug'n'Play, I encourage you to give a look at the document for more information about the specific design choices - and in case anything is missing, please ask and I'll do my best to explain them more in depth!

I should mention that we've been using in production inside Facebook for about two weeks now, and didn't get issues since then. Now that it passed the trial by fire we felt confident enough that this solution was the right solution, and share it openly so that we can all iterate on it.

Working on this project has been super exciting for me, and I can't wait to see the new possibilities that it will unlock! Especially from a tooling perspective, the benefits of having a unified indirection allowing package managers to dictate the way the dependency are loaded unlocks new incredible patterns and make it easier and safer for tools to integrate with it.

Paging some community members that have been made aware of the project during its development and helped us in various ways, either through actual contributions (kudos to @imsnif for implementing yarn unplug!) or by their feedback:

@orta
Copy link

orta commented Sep 13, 2018

I read the paper. I love the concept, moving the resolver into an exportable the pnp.js is a particularly smart idea.

@davidnagli
Copy link

davidnagli commented Sep 13, 2018

Really love this idea! Would this have to be implemented on a package manager level, bundler, or both?

@arcanis
Copy link
Member Author

arcanis commented Sep 13, 2018

@davidnagli Mostly package managers. Bundlers could however be able to consume the .pnp.js file to do special processing and provide a better integration with the dependency tree (such as workspaces).

@azz
Copy link

azz commented Sep 13, 2018

⚰️node_modules

This is fantastic!

A few random thoughts:

  • This will probably break a lot of tools and libraries that rely node_modules existing, like read-pkg-up (7m weekly downloads). A migration strategy would need to be documented.

  • How will bin commands be handled? Currently they're in node_modules/.bin which is added to the $PATH. Will that still exist or be moved to the cache? (I guess PATH=$PATH:$(yarn bin) could help in that case.)

  • Has this approach been tested with other tools that modify node resolution, like esm?

    # will this work?
    node -r esm -r .pnp.js index.js 
  • In order to solve this, Plug'n'Play details a special case when a package makes a require call to a package it doesn't own but that the top-level has listed as one of its dependencies.

    This would have to include peerDependencies, right?

  • I might be missing something, but why have relative paths from the workspace root to the yarn cache? Could the path to the yarn cache be added to the environment and then the paths look more like $YARN_CACHE/lodash/4.0.0/flatMap.js?

@zkochan
Copy link

zkochan commented Sep 13, 2018

For reference, I always wanted to symlink packages directly from the store in pnpm (related issue: pnpm/pnpm#1001).

We also did a partial implementation (using the --independent-leaves flag). However, too many packages in the ecosystem currently rely on their real location, so we decided to make the change later. But it would be a lot faster than the current algos used by npm/yarn/pnpm.

Lots of packages already don't work with pnpm because of its strict node_modules. This is one of the reasons pnpm isn't adopted as much as Yarn or npm. I see in the RFC that it is planned to make this hooked resolution algorithm "strict" as well. I think it would be good for pnpm as we don't have the power to make the ecosystem fix itself. Even fixes that we contribute are sometimes not merged and published for years.

So this is a very brave design. I wonder how it will be welcomed when all the issues will arise. I wonder whether the design will be adjusted or you will insist to keep the strict resolution (this scenario would be best for pnpm)


Since Plug'n'Play flattens the dependency tree while still preserving the links between the nodes, the paths Node will get will be the same for any `[email protected]` inside the dependency tree, causing the package to be instantiated a single time.

### A. users cannot require dependencies that aren't listed in their dependencies
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A. => E.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tfw you reorder sections and forget to update the identifiers


In order to solve this, Plug'n'Play details a special case when a package makes a require call to a package it doesn't own but that the top-level has listed as one of its dependencies. In such a case, the require call will succeed, and the path that will be returned will be the exact same one as the one that would be obtained if the top-level package was making the call.

### e. edit-in-place workflows need different tools

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e. => E.

/ping @such

@mischnic
Copy link

Strange that npm has just released a similar concept (as package manager called crux): https://blog.npmjs.org/post/178027064160/next-generation-package-management

@schmod
Copy link

schmod commented Sep 13, 2018

My $0.02 is that any changes to replace the require.resolve algorithm should take place in the Node.JS core itself, and be guided by Node's community process instead of the (single) commercial entities behind Yarn and NPM.

An awful lot of existing code and infrastructure depends on the current behavior of require.resolve, and will break if those semantics are changed. The changes being discussed have the potential to fork or break the Node/NPM ecosystem, and should not be taken lightly.

I don't want to have to worry about whether my dependencies are compatible with node_modules, pnp, or crux.

If a future version of node allows package managers to supply an alternative implementation of require.resolve, that's fine, but there should explicit guidelines about how that should work, and how developers should write package-manager-agnostic code. This would be a semver-MAJOR change to Node.

@Droogans
Copy link

How is this going to change yarn checksum behaviors, if at all? I noticed that it wasn't even mentioned.

@arcanis
Copy link
Member Author

arcanis commented Sep 13, 2018

@schmod Note that the goal here isn't to sidestep the Node processes. Rather, we want to show a practical implementation that we know is solid enough for a first iteration, and use it to ignite discussions.

If you want to compare it to something else, see it as what Boost sometimes is compared to the standard C++ library: it's only after boost::thread and boost::filesystem were proved to work that the WG21 felt confident enough to use them as base for the standard library, knowing that the design was of good quality not only in theory, but also in practice.

@donaldpipowitch
Copy link

I'm always interested in changes like this one. 👏

I guess this would break TypeScripts @types resolving logic which expects these modules inside node_modules. What do you think about this change @DanielRosenwasser? Would you support that in TypeScript?

@boblauer
Copy link

An important note from the crux readme:

You can still install things in your node_modules folder and those versions will be used in preference to the cached version. This opens a path to live-editing of dependencies (sometimes a necessary debugging technique) ...

Will this be possible with pnp as well? I poke around in node_modules very often to debug and better understand the libraries I'm using.


### C. users working on multiple projects across a system won't pay increasing install costs

A common occurence in the Javascript world is developers working on multiple disconnected projects sharing similar dependencies (for example all projects created through `create-react-app`). Due to how package managers currently work, the files used by those projects were typically copied from the cache into multiple `node_modules`, multiplying both the total size of the installs on the disk and the time wasted running installs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: occurence -> occurrence


The reason this is a problem is that both of those actions don't have the same meaning and as such interfere with each other. The symlinks used for the virtual packages implementation referenced in Section 3 are a direct consequence of this: while it would be possible to implement this concept by making `require.resolve` create and return special in-memory identifiers that `require` would be able to understand, it wouldn't be possible to use those identifiers as paths (unless we were to patch the `fs` module, which is totally unacceptable).

A fix would be to split require.resole in two:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: resole -> resolve

* An exception is made if the package being required is listed in the dependency detail of the top-level. In this case, the package making the request will obtain the exact same **instance** than if the top-level package had made the require call (note the emphasis on instance rather than version).
* We however discourage packages from relying on this exception, since it's only been implemented to lower the adoption cost and help plugin systems. Packages should prefer using `peerDependencies` if applicable.
* Two packages depending on the same reference of the same dependency that itself has a transitive peer dependency **MUST** get the exact same instance of this dependency, whatever their locations in the dependency tree are.
* Two packages depending on the same reference of the same dependency that itself has a transitive peer dependency **MUST** get different instances of this dependency.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule has the same condition as the previous one, but have different results. Is it a mistook?

Copy link
Member Author

@arcanis arcanis Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the first one should be:

Two packages depending on the same reference of the same dependency that doesn't have any transitive peer dependencies MUST get the exact same instance of this dependency, whatever their locations in the dependency tree are.

In short, if two packages resolve to lodash 1.0.0, they are guaranteed to share the exact same instance (because lodash doesn't have any peer dependency).

If two packages depend on react-dom 1.0.0, they are guaranteed to each have their own instance (because react-dom has a peer dependency on react).

@arcanis
Copy link
Member Author

arcanis commented Sep 13, 2018

Will this be possible with pnp as well? I poke around in node_modules very often to debug and better understand the libraries I'm using.

yarn unplug <pkg-name> will put a copy of the specified package into .pnp/unplugged. You can then inspect/edit this package as you see fit, and once you're done you just have to yarn unplug --clear-all to go back to normal 😃

@deepsweet
Copy link

The current implementation overrides Module._load, but Node 10 recently released a new API that we plan to use to register into the resolver.

Could you please provide a link to that new API?

@arcanis
Copy link
Member Author

arcanis commented Sep 13, 2018

Cf loader hooks. This API only applies on imports coming from ESM modules, unfortunately.

@ghost
Copy link

ghost commented Sep 13, 2018

How does this work with modules that wrap native code? So is the idea that no more native modules will be allowed? The example given is node-sass but there are plenty of other modules that require post install logic.

@Aghassi
Copy link

Aghassi commented Sep 13, 2018

Noticed the bit about postinstall going away. Agree the premise for security. I'm curious how that would impact tools like https://github.com/typicode/husky that help developer workflow, but rely on a post install task to setup. Is there a story around how these types of tools would be able to exist in this new world?

@transitive-bullshit
Copy link

transitive-bullshit commented Sep 13, 2018

Really interesting proposal -- Glad to see that the yarn team is thinking radically towards the future!

One issue with relying on unplugged is that my 99% of use case for yarn link is to link to a version of a dependency that I'm developing locally in parallel, and maintaining that symlink is important so I can edit the "real" checkout of a dependency instead of a temporary throwaway version in the .pnp/unplugged folder.

Aside from this issue, I'd love to hear the yarn team's thoughts on @azz's questions. I think @schmod's excellent point is answered pretty well by @arcanis's analogy to boost, but it would be great to understand what the Node and NPM folks think early on in this process.

@zkat I'd really love to hear your thoughts on how this proposal differs from and is analogous to some of the design decisions npm has been making with crux.

Thanks && I love the Node community because of awesome developments like this!

@rivertam
Copy link

Also on the post-install but less agreeable.

While native modules have their usefulness, WebAssembly is becoming a more and more serious candidate for a portable bytecode as the months pass.

This impacts a whole class of packages which genuinely rely on actually using some other language. For example, I've built a binding for a camera that interacts with linux4video2. While the camera has official bindings for both Node and C++, it doesn't have bindings for wasm. For this project, we genuinely need the low level capabilities, speed, and parallelism that C++ provides, but we can't use wasm because none of the APIs are available for wasm.

The solutions proposed in this proposal don't impact this in any way, as far as I can tell, and we'll be able to use pnp across all our projects. However, I'm just a little nervous reading that sentence.

@saschagrunert
Copy link

saschagrunert commented Sep 13, 2018

Very smart and clean concept about the next generation dependency handling with yarn. Great! 👍

@arcanis
Copy link
Member Author

arcanis commented Sep 13, 2018

The solutions proposed in this proposal don't impact this in any way, as far as I can tell, and we'll be able to use pnp across all our projects. However, I'm just a little nervous reading that sentence.

@rivertam Just to be clear: postinstall scripts will be supported, there is no plan to deprecate them in this RFC. I guess I should have been clearer 😃

One issue with relying on unplugged is that my 99% of use case for yarn link is to link to a version of a dependency that I'm developing locally in parallel, and maintaining that symlink is important so I can edit the "real" checkout of a dependency instead of a temporary throwaway version in the .pnp/unplugged folder.

@transitive-bullshit I have some research to do on this side - I believe yarn link can be made compatible with PnP without problem - we just have to do the same thing than unplug except that we would generate a symlink instead of a real folder. The main problem will be to reconcile both dependency tree, but I believe it can be done without too much issue 🙂

This will probably break a lot of tools and libraries that rely node_modules existing, like read-pkg-up (7m weekly downloads). A migration strategy would need to be documented.

@azz Yep, totally. For most packages, using require.resolve is enough, but some other might require a bit more tooling. From my experiments, it wasn't super common, and could be fixed without too much efforts. Since projects like Webpack and Babel happen to work just fine, I'm confident we should be mostly fine 👍

How will bin commands be handled? Currently they're in node_modules/.bin which is added to the $PATH. Will that still exist or be moved to the cache? (I guess PATH=$PATH:$(yarn bin) could help in that case.)

@azz The binaries are "hidden in the cache", but you can access them through yarn run <bin-name> just like before, which does the heavy lifting of figuring out which tools are provided by your dependencies.

This would have to include peerDependencies, right?

@azz Not entirely sure what you mean, but this "fallback to the top-level" doesn't consider whether things are a dependency or not. If a package makes a require call to something it doesn't own (either through dependencies or peer dependencies, which are resolved at install-time).

I might be missing something, but why have relative paths from the workspace root to the yarn cache?

@azz They were initially absolute, but I made them relative to the location of the .pnp.js file (which is at the root of the project). It could be relative to an environment variable, even if I'm not sure it's something we really want to encourage. Opinions welcome!

@CrabDude
Copy link
Contributor

CrabDude commented Sep 18, 2018

FWIW, the post-install cache conflict raised in the pdf would be ameliorated by including the NODE_MODULE_VERSION in a cache entry's hash if

  1. A post-install script ran
  2. The resulting physical (filesystem) entries differ. (i.e., file content/existence)

This could be repeated for known meta-values of significance (e.g., OS version for fsevents). In the future, packages could even pre-declare meta-values to be considered for cache heuristics.

EDIT: This would be consistent with the accepted caching RFC Idempotent Install still to be implemented.

@DanielRosenwasser
Copy link

DanielRosenwasser commented Sep 18, 2018

Hey @arcanis, thanks for putting this together. I'd like to try to discuss the impact this might have on the TypeScript community. While I'll be traveling for the next few weeks, maybe we can schedule a call or something in the near future.

From my personal view, the high-level goals sound like a great idea. Apart from the use-cases given here, when it comes to powering tools like TypeScript (including editor scenarios for both TypeScript and JavaScript in tools like VS and VS Code), resolution tends to be a pretty costly process that can cause delays. Minimizing that sounds great!

However, there are a few issues I'd like to raise.

Alternative Resolution

You may already know, but TypeScript effectively overlays a "mirrored" resolution process to find files it's interested in. Specifically:

  • When resolving files without file extensions, in addition to .js files, TypeScript first searches for .ts, .tsx, and .d.ts files.
  • When resolving foo from node_modules, in addition to */node_modules/foo, TypeScript resolves from */node_modules/@types/foo
  • When resolving from package.json, in addition to resolving from the main field, TypeScript resolves from the types field (as well as the typings field)
    • In TypeScript 3.1, we will have something for version selection called typesVersions.

So even if TypeScript had some sort of resolution support to plug into Yarn PnP, PnP itself is oblivious to what TypeScript is actually trying to search for.

Arbitrary Code Execution

I think @liftM covered some of this already on #101 (comment), but I think another broadly-applicable motivating scenario would be helpful here.

Let's say we were able to resolve this issue, and that language services that power editors were able to require or start up a server from the .pnp.js file. Ideally, this file just runs, users are happy, and they go about their lives.

Now imagine someone places a .pnp.js file at the root of a repository, clones it, and opens an editor there. That editor now either has the choice of executing arbitrary code or turning itself into a security notification carnival. The former is obviously not ideal, and the latter leads to a very undesirable user experience. Users either hit accept anyway, or the editor experience shuts down entirely.

Bifurcation

Much as these ideas are great, npm's simultaneous effort on tink (formerly crux) means that there are potentially two approaches for us to support which isn't ideal.

Ideas & Mitigations

I think that @sokra had some good insights over Twitter.

  • A static file is definitely easier to analyze and doesn't introduce arbitrary code execution problems.
  • The complete mapping of files in packages makes it fully possible for any tool to overlay its own resolution process which is crucial to TypeScript.
  • Having a tool that can actually be imported or spawned as a server/daemon to understand this new feature is absolutely a great idea to ensure tools like Flow can still work without a full reimplementation.
  • Collaborating with npm and leveraging .package-map.json format could produce some of the aforementioned wins while avoiding bifurcation.

@arcanis
Copy link
Member Author

arcanis commented Sep 18, 2018

So even if TypeScript had some sort of resolution support to plug into Yarn PnP, PnP itself is oblivious to what TypeScript is actually trying to search for.

The Plug'n'Play API is split in two parts: resolveToUnqualified, and resolveUnqualified. The first one is the static resolution that converts lodash/foo into /path/to/cache/lodash-1.2.3/foo. The second one is the one that converts /path/to/cache/lodash-1.2.3/foo into /path/to/cache/lodash-1.2.3/foo/index.js.

So in your case, your resolver would just have to use resolveToUnqualified in order to get the basic path, which you would then continue resolving as you see fit. You can see an example on the webpack resolver, which uses resolveToUnqualified then defers to the rest of the enhanced-resolve plugins to compute the rest of the resolution.

Now imagine someone places a .pnp.js file at the root of a repository, clones it, and opens an editor there.

It's a great point, I didn't consider it before. It's worth noting that this problem already occurs now, though: while not native to the editor, various extensions already transparently execute Javascript files obtained from freshly cloned projects - a classic example being the eslint plugin for vscode.

@catamphetamine
Copy link

The serif font on the paper makes it unreadable

@kitze
Copy link

kitze commented Oct 2, 2018

Someone gets rid of node_modules, the largest folder in your project, your computer, your neighborhood, and the universe, and YET, someone complains about A FUCKING FONT. That's our community ladies and gentleman.

If this was written in 7px Comic Sans it would still be a better read than most of the stuff you find online nowadays.

@arcanis
Copy link
Member Author

arcanis commented Oct 2, 2018

I appreciate the support, but please let's keep it civil. Everyone is entitled to their opinions, especially on a process that specifically asks for feedbacks - 'voting' with your reactions is the best way to signal whether you agree with them or not 🙂

@ChuksFestus
Copy link

ok i don't know what i did wrong but i still get node_modules :(
i ran npx create-react-app myapp --use-pnp still got node_modules

@levrik
Copy link

levrik commented Oct 5, 2018

@ChuksFestus It probably doesn't work because npx is a npm command, not a yarn one.
Try yarn create react-app myapp --use-pnp. Maybe that works. But not sure if --use-pnp works on the create command at all. Probably try to use it with the install command.

Edit:
Ah. Looks like create-react-app uses yarn if installed always for app dependencies. According to facebook/create-react-app#5255 (comment) you need to have Yarn version v1.13.0-20181002.2034 installed.

@ChuksFestus
Copy link

@levrik you can't really do yarn create-react-app but anyways create-react-app uses yarn by default if its installed so when i ran npx create-react-app it ran yarn

@arcanis
Copy link
Member Author

arcanis commented Oct 5, 2018

Hey @ChuksFestus, can you open an issue on the Yarn repo if you still have an issue? 🙂

It's likely caused by a wrong version of Yarn being used, and I prefer to keep this thread focused on the design of the feature, keeping the implementation out of the RFC repository. Thanks!

@arcanis
Copy link
Member Author

arcanis commented Oct 5, 2018

And actually, since the feature has already landed in Yarn 1.12 (currently in RC), I'm going to go ahead and merge this PR 🎊

Thanks everyone for the feedback you provided, this is but the beginning and we'll keep working with you to improve Plug'n'Play and make it a rock-solid foundation for your projects 👊

In case you have any question, feel free to open an issue on the Yarn repo or to ping me on Twitter and I'll make sure to come back to you. Cheers!

(interestingly, github seems to have merged the commit but somehow forgot to mark the PR as such... So I'll close it manually, but you can find the document merged here)

@afzalah

This comment has been minimized.

@jsg2021

This comment has been minimized.

@atishaybaid

This comment has been minimized.

@orta
Copy link

orta commented Oct 9, 2018

Please don't use this thread for support requests, use the yarn repo or stack overflow. Notifications go out to 45+ people who were giving feedback on the RFC and 60+ folks interested in RFCs for yarn.

@yarnpkg yarnpkg locked as resolved and limited conversation to collaborators Oct 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.