Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modules: significantly improve require performance #25362

Closed
wants to merge 16 commits into from

Conversation

BridgeAR
Copy link
Member

@BridgeAR BridgeAR commented Jan 6, 2019

This is a significant performance boost for require in all cases but mainly for loaded modules. This came up recently as a botteneck for @jasnell and @mcollina as far as I know.

Loading relative or absolute paths now has the same performance profile and loading node modules will be significantly faster than before.

The native startup won't be impacted by this but this should improve the startup time from an actual application due to this performance improvement. It is now also possible to use require lazily without sacrificing performance.

I introduced a new cache which sits on top of the former main cache and caches the filename of already loaded modules. This is then used to check for lazy require calls and for identical require calls from different files in the same directory.

I repaired the stat cache which was broken due to a minor mistake in a former commit. While doing that I just used a hard limit instead of the actual former implementation.

I removed multiple redundant path.resolve calls and refactored code for simplicity and less code branches where possible.

The extensions are now behind a proxy as otherwise it's necessary to recompute them each time a file can not be required without the extension (which is the default for most calls).

The Module._pathCache is now going to cache all requests in combination with the path including failed requests. At the same time I removed the package cache which is now obsolete due to that.

I tried to stay backwards compatible while some improvements required me change DEP0019 (require('.') resolved outside directory) to end-of-life (this was already the case but that was reverted due to a mistake in the implementation). I also slightly changed the _resolveLookupPaths and _resolveFilename implementations. Both got a new argument in 2017 while these have not been used so far in the wild as far as I can tell (I checked Gzemnid and could not find any hits about these arguments). This is clearly semver-major and even though I am able to backports some parts of this, there's always a chance that even different caching might have a slightly different behavior for users than the one before.

I also documented an existing behavior which was not yet documented: placing new entries in the require.cache.

I plan on splitting this up in smaller commits but I wanted to get some general feedback first.

One thing we might want to have a look at again is how much we cache and if we want to clear some caches after each require has fully resolved (including the child require calls) or if we are just fine keeping everything in the cache.

                                                                                   confidence improvement accuracy (*)    (**)   (***)
 module/load-native.js useCache='false' n=1000 path='util'                                ***     10.44 %       ±1.75%  ±2.31%  ±2.99%
 module/load-native.js useCache='false' n=1000 path='vm'                                  ***     10.49 %       ±1.27%  ±1.68%  ±2.17%
 module/load-native.js useCache='true' n=1000 path='util'                                 ***     10.99 %       ±1.12%  ±1.49%  ±1.92%
 module/load-native.js useCache='true' n=1000 path='vm'                                   ***     11.23 %       ±1.18%  ±1.56%  ±2.01%
 module/module-loader-deep.js cache='false' files=1000 ext='.js'                          ***      7.96 %       ±0.75%  ±0.99%  ±1.28%
 module/module-loader-deep.js cache='false' files=1000 ext=''                             ***      8.91 %       ±0.72%  ±0.96%  ±1.24%
 module/module-loader-deep.js cache='true' files=1000 ext='.js'                           ***     71.50 %       ±3.99%  ±5.28%  ±6.80%
 module/module-loader-deep.js cache='true' files=1000 ext=''                              ***     68.78 %       ±2.46%  ±3.25%  ±4.19%
 module/module-loader.js cache='false' n=1 files=500 dir='abs' name=''                    ***      6.34 %       ±0.94%  ±1.24%  ±1.60%
 module/module-loader.js cache='false' n=1 files=500 dir='abs' name='/'                   ***      6.81 %       ±0.94%  ±1.24%  ±1.60%
 module/module-loader.js cache='false' n=1 files=500 dir='abs' name='/index.js'           ***      3.22 %       ±1.00%  ±1.32%  ±1.70%
 module/module-loader.js cache='false' n=1 files=500 dir='rel' name=''                    ***     10.32 %       ±0.92%  ±1.21%  ±1.56%
 module/module-loader.js cache='false' n=1 files=500 dir='rel' name='/'                   ***     12.05 %       ±1.19%  ±1.57%  ±2.02%
 module/module-loader.js cache='false' n=1 files=500 dir='rel' name='/index.js'           ***      9.12 %       ±0.93%  ±1.23%  ±1.59%
 module/module-loader.js cache='false' n=1000 files=500 dir='abs' name=''                 ***     64.34 %       ±0.60%  ±0.80%  ±1.03%
 module/module-loader.js cache='false' n=1000 files=500 dir='abs' name='/'                ***     97.44 %       ±0.76%  ±1.01%  ±1.30%
 module/module-loader.js cache='false' n=1000 files=500 dir='abs' name='/index.js'        ***     71.51 %       ±0.70%  ±0.93%  ±1.20%
 module/module-loader.js cache='false' n=1000 files=500 dir='rel' name=''                 ***    413.20 %       ±2.71%  ±3.60%  ±4.68%
 module/module-loader.js cache='false' n=1000 files=500 dir='rel' name='/'                ***    441.79 %       ±1.71%  ±2.26%  ±2.93%
 module/module-loader.js cache='false' n=1000 files=500 dir='rel' name='/index.js'        ***    473.27 %       ±1.50%  ±1.99%  ±2.58%
 module/module-loader.js cache='true' n=1 files=500 dir='abs' name=''                     ***    205.70 %       ±3.58%  ±4.74%  ±6.12%
 module/module-loader.js cache='true' n=1 files=500 dir='abs' name='/'                    ***    112.56 %       ±2.63%  ±3.48%  ±4.48%
 module/module-loader.js cache='true' n=1 files=500 dir='abs' name='/index.js'            ***     98.16 %       ±4.61%  ±6.13%  ±7.96%
 module/module-loader.js cache='true' n=1 files=500 dir='rel' name=''                     ***    297.89 %       ±3.93%  ±5.21%  ±6.75%
 module/module-loader.js cache='true' n=1 files=500 dir='rel' name='/'                    ***    312.46 %       ±4.15%  ±5.50%  ±7.11%
 module/module-loader.js cache='true' n=1 files=500 dir='rel' name='/index.js'            ***    584.59 %      ±10.41% ±13.77% ±17.75%
 module/module-loader.js cache='true' n=1000 files=500 dir='abs' name=''                  ***     88.07 %       ±1.00%  ±1.33%  ±1.72%
 module/module-loader.js cache='true' n=1000 files=500 dir='abs' name='/'                 ***    125.64 %       ±0.65%  ±0.86%  ±1.11%
 module/module-loader.js cache='true' n=1000 files=500 dir='abs' name='/index.js'         ***     89.21 %       ±0.49%  ±0.65%  ±0.84%
 module/module-loader.js cache='true' n=1000 files=500 dir='rel' name=''                  ***    625.42 %       ±2.44%  ±3.23%  ±4.19%
 module/module-loader.js cache='true' n=1000 files=500 dir='rel' name='/'                 ***    612.17 %       ±1.88%  ±2.49%  ±3.23%
 module/module-loader.js cache='true' n=1000 files=500 dir='rel' name='/index.js'         ***    608.03 %       ±1.60%  ±2.13%  ±2.76%
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@BridgeAR BridgeAR added the semver-major PRs that contain breaking changes and should be released in the next major version. label Jan 6, 2019
@nodejs-github-bot nodejs-github-bot added the repl Issues and PRs related to the REPL subsystem. label Jan 6, 2019
@BridgeAR BridgeAR added module Issues and PRs related to the module subsystem. performance Issues and PRs related to the performance of Node.js. and removed repl Issues and PRs related to the REPL subsystem. labels Jan 6, 2019
1) It adds more benchmark options to properly verify the gains.

This makes sure the benchmark also tests requiring the same module
again instead of only loading each module only once.

2) Remove dead code:

The array check is obsolete as this function will only be called
internally with preprepared data which is always an array.

3) Simpler code

It was possible to use a more direct logic to prevent some branches.

4) Inline try catch

The function is not required anymore, since V8 is able to produce
performant code with it.

5) var -> let / const & less lines

6) Update require.extensions description

The comment was outdated.

7) Improve extension handling

This is a performance optimization to prevent loading the extensions
on each uncached require call. It uses proxies to intercept changes
and receives the necessary informations by doing that.
@BridgeAR BridgeAR force-pushed the require-performance branch from 9576b54 to 9c54e43 Compare January 6, 2019 01:25
// Do not expose this to user land even with --expose-internals.
const loaderId = 'internal/bootstrap/loaders';

// Create a native module map to tell if a module is internal or not.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up: I think this does something similar but conflicts with #25352?

@BridgeAR
Copy link
Member Author

BridgeAR commented Jan 6, 2019

@BridgeAR
Copy link
Member Author

BridgeAR commented Jan 6, 2019

doc/api/modules.md Outdated Show resolved Hide resolved
doc/api/modules.md Outdated Show resolved Hide resolved
@mscdex
Copy link
Contributor

mscdex commented Jan 6, 2019

I also slightly changed the _resolveLookupPaths and _resolveFilename implementations. Both got a new argument in 2017 while these have not been used so far in the wild as far as I can tell (I checked Gzemnid and could not find any hits about these arguments).

When I added that parameter it was because there were Google (IIRC) projects using this function and were expecting the old return value type. To avoid breaking their (and anyone else's) projects, I added the new parameter for backwards compatibility. I do not remember which projects they were, but just wanted to put that out there.

@BridgeAR
Copy link
Member Author

BridgeAR commented Jan 6, 2019

CI https://ci.nodejs.org/job/node-test-pull-request/19953/
CITGM https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/1709/

@mscdex thanks for pointing that out. I'll have a look at the modules again in the next few days. First I want CITGM to be happy and I already stumbled upon two minor things that I just fixed.
In the worst case I'll use a proxy to monitor the internal functions and use a slower fallback with the old behavior if the functions are monkey patched / used.

There are some issues involving the proxy. This tries to circumvent
the issue by not making any copy of the newly set object.
@BridgeAR BridgeAR added the wip Issues and PRs that are still a work in progress. label Mar 26, 2019
@BridgeAR
Copy link
Member Author

BridgeAR commented Mar 26, 2019

I just marked this as work in progress. I keep it open to pull out smaller chunks but I do not plan on pushing everything in one PR.

I already moved out a small part and for v12 I'll only pull out a couple non semver-major changes. All semver-major parts should come up for v13 instead.

BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 1, 2019
This publicly documents that adding native module names will resolve
the added entry instead of the native module.

It also updates the description why extensions are deprecated.

PR-URL: nodejs#26971
Refs: nodejs#25362
Reviewed-By: Gus Caplan <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
Add more benchmark options to properly verify the gains.

This makes sure the benchmark also tests requiring the same module
again instead of only loading each module only once.

PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
Moving `try / catch` into separate functions is not necessary
anymore due to V8 optimizations.

PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
This adds the `path` property to the module object. It contains the
current directory as path. That is necessary to add an extra caching
layer.

It also makes sure the `id` uses a default in case it's not set.
Otherwise the `path.dirname(id)` command could fail.

PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 4, 2019
This adds an extra modules caching layer that operates on the parent's
`path` property and the current require argument. That together can
be used as unique identifier to speed up loading the same module more
than once. It is a cache on top of the current modules cache.

It has the nice feature that this cache does not only work in the same
file but it works for the whole current directory. So if the same file
is loaded in any other file from the same directory, it will also hit
this cache instead of having to resolve the file again.

To keep it backwards compatible with the old modules cache, it detects
invalidation of that cache.

PR-URL: nodejs#26970
Refs: nodejs#25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 5, 2019
This publicly documents that adding native module names will resolve
the added entry instead of the native module.

It also updates the description why extensions are deprecated.

PR-URL: #26971
Refs: #25362
Reviewed-By: Gus Caplan <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 8, 2019
This removes a lot of code that has no functionality anymore. All
Node.js internal code calls `_resolveLookupPaths` with two arguments.

The code that validates `index.js` is not required at all as we check
for these files anyway, so it's just redundant code that should be
removed.

PR-URL: nodejs#26983
Refs: nodejs#25362
Reviewed-By: Jan Krems <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
This publicly documents that adding native module names will resolve
the added entry instead of the native module.

It also updates the description why extensions are deprecated.

PR-URL: #26971
Refs: #25362
Reviewed-By: Gus Caplan <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
This publicly documents that adding native module names will resolve
the added entry instead of the native module.

It also updates the description why extensions are deprecated.

PR-URL: #26971
Refs: #25362
Reviewed-By: Gus Caplan <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
Add more benchmark options to properly verify the gains.

This makes sure the benchmark also tests requiring the same module
again instead of only loading each module only once.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
Moving `try / catch` into separate functions is not necessary
anymore due to V8 optimizations.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
This adds the `path` property to the module object. It contains the
current directory as path. That is necessary to add an extra caching
layer.

It also makes sure the `id` uses a default in case it's not set.
Otherwise the `path.dirname(id)` command could fail.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 9, 2019
This adds an extra modules caching layer that operates on the parent's
`path` property and the current require argument. That together can
be used as unique identifier to speed up loading the same module more
than once. It is a cache on top of the current modules cache.

It has the nice feature that this cache does not only work in the same
file but it works for the whole current directory. So if the same file
is loaded in any other file from the same directory, it will also hit
this cache instead of having to resolve the file again.

To keep it backwards compatible with the old modules cache, it detects
invalidation of that cache.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
This publicly documents that adding native module names will resolve
the added entry instead of the native module.

It also updates the description why extensions are deprecated.

PR-URL: #26971
Refs: #25362
Reviewed-By: Gus Caplan <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
Add more benchmark options to properly verify the gains.

This makes sure the benchmark also tests requiring the same module
again instead of only loading each module only once.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
Moving `try / catch` into separate functions is not necessary
anymore due to V8 optimizations.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
This adds the `path` property to the module object. It contains the
current directory as path. That is necessary to add an extra caching
layer.

It also makes sure the `id` uses a default in case it's not set.
Otherwise the `path.dirname(id)` command could fail.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
BethGriggs pushed a commit that referenced this pull request Apr 10, 2019
This adds an extra modules caching layer that operates on the parent's
`path` property and the current require argument. That together can
be used as unique identifier to speed up loading the same module more
than once. It is a cache on top of the current modules cache.

It has the nice feature that this cache does not only work in the same
file but it works for the whole current directory. So if the same file
is loaded in any other file from the same directory, it will also hit
this cache instead of having to resolve the file again.

To keep it backwards compatible with the old modules cache, it detects
invalidation of that cache.

PR-URL: #26970
Refs: #25362
Reviewed-By: Guy Bedford <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Signed-off-by: Beth Griggs <[email protected]>
@BridgeAR
Copy link
Member Author

Closing due to age and most important improvements already landed.

@BridgeAR BridgeAR closed this Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module Issues and PRs related to the module subsystem. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants