Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Jest startup time and test runtime, particularly when running with coverage, by caching micromatch and avoiding recreating RegExp instances #10131

Merged
merged 8 commits into from
Jun 23, 2020

Conversation

lencioni
Copy link
Contributor

@lencioni lencioni commented Jun 5, 2020

I was profiling some Jest runs at Airbnb and noticed that on my
MacBook Pro, we can spend over 2 seconds at Jest startup time in
SearchSource getTestPaths. I believe that this will grow as the size
of the codebase increases.

Looking at the call stacks, it appears to be calling micromatch
repeatedly, which calls picomatch, which builds a regex out of the
globs. It seems that the parsing and regex building also triggers the
garbage collector frequently.

Upon testing, I noticed that the globs don't actually change between
these calls, so we can save a bunch of work by making a micromatch
matcher and reusing that function for all of the paths.

micromatch has some logic internally to handle lists of globs that
may include negated globs. A naive approach of just checking if it
matched any of the globs won't capture that, so I copied and
simplified the logic from within micromatch.

https://github.com/micromatch/micromatch/blob/fe4858b0/index.js#L27-L77

In my profiling of this change locally, this brings down the time of
startRun from about 2000ms to about 200ms.

Before:
image

Before, zoomed in:
image

After:

image


Avoid recreating RegExp instances in regexToMatcher

After optimizing globsToMatcher, I noticed that there was still a
lot of unnecessary overhead at Jest startup time spent recreating
the same RegExp instances repeatedly. Thankfully, we can be a little
smarter about this and create them all ahead of time and just reuse
them.

On top of my globsToMatcher optimization, this brings the speed of
the ArrayMap in startRun down from about 160ms to about 7ms.

Before:
image

After:
image


Optimize micromatch and RegExps in shouldInstrument

I've been profiling running Jest with code coverage at Airbnb, and
noticed that shouldInstrument is called often and is fairly
expensive. It also seems to call micromatch and new RegExp
repeatedly, both of which can be optimized by caching the work to
convert globs and strings into matchers and regexes.

I profiled this change by running a set of 27 fairly simple tests.
Before this change, about 6-7 seconds was spent in shouldInstrument.
After this change, only 400-500 ms is spent there. I would expect
this delta to increase along with the number of tests and size of
their dependency graphs.

A typical shouldInstrument call before this change:
image

A typical shouldInstrument call after this change (most are actually too fast to even show up in the profiler now):
image


I was profiling some Jest runs at Airbnb and noticed that on my
MacBook Pro, we can spend over 30 seconds after running Jest with code
coverage as the coverage reporter adds all of the untested files. I
believe that this will grow as the size of the codebase increases.

Looking at the call stacks, it appears to be calling micromatch
repeatedly, which calls picomatch, which builds a regex out of the
globs. It seems that the parsing and regex building also triggers the
garbage collector frequently.

Since this is in a tight loop and the globs won't change between
checks, we can greatly improve the performance here by using
micromatch.matcher.

This optimization reduces the block of time here from about 30s to
about 10s. The aggregated total time of coverage reporter's
onRunComplete goes from 23s to 600ms.

Before:

image

Before, zoomed in:
image

After:

image

After, zoomed in:
image

Summary

Motivation: Improve slow Jest startup and runtime speed, particularly when running with coverage

Test plan

I ran jest in the Airbnb frontend monorepo with and without coverage options, with a path argument.

Copy link
Collaborator

@thymikee thymikee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

packages/jest-core/src/SearchSource.ts Outdated Show resolved Hide resolved
@lencioni lencioni requested a review from thymikee June 5, 2020 16:27
CHANGELOG.md Outdated Show resolved Hide resolved
packages/jest-core/src/SearchSource.ts Outdated Show resolved Hide resolved
@@ -37,8 +37,55 @@ export type TestSelectionConfig = {
watch?: boolean;
};

const globsToMatcher = (globs: Array<Config.Glob>) => (path: Config.Path) =>
micromatch([replacePathSepForGlob(path)], globs, {dot: true}).length > 0;
const globsMatchers = new Map<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this work with watch mode, and changing the globs? I guess since the key is the glob itself and we don't collect all entries in the Map we won't pull stale data out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's my understanding!

@lencioni lencioni requested a review from SimenB June 5, 2020 17:10
@SimenB
Copy link
Member

SimenB commented Jun 5, 2020

@jonschlinkert hiya, thoughts on this?

@SimenB SimenB requested review from jeysal and cpojer June 5, 2020 18:49
I was profiling some Jest runs at Airbnb and noticed that on my
MacBook Pro, we can spend over 2 seconds at Jest startup time in
SearchSource getTestPaths. I believe that this will grow as the size
of the codebase increases.

Looking at the call stacks, it appears to be calling micromatch
repeatedly, which calls picomatch, which builds a regex out of the
globs. It seems that the parsing and regex building also triggers the
garbage collector frequently.

Upon testing, I noticed that the globs don't actually change between
these calls, so we can save a bunch of work by making a micromatch
matcher and reusing that function for all of the paths.

micromatch has some logic internally to handle lists of globs that
may include negated globs. A naive approach of just checking if it
matched any of the globs won't capture that, so I copied and
simplified the logic from within micromatch.

https://github.com/micromatch/micromatch/blob/fe4858b0/index.js#L27-L77

In my profiling of this change locally, this brings down the time of
startRun from about 2000ms to about 200ms.
@lencioni lencioni changed the title Cache micromatch in SearchSource globsToMatcher Cache micromatch in SearchSource globsToMatcher and avoid recreating RegExp instances in regexToMatcher Jun 8, 2020
@lencioni
Copy link
Contributor Author

lencioni commented Jun 8, 2020

To avoid having to stack a bunch of separate PRs, I've added another commit to this one to optimize the startup even more.

After optimizing globsToMatcher, I noticed that there was still a
lot of unnecessary overhead at Jest startup time spent recreating
the same RegExp instances repeatedly. Thankfully, we can be a little
smarter about this and create them all ahead of time and just reuse
them.

On top of my globsToMatcher optimization, this brings the speed of
the ArrayMap in startRun down from about 160ms to about 7ms.
@lencioni lencioni changed the title Cache micromatch in SearchSource globsToMatcher and avoid recreating RegExp instances in regexToMatcher Improve Jest startup time and test runtime, particularly when running with coverage, by caching micromatch and avoiding recreating RegExp instances Jun 8, 2020
@lencioni
Copy link
Contributor Author

lencioni commented Jun 8, 2020

Alright friends, I think I'm done hacking on this now. Please let me know if you'd like me to make any changes. I'm really looking forward to seeing how this performs for us in CI!

packages/jest-transform/src/shouldInstrument.ts Outdated Show resolved Hide resolved
packages/jest-util/package.json Show resolved Hide resolved
globs: Array<Config.Glob>,
): (path: Config.Path) => boolean {
if (globs.length === 0) {
return (_: Config.Path): boolean => false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these type annotations needed? It's not inferred from the outer function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't expect them to be needed, but vscode showed me an eslint error when I didn't have them:

Missing return type on function. eslint@typescript-eslint/explicit-module-boundary-types

Copy link
Member

@SimenB SimenB Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, wonder if that's fixed by the fresh 3.2 release? Either typescript-eslint/typescript-eslint#2169 or typescript-eslint/typescript-eslint#2176

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems promising. It looks like Jest is still on v2.30 though, so I think I'd like to say that updating the eslint plugin is out of scope for this PR if that's alright with you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the breaking changes seem to be pretty okay for this repo: https://github.com/typescript-eslint/typescript-eslint/releases/tag/v3.0.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's out of scope for this 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SimenB I'm happy to tackle that upgrade, if you want to create an issue and assign it to me :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @G-Rath! #10177

lencioni added 4 commits June 8, 2020 11:37
I'd like to start using this in more places to improve performance.
Moving it to jest-util seems like a better spot. Now that it is a
standalone module, I decided to write some unit tests for this
function. In doing so, I uncovered a small difference between the
behavior of this function and micromatch when overlapping glob
patterns are used which I also fixed.
While incorporating this function into more places, I discovered a
discrepancy here with how micromatch works. We can fix this by
creating a fast path for when there are no globs at all.
I've been profiling running Jest with code coverage at Airbnb, and
noticed that shouldInstrument is called often and is fairly
expensive. It also seems to call micromatch and `new RegExp`
repeatedly, both of which can be optimized by caching the work to
convert globs and strings into matchers and regexes.

I profiled this change by running a set of 27 fairly simple tests.
Before this change, about 6-7 seconds was spent in shouldInstrument.
After this change, only 400-500 ms is spent there. I would expect
this delta to increase along with the number of tests and size of
their dependency graphs.
I was profiling some Jest runs at Airbnb and noticed that on my
MacBook Pro, we can spend over 30 seconds after running Jest with code
coverage as the coverage reporter adds all of the untested files. I
believe that this will grow as the size of the codebase increases.

Looking at the call stacks, it appears to be calling micromatch
repeatedly, which calls picomatch, which builds a regex out of the
globs. It seems that the parsing and regex building also triggers the
garbage collector frequently.

Since this is in a tight loop and the globs won't change between
checks, we can greatly improve the performance here by using our new
and optimized globsToMatcher function, which avoids re-parsing globs
unnecessarily.

This optimization reduces the block of time here from about 30s to
about 10s. The aggregated total time of coverage reporter's
onRunComplete goes from 23s to 600ms.
[replacePathSepForGlob(path.relative(config.rootDir, filename))],
options.collectCoverageFrom,
).length === 0
!globsToMatcher(options.collectCoverageFrom)(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that this will end up using the dot: true option where before that wasn't used here (and above in this file). This seems like it should be fine, but I don't really have enough context here to know for sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, seems fine. HasteMap should filter out any dotfiles we don't want already (I think)

}
>();

// Every time micromatch is called, it will parse the glob strings and turn them
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the windows builds was flaking and there's no way to re-run, so I decided to add some code comments to this file.

@AdaskoTheBeAsT
Copy link

amazing job @lencioni :)
I am just curious how this will speed up tests in our project - on my win machine it takes 45s just to run first test from pack of few hundreds - looking forward for new version to test :)

@SimenB
Copy link
Member

SimenB commented Jun 8, 2020

This LGTM, I just want some more people to review it before merging

The logic here might be a little confusing, so I am adding some
comments that I hope will help make it easier for future explorers to
understand. While I was doing this, I noticed a small way to simplify
this function even more.
@AdaskoTheBeAsT
Copy link

@cpojer @thymikee @jeysal can you please review this pull request? :) whole community keep fingers crossed for increased performance :)

@7rulnik 7rulnik mentioned this pull request Jun 20, 2020
@7rulnik
Copy link

7rulnik commented Jun 22, 2020

Tested this branch on my company project: duration decreased from 11 minutes to 4 minutes. Seems that it is one of the main reasons for the #9457 (comment) regression. @lencioni awesome job, thanks!

@csvan
Copy link

csvan commented Jun 22, 2020

@cpojer @thymikee @jeysal the sanity of a million JS developers is in your hands!

Copy link
Member

@cpojer cpojer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

@cpojer cpojer merged commit 504cace into jestjs:master Jun 23, 2020
@SimenB
Copy link
Member

SimenB commented Jun 23, 2020

Thanks @lencioni! Published as 26.1.0

@lencioni
Copy link
Contributor Author

In CI we currently spread our tests across 10 separate machines.

When running all tests, the 10 machines on 26.0.1 had runtimes of:

  • 10m 28s
  • 9m 47s
  • 10m 11s
  • 9m 59s
  • 10m 9s
  • 11m 8s
  • 11m 26s
  • 9m 53s
  • 10m 33s
  • 13m 39s

On 26.1.0:

  • 8m 8s
  • 10m 0s
  • 9m 28s
  • 9m 36s
  • 9m 8s
  • 9m 50s
  • 10m 41s
  • 9m 21s
  • 10m 8s
  • 9m 54s

n=1

Not as much as I had wished for, but this is definitely a nice win for us! Thank you for the reviews and the release!

@lencioni lencioni deleted the searchsource-micromatch branch June 23, 2020 17:03
sauravhiremath pushed a commit to MLH-Fellowship/jest that referenced this pull request Jul 20, 2020
… with coverage, by caching micromatch and avoiding recreating RegExp instances (jestjs#10131)

Co-authored-by: Christoph Nakazawa <[email protected]>
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants