Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance on untagged HEADs #2822

Merged
merged 6 commits into from
Aug 25, 2021
Merged

Conversation

ron-macneil-youi
Copy link
Contributor

Introduction

My previous PR #2800 improved GitVersion performance when HEAD is already tagged as a release. This PR continues where that left off and improves performance when it isn't.

Measurements

A real-world Git repo containing ~20,000 commits and ~5,000 release tags was used to measure progress during this work. Run times cited here and in commit messages are for Debug builds running under a debugger/profiler on my two-year-old Thinkpad. Release builds running standalone are somewhat faster in absolute terms, but the relative performance improvements are the same.

Overall Results

This PR brings GitVersion run times on an untagged HEAD from over 3 minutes down to 20-25 seconds.

Description of Changes

Our starting run time is just over 3 minutes.

First I make IncrementStrategyFinder non-static, make the relevant members non-static, and register it in the IOC container as a singleton following the established pattern. This is so I can add some targeted caching later, also following the established pattern.

Next I add a DictionaryExtensions class following the established pattern, with a GetOrAdd() method to be used later to implement caches.

Next I implement the first optimisation, a cache of commit message analysis results in IncrementStrategyFinder. This is a relatively expensive operation -- up to 4 regex executions -- and is often done repeatedly for the same commits from different code paths higher up the call stack.

This halves our run time down to a minute and a half.

The second optimisation is in RepositoryStore, where I convert an exponential nested loop in GetVersionTagsOnBranch() to a lookup+linear iteration. (This is similar to what was done in my previous PR #2800).

This cuts run time by a further 60-70%, taking us down to around 30 seconds.

The final optimisation is back in IncrementStrategyFinder in GetIntermediateCommits() and is essentially another nested loop conversion, but instead of a lookup directly to the final result(s), it's an index (by arbitrary commit sha) into the (cached) sequence of all commits from start to HEAD, which we then use to yield the subsequence from that commit to HEAD on-the-cheap using an ArraySegment.

This knocks another 30% or so off the run time, bringing us down to a final time of around 20 seconds.

And register it in the IOC container as a singleton, scoping and
bringing its state under control.

In particular this makes assumptions made by current and future caches
more "correct" since they will only live per-execution rather than
surviving across multiple executions within a process (as is the case in
testing and possibly other scenarios) where cache-invalidating factors
like the repository or configuration could vary.
With GetOrAdd() method for implementing caches.
Calculating version increments from commit messages is a relatively
expensive operation, involving up to four regex executions each.  As the
number of commits and tags in repositories grows, the number of
calculations performed grows at a super-linear rate, quickly adding
significantly to calculation run time.

This commit caches the results by commit sha, bringing this growth rate
back down to linear.

On a real-world repo with ~20,000 commits and ~5,000 tags, this halves
the GitVersion runtime from around 3 minutes down to around a minute and
a half.
Replace an exponential commits x tags nested loop with a linear
sha-keyed tag lookup + single loop over commits.

On a real-world repo with ~20,000 commits and ~5,000 tags, this brings
the run time down from around a minute and a half to around 30 seconds.
This optimizes a hot path under BaseVersionCalculator.GetBaseVersion(),
which among other things analyses commit messages on the sequence of
commits from candidate base versions to HEAD.  Candidate base versions
can include all version tags in the repository.

The previous implementation did an iteration of the full sequence of
commits from beginning to HEAD (which can number in the tens of
thousands or more) to get the commit subsequence for each of the
candidate base versions (which can number in the thousands, especially
in continuous deployment situations), so effectively a nested loop,
which quickly reached huge numbers of iterations as the number of
commits and tags in a repo increased.  Even though the commit message
analysis work itself was already being cached and therefore not
repeated, just the iteration itself could take significant amounts of
time.

The new implementation maintains a cache of the full sequence of commits
to HEAD (as the old one did, except as an array and with simpler code)
plus a lookup indexing into that sequence by commit sha.  To get the
subsequence for each base version, we get its position in the full
sequence via the lookup and yield a zero-allocation ArraySegment from
there to the end.  This eliminates the nested loop, reducing the whole
thing to essentially a linear iteration over the base versions (tags).

On a real-world repo with ~20,000 commits and ~5,000 tags, this brings
the run time down from around 30 seconds to around 20 seconds.
@arturcic arturcic requested a review from asbjornu August 24, 2021 15:08
Copy link
Member

@asbjornu asbjornu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wonderful! 🎉

Copy link
Member

@asbjornu asbjornu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏼

@asbjornu asbjornu added this to the 5.x milestone Aug 25, 2021
@asbjornu asbjornu merged commit 474dd2d into GitTools:main Aug 25, 2021
@mergify
Copy link
Contributor

mergify bot commented Aug 25, 2021

Thank you @ron-macneil-youi for your contribution!

@arturcic arturcic modified the milestones: 5.x, 5.8.0 Oct 3, 2021
@github-actions
Copy link

🎉 This issue has been resolved in version 5.8.0 🎉
The release is available on:

Your GitReleaseManager bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants