-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance on untagged HEADs #2822
Conversation
And register it in the IOC container as a singleton, scoping and bringing its state under control. In particular this makes assumptions made by current and future caches more "correct" since they will only live per-execution rather than surviving across multiple executions within a process (as is the case in testing and possibly other scenarios) where cache-invalidating factors like the repository or configuration could vary.
With GetOrAdd() method for implementing caches.
Calculating version increments from commit messages is a relatively expensive operation, involving up to four regex executions each. As the number of commits and tags in repositories grows, the number of calculations performed grows at a super-linear rate, quickly adding significantly to calculation run time. This commit caches the results by commit sha, bringing this growth rate back down to linear. On a real-world repo with ~20,000 commits and ~5,000 tags, this halves the GitVersion runtime from around 3 minutes down to around a minute and a half.
Replace an exponential commits x tags nested loop with a linear sha-keyed tag lookup + single loop over commits. On a real-world repo with ~20,000 commits and ~5,000 tags, this brings the run time down from around a minute and a half to around 30 seconds.
This optimizes a hot path under BaseVersionCalculator.GetBaseVersion(), which among other things analyses commit messages on the sequence of commits from candidate base versions to HEAD. Candidate base versions can include all version tags in the repository. The previous implementation did an iteration of the full sequence of commits from beginning to HEAD (which can number in the tens of thousands or more) to get the commit subsequence for each of the candidate base versions (which can number in the thousands, especially in continuous deployment situations), so effectively a nested loop, which quickly reached huge numbers of iterations as the number of commits and tags in a repo increased. Even though the commit message analysis work itself was already being cached and therefore not repeated, just the iteration itself could take significant amounts of time. The new implementation maintains a cache of the full sequence of commits to HEAD (as the old one did, except as an array and with simpler code) plus a lookup indexing into that sequence by commit sha. To get the subsequence for each base version, we get its position in the full sequence via the lookup and yield a zero-allocation ArraySegment from there to the end. This eliminates the nested loop, reducing the whole thing to essentially a linear iteration over the base versions (tags). On a real-world repo with ~20,000 commits and ~5,000 tags, this brings the run time down from around 30 seconds to around 20 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wonderful! 🎉
src/GitVersion.Core/VersionCalculation/IncrementStrategyFinder.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏🏼
Thank you @ron-macneil-youi for your contribution! |
🎉 This issue has been resolved in version 5.8.0 🎉 Your GitReleaseManager bot 📦🚀 |
Introduction
My previous PR #2800 improved GitVersion performance when HEAD is already tagged as a release. This PR continues where that left off and improves performance when it isn't.
Measurements
A real-world Git repo containing ~20,000 commits and ~5,000 release tags was used to measure progress during this work. Run times cited here and in commit messages are for Debug builds running under a debugger/profiler on my two-year-old Thinkpad. Release builds running standalone are somewhat faster in absolute terms, but the relative performance improvements are the same.
Overall Results
This PR brings GitVersion run times on an untagged HEAD from over 3 minutes down to 20-25 seconds.
Description of Changes
Our starting run time is just over 3 minutes.
First I make
IncrementStrategyFinder
non-static, make the relevant members non-static, and register it in the IOC container as a singleton following the established pattern. This is so I can add some targeted caching later, also following the established pattern.Next I add a
DictionaryExtensions
class following the established pattern, with aGetOrAdd()
method to be used later to implement caches.Next I implement the first optimisation, a cache of commit message analysis results in
IncrementStrategyFinder
. This is a relatively expensive operation -- up to 4 regex executions -- and is often done repeatedly for the same commits from different code paths higher up the call stack.This halves our run time down to a minute and a half.
The second optimisation is in
RepositoryStore
, where I convert an exponential nested loop inGetVersionTagsOnBranch()
to a lookup+linear iteration. (This is similar to what was done in my previous PR #2800).This cuts run time by a further 60-70%, taking us down to around 30 seconds.
The final optimisation is back in
IncrementStrategyFinder
inGetIntermediateCommits()
and is essentially another nested loop conversion, but instead of a lookup directly to the final result(s), it's an index (by arbitrary commit sha) into the (cached) sequence of all commits from start to HEAD, which we then use to yield the subsequence from that commit to HEAD on-the-cheap using anArraySegment
.This knocks another 30% or so off the run time, bringing us down to a final time of around 20 seconds.