Improve performance on untagged HEADs #2822

ron-macneil-youi · 2021-08-24T12:48:31Z

Introduction

My previous PR #2800 improved GitVersion performance when HEAD is already tagged as a release. This PR continues where that left off and improves performance when it isn't.

Measurements

A real-world Git repo containing ~20,000 commits and ~5,000 release tags was used to measure progress during this work. Run times cited here and in commit messages are for Debug builds running under a debugger/profiler on my two-year-old Thinkpad. Release builds running standalone are somewhat faster in absolute terms, but the relative performance improvements are the same.

Overall Results

This PR brings GitVersion run times on an untagged HEAD from over 3 minutes down to 20-25 seconds.

Description of Changes

Our starting run time is just over 3 minutes.

First I make IncrementStrategyFinder non-static, make the relevant members non-static, and register it in the IOC container as a singleton following the established pattern. This is so I can add some targeted caching later, also following the established pattern.

Next I add a DictionaryExtensions class following the established pattern, with a GetOrAdd() method to be used later to implement caches.

Next I implement the first optimisation, a cache of commit message analysis results in IncrementStrategyFinder. This is a relatively expensive operation -- up to 4 regex executions -- and is often done repeatedly for the same commits from different code paths higher up the call stack.

This halves our run time down to a minute and a half.

The second optimisation is in RepositoryStore, where I convert an exponential nested loop in GetVersionTagsOnBranch() to a lookup+linear iteration. (This is similar to what was done in my previous PR #2800).

This cuts run time by a further 60-70%, taking us down to around 30 seconds.

The final optimisation is back in IncrementStrategyFinder in GetIntermediateCommits() and is essentially another nested loop conversion, but instead of a lookup directly to the final result(s), it's an index (by arbitrary commit sha) into the (cached) sequence of all commits from start to HEAD, which we then use to yield the subsequence from that commit to HEAD on-the-cheap using an ArraySegment.

This knocks another 30% or so off the run time, bringing us down to a final time of around 20 seconds.

And register it in the IOC container as a singleton, scoping and bringing its state under control. In particular this makes assumptions made by current and future caches more "correct" since they will only live per-execution rather than surviving across multiple executions within a process (as is the case in testing and possibly other scenarios) where cache-invalidating factors like the repository or configuration could vary.

With GetOrAdd() method for implementing caches.

Calculating version increments from commit messages is a relatively expensive operation, involving up to four regex executions each. As the number of commits and tags in repositories grows, the number of calculations performed grows at a super-linear rate, quickly adding significantly to calculation run time. This commit caches the results by commit sha, bringing this growth rate back down to linear. On a real-world repo with ~20,000 commits and ~5,000 tags, this halves the GitVersion runtime from around 3 minutes down to around a minute and a half.

Replace an exponential commits x tags nested loop with a linear sha-keyed tag lookup + single loop over commits. On a real-world repo with ~20,000 commits and ~5,000 tags, this brings the run time down from around a minute and a half to around 30 seconds.

This optimizes a hot path under BaseVersionCalculator.GetBaseVersion(), which among other things analyses commit messages on the sequence of commits from candidate base versions to HEAD. Candidate base versions can include all version tags in the repository. The previous implementation did an iteration of the full sequence of commits from beginning to HEAD (which can number in the tens of thousands or more) to get the commit subsequence for each of the candidate base versions (which can number in the thousands, especially in continuous deployment situations), so effectively a nested loop, which quickly reached huge numbers of iterations as the number of commits and tags in a repo increased. Even though the commit message analysis work itself was already being cached and therefore not repeated, just the iteration itself could take significant amounts of time. The new implementation maintains a cache of the full sequence of commits to HEAD (as the old one did, except as an array and with simpler code) plus a lookup indexing into that sequence by commit sha. To get the subsequence for each base version, we get its position in the full sequence via the lookup and yield a zero-allocation ArraySegment from there to the end. This eliminates the nested loop, reducing the whole thing to essentially a linear iteration over the base versions (tags). On a real-world repo with ~20,000 commits and ~5,000 tags, this brings the run time down from around 30 seconds to around 20 seconds.

asbjornu

This looks wonderful! 🎉

src/GitVersion.Core/VersionCalculation/IncrementStrategyFinder.cs

asbjornu

👏🏼

mergify · 2021-08-25T07:52:55Z

Thank you @ron-macneil-youi for your contribution!

github-actions · 2021-11-13T19:31:55Z

🎉 This issue has been resolved in version 5.8.0 🎉
The release is available on:

Your GitReleaseManager bot 📦🚀

ron-macneil-youi added 5 commits August 23, 2021 18:17

Add DictionaryExtensions

f486cf5

With GetOrAdd() method for implementing caches.

arturcic requested a review from asbjornu August 24, 2021 15:08

asbjornu requested changes Aug 24, 2021

View reviewed changes

src/GitVersion.Core/VersionCalculation/IncrementStrategyFinder.cs Outdated Show resolved Hide resolved

Split CommitMessageIncrementMode into its own file

0745d52

ron-macneil-youi requested a review from asbjornu August 25, 2021 00:26

asbjornu approved these changes Aug 25, 2021

View reviewed changes

asbjornu added this to the 5.x milestone Aug 25, 2021

asbjornu merged commit 474dd2d into GitTools:main Aug 25, 2021

arturcic added the improvement label Oct 3, 2021

arturcic modified the milestones: 5.x, 5.8.0 Oct 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance on untagged HEADs #2822

Improve performance on untagged HEADs #2822

ron-macneil-youi commented Aug 24, 2021

asbjornu left a comment

asbjornu left a comment

mergify bot commented Aug 25, 2021

github-actions bot commented Nov 13, 2021

Improve performance on untagged HEADs #2822

Improve performance on untagged HEADs #2822

Conversation

ron-macneil-youi commented Aug 24, 2021

Introduction

Measurements

Overall Results

Description of Changes

asbjornu left a comment

Choose a reason for hiding this comment

asbjornu left a comment

Choose a reason for hiding this comment

mergify bot commented Aug 25, 2021

github-actions bot commented Nov 13, 2021