Herein, *globbing* refers to using filesystem information (like which files have
the ``.cxx`` extension) to configure targets and other project properties,
as opposed to explicitly listing each target's sources in ``CMakeLists.txt``.
Any discussion of globbing in a CMake project must begin with the :cmake:`admonition against doing so <command/file.html#glob>` in CMake's own documentation (:ref:`skip to usage <glob-function>`). The main reasons cited for avoiding globs are:
- Not all generators support glob-dependent reconfiguration.
- There may be files which match a glob unintentionally (for example, temporary files generated by a tool), which pessimize or invalidate the build configuration.
- If the configuration depends on globs, then each build must check that those globs' results have not changed, which introduces overhead.
If a project must support all generators, or must allow tools which introduce spurious glob matches, then globbing is not an option. No single choice of which workflows to support is correct for all projects, so I think a description of a technique's relative merits is more useful than a blanket prohibition against it.
In Maud's case, the C++20 modular structure is central, and every generator which
:cmake:`supports C++20 modules <manual/cmake-cxxmodules.7.html#generator-support>`
also supports glob-dependent reconfiguration, so avoiding globs would not expand
Maud's generator support.
As for tools which touch the source tree: even in projects which don't use globbing, I frequently keep multiple worktrees of the repository to isolate those tools from (for example) a build which I don't want to invalidate. Some might find this unacceptably inelegant.
One of the project tests is a benchmark of globbing overhead. On my machine, the output looks like:
$ ./test_.project --gtest_filter=*bench* | grep -E "^BENCHMARK" -A 10 -B 0
BENCHMARK
-- Writing: ( mean=3602.278 min=3381.849 ) ms
-- New checking: ( mean=848.554 min=826.814 ) ms
-- Globbing: ( mean=929.903 min=909.522 ) ms
-- Globbing(fd): ( mean=299.474 min=292.794 ) ms
-- Globbing(git): ( mean=311.838 min=305.374 ) ms
-- Filtering: ( mean=87.323 min=85.166 ) ms
-- Loading the cache: ( mean=24.618 min=22.662 ) ms
--
8 iterations with 160000 files
(The parameters were chosen to approximate the llvm-project repository at the time
of writing in number of files and directory depth, median 4.) ``Writing`` serves
as a baseline of the filesystem's speed: generating a simulated project with 160,000
empty files takes a few seconds. ``New checking`` is another useful baseline:
accessing the ``mtime`` of every file takes a little less than a second.
The benchmark's ``Globbing`` result shows that using
:cmake:`file(GLOB_RECURSE) <command/file.html#glob-recurse>` to list all files
and directories in the simulated project also takes a little less than a second,
unless we delegate to a dedicated globbing utility as in ``Globbing(fd)`` and
``Globbing(git)``. Those cut the time to roughly a third here, and delegation can
reduce it significantly for large projects.
Maud's globbing aggressively caches results, filtering those cached results
for each new glob. This means the overhead of actual filesystem access is paid only
once per build; each new glob (``Filtering``) incurs less than a tenth of that overhead.
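The cache-then-filter strategy can be sketched as a standalone CMake script. It mirrors the split between the ``Globbing`` and ``Filtering`` rows above. This is an illustration under my own assumptions, not Maud's implementation. The directory name ``glob_cache_demo``, the sample files and the helper ``filter_cached`` are all hypothetical. One recursive walk fills a list, and every later "glob" is only a regex filter over that list.

```cmake
# cache_then_filter.cmake: run with `cmake -P cache_then_filter.cmake`.
# Sketch of "walk the filesystem once, then answer each glob by filtering
# the cached listing". Not Maud's implementation; file names are made up.
cmake_minimum_required(VERSION 3.12)

# Create a small throwaway tree under the current working directory
# (in -P mode CMAKE_CURRENT_BINARY_DIR is the working directory).
set(root "${CMAKE_CURRENT_BINARY_DIR}/glob_cache_demo")
file(REMOVE_RECURSE "${root}")
file(MAKE_DIRECTORY "${root}/app" "${root}/lib")
file(TOUCH
  "${root}/app/main.cxx"
  "${root}/lib/math.cxx"
  "${root}/lib/math.cxxm"
  "${root}/README.md")

# Expensive step, paid once: a single recursive walk of the tree.
file(GLOB_RECURSE all_paths
  LIST_DIRECTORIES true
  RELATIVE "${root}"
  "${root}/*")
list(SORT all_paths)

# Cheap step, paid per glob: a regex filter over the cached listing.
# (Functions can read variables from the calling scope, e.g. all_paths.)
function(filter_cached out_var regex)
  set(matches ${all_paths})
  list(FILTER matches INCLUDE REGEX "${regex}")
  set(${out_var} "${matches}" PARENT_SCOPE)
endfunction()

filter_cached(sources "[.]cxx$")
filter_cached(modules "[.]cxxm$")
message(STATUS "sources: ${sources}")  # -- sources: app/main.cxx;lib/math.cxx
message(STATUS "modules: ${modules}")  # -- modules: lib/math.cxxm
```

Adding more globs in this scheme costs one more ``list(FILTER)`` each, rather than another directory walk.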
``Loading the cache`` is also a once-per-build overhead: Maud stores glob results
in ``${CMAKE_BINARY_DIR}/CMakeCache.txt``, which must be loaded by the CMake scripts
that verify the globs have not changed.
In testing on multiple machines and simulated project sizes, ``Globbing`` overhead
remains comparable to ``New checking``. The latter is an unavoidable once-per-build
overhead even if globbing is not used, since each source file's ``mtime`` must be
checked to determine whether it needs to be recompiled. To me, paying a comparable
overhead a second time seems acceptable. There may be projects where that added
overhead is unacceptable; in that case, I'm glad this benchmark helped to decide
that quantitatively... but I'd be more glad of a PR to improve Maud's globbing
performance.
glob(
name
[CONFIGURE_DEPENDS]
[EXCLUDE_RENDERED]
< inclusion_regex | ! exclusion_regex >...
)
Declare a glob. A list containing the absolute paths of matching files and
directories will be stored in a ``CACHE`` variable with the provided ``name``.
All files in ``${CMAKE_SOURCE_DIR}``, as well as generated files in
``${MAUD_DIR}/rendered``, are examined for inclusion in the glob. Files and
directories whose names begin with ``.`` are excluded from all globs.
Glob results are updated as part of the main build system check target, so during
reconfiguration, calls to ``glob()`` are a no-op (because the ``CACHE`` variable
is already up to date). Scripts which load the cache can access the
variable normally.
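Here is how a declared glob might be consumed. This is a fragment rather than a standalone script. It assumes it runs in a Maud project's CMake code where ``glob()`` is available. The variable name ``PROTO_FILES`` and the ``.proto``/``test/`` patterns are hypothetical and chosen only for illustration. The argument order follows the signature above.

```cmake
# Hypothetical fragment: requires Maud's glob(); names are illustrative.
# Collect .proto files, excluding anything under a test/ directory, and
# regenerate if the set of matching files changes.
glob(PROTO_FILES CONFIGURE_DEPENDS "[.]proto$" "!(^|/)test/")

# PROTO_FILES is an ordinary CACHE list of absolute paths.
foreach(proto IN LISTS PROTO_FILES)
  message(STATUS "schema: ${proto}")
endforeach()
```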
CONFIGURE_DEPENDS
- If this flag is specified then, in addition to updating the glob's results, the check target will trigger regeneration if the results change.
EXCLUDE_RENDERED
- Generated files will be ignored if this flag is specified.
< inclusion_regex | ! exclusion_regex >...
Each pattern is a :cmake:`REGEX <command/string.html#regex-specification>` which is applied to each candidate file's path. Patterns are applied to relative paths: either the path relative to ``${CMAKE_SOURCE_DIR}`` or, for generated files, relative to ``${MAUD_DIR}/rendered``. Patterns are evaluated in series, starting with an empty result set. Inclusion patterns are applied to all files and any matches are added to the result set. Exclusion patterns are applied to the result set and any matches are removed. So, for example, the patterns ``[.](cxx|hxx) !(^|/)_ !thirdparty`` would include ``hello.cxx`` and ``hello.hxx`` but would exclude ``_disabled.cxx`` and any files in ``world_thirdparty/``.
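The in-series evaluation rule can be checked with a small script that replays the example above over a handful of relative paths. The helper ``apply_glob_patterns`` and the candidate list are hypothetical. This is a sketch of the semantics as described, not Maud's code. Inclusions add matching candidates, exclusions prune only what has been collected so far, and the result is treated as a set.

```cmake
# glob_patterns.cmake: run with `cmake -P glob_patterns.cmake`.
# Sketch of in-series pattern evaluation; apply_glob_patterns is a
# hypothetical helper, not Maud's implementation.
cmake_minimum_required(VERSION 3.6)

# apply_glob_patterns(<out_var> <candidates_var> <pattern>...)
# Patterns starting with "!" are exclusions; all others are inclusions.
function(apply_glob_patterns out_var candidates_var)
  set(result "")
  foreach(pattern IN LISTS ARGN)
    list(LENGTH result count)
    if(pattern MATCHES "^!(.*)$")
      # Exclusion: drop matches from the result set built so far.
      set(exclusion "${CMAKE_MATCH_1}")
      if(count GREATER 0)
        list(FILTER result EXCLUDE REGEX "${exclusion}")
      endif()
    else()
      # Inclusion: add every candidate path that matches.
      set(matches ${${candidates_var}})
      list(FILTER matches INCLUDE REGEX "${pattern}")
      list(APPEND result ${matches})
    endif()
  endforeach()
  # The result is a set: a path matched by several inclusions appears once.
  list(LENGTH result count)
  if(count GREATER 0)
    list(REMOVE_DUPLICATES result)
  endif()
  set(${out_var} "${result}" PARENT_SCOPE)
endfunction()

set(candidates
  hello.cxx
  hello.hxx
  _disabled.cxx
  world_thirdparty/world.cxx
  README.md)
apply_glob_patterns(kept candidates "[.](cxx|hxx)" "!(^|/)_" "!thirdparty")
message(STATUS "kept: ${kept}")  # prints "-- kept: hello.cxx;hello.hxx"
```

Order matters. An exclusion placed before any inclusion has nothing to remove, and a later inclusion can re-add paths an earlier exclusion removed.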
By default, the extensions used to identify C++ source files are
``.cxx .cxxm .ixx .mxx .cpp .cppm .cc .ccm .c++ .c++m``.
These can be customized by setting the variable ``MAUD_CXX_SOURCE_EXTENSIONS``.
Directories and files whose names start with ``.`` are excluded from all globs.
Maud names build directories ``.build/`` by default to ensure that they are
excluded from globs in the common case where the build directory is nested in
the source root. Maud relies on build directory files being excluded from
globs of source files, so if a non-default build directory name is used (in
particular, a nested one which does not start with ``.``), things may break.
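The script below shows why the ``.build/`` default matters. It expresses the dot-prefix rule as a single exclusion regex over relative paths. The regex ``(^|/)[.]`` and the sample paths are my assumptions, not necessarily how Maud implements the rule. A nested build directory named ``build/`` survives the filter, while ``.build/`` does not.

```cmake
# dot_exclusion.cmake: run with `cmake -P dot_exclusion.cmake`.
# Sketch: "names starting with ." as one regex over relative paths.
# The regex and paths are illustrative assumptions, not Maud's code.
set(paths
  "src/hello.cxx"
  ".build/generated/hello.cxx"   # default build dir name: excluded
  "build/generated/hello.cxx"    # non-default name: NOT excluded
  "src/.hidden.cxx")             # dot-prefixed file: excluded

# A path component starts with "." if a "." follows the start or a "/".
set(visible ${paths})
list(FILTER visible EXCLUDE REGEX "(^|/)[.]")
message(STATUS "visible: ${visible}")
# prints "-- visible: src/hello.cxx;build/generated/hello.cxx"
```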