Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression on every command invocation #3621

Closed
tonistiigi opened this issue May 18, 2022 · 9 comments · Fixed by #4129
Closed

Performance regression on every command invocation #3621

tonistiigi opened this issue May 18, 2022 · 9 comments · Fixed by #4129

Comments

@tonistiigi
Copy link
Member

cbec75e2f3f5afd3334db693829a6b150612076c is the first bad commit
commit cbec75e2f3f5afd3334db693829a6b150612076c
Author: Nicolas De Loof <[email protected]>
Date:   Wed Mar 30 15:27:25 2022 +0200

    Adopt Cobra completion v2 to support completion by CLI plugins

    Signed-off-by: Nicolas De Loof <[email protected]>

 cli-plugins/manager/cobra.go        | 48 +++++++++++++++---
 cli-plugins/plugin/plugin.go        |  5 ++
...

After #3429 every command invocation takes ~200ms longer. Even when command just redirects to another process.

Before:

 # time /tmp/docker buildx version
github.com/docker/buildx v0.8.0-92-g908ce2d2.m 908ce2d20618a069ddd06fc4f044505899932fe2.m
/tmp/docker buildx version  0.06s user 0.04s system 98% cpu 0.098 total
 # time /tmp/docker buildx version
github.com/docker/buildx v0.8.0-92-g908ce2d2.m 908ce2d20618a069ddd06fc4f044505899932fe2.m
/tmp/docker buildx version  0.06s user 0.03s system 99% cpu 0.095 total
 # time /tmp/docker buildx version
github.com/docker/buildx v0.8.0-92-g908ce2d2.m 908ce2d20618a069ddd06fc4f044505899932fe2.m
/tmp/docker buildx version  0.06s user 0.03s system 94% cpu 0.096 total
 # time /tmp/docker --version
Docker version unknown-version, build unknown-commit
/tmp/docker --version  0.02s user 0.01s system 71% cpu 0.038 total
 # time /tmp/docker --version
Docker version unknown-version, build unknown-commit
/tmp/docker --version  0.02s user 0.01s system 70% cpu 0.036 total

After:

github.com/docker/buildx v0.8.0-92-g908ce2d2.m 908ce2d20618a069ddd06fc4f044505899932fe2.m
/tmp/docker buildx version  0.19s user 0.10s system 89% cpu 0.322 total
 # time /tmp/docker buildx version
github.com/docker/buildx v0.8.0-92-g908ce2d2.m 908ce2d20618a069ddd06fc4f044505899932fe2.m
/tmp/docker buildx version  0.19s user 0.10s system 90% cpu 0.325 total
 # time /tmp/docker buildx version
github.com/docker/buildx v0.8.0-92-g908ce2d2.m 908ce2d20618a069ddd06fc4f044505899932fe2.m
/tmp/docker buildx version  0.18s user 0.10s system 90% cpu 0.312 total
 # time /tmp/docker --version
Docker version unknown-version, build unknown-commit
/tmp/docker --version  0.16s user 0.09s system 89% cpu 0.280 total
 # time /tmp/docker --version
Docker version unknown-version, build unknown-commit
/tmp/docker --version  0.16s user 0.09s system 89% cpu 0.272 total

@ndeloof

@thaJeztah thaJeztah added this to the 22.06.0 milestone May 19, 2022
@thaJeztah
Copy link
Member

Copying some details from a slack conversation;

From @tonistiigi

Basic outline of time spent on CLI redirect(not compose-cli that adds another 100ms):

  • 30 ms sys startup (most likely depends on binary size)
  • 10 ms init functions before main() gets called (there are some for loops in there but didn’t debug specifically)
  • 8-10 ms client.Ping() call from initializeFromClient() that gets called every time, depends on socket connect speed
  • 20-25 ms plugin.Metadata() call. Every plugin gets called twice, with first call going to the metadata command.
  • 20-25 ms actual plugin call

Sometimes initial binary call(either for docker or plugin) is +100ms. Looks like some macos system cache.
Client ping is not needed and could be avoided. Metadata() call could be cached with some refactoring as no need to check every time.

The client ping is quick so not necessarily saying the optimization is worth it. all commands except redirect and --version would actually use this result so would need to run it anyway then. avoiding extra metadata call could be nice though

Regarding the "metadata";

@thaJeztah: the plug-in election / discovery is going through several directories, and executing each plugin to get the metadata; I recall discussing caching the results with Ian when he was implementing it (we probably should) could be a short TTL for the cache (short being relative; could even be "minutes"

@tonistiigi: we could cache the inode+size of the plugin after we have called metadata on it once. next time would just do a stat call and skip metadata if the plugin file has not changed

@thaJeztah: Good idea yes. We'd still need to (periodically) scan if new plugins were installed, or if a plugin was installed in a path with a higher priority, but perhaps that in itself isn't the bottleneck (although it depends on how many files are in those directories I guess)

@tonistiigi: This already gets called on a specific plugin so new ones would still first get the uncached part. the stat calls that look up the plugin from the list of paths should be fast enough and don’t need to be cached.

@crazy-max
Copy link
Member

crazy-max commented May 30, 2022

Also encountered this issue with docker version --format "{{.Client.Version}}":

@thaJeztah
Copy link
Member

@crazy-max do you think it'd be possible to run a git bisect with that information? (guess that might be tricky as "time taken" may not be an easy one to bisect; perhaps possible though with the right threshold 🤔

@crazy-max
Copy link
Member

Here is a quick bench:

goos: linux
goarch: amd64
pkg: test
cpu: AMD Ryzen 9 5950X 16-Core Processor
BenchmarkCliVersion/docker-18.09.9-32                 20         442731054 ns/op
BenchmarkCliVersion/docker-19.03.15-32                20         482028795 ns/op
BenchmarkCliVersion/docker-20.10.0-32                 20         440785819 ns/op
BenchmarkCliVersion/docker-20.10.12-32                20         468994243 ns/op
BenchmarkCliVersion/docker-20.10.16-32                20         427788556 ns/op
BenchmarkCliVersion/docker-22.06.0-20220512-84b86e23-32                       20         260001484 ns/op
BenchmarkCliVersion/docker-22.06.0-20220529-2b4ffb30-32                       20         336723954 ns/op
BenchmarkBuildxVersion/docker-18.09.9-32                                      20         420874150 ns/op
BenchmarkBuildxVersion/docker-19.03.15-32                                     20         507768653 ns/op
BenchmarkBuildxVersion/docker-20.10.0-32                                      20         466623930 ns/op
BenchmarkBuildxVersion/docker-20.10.12-32                                     20         491043282 ns/op
BenchmarkBuildxVersion/docker-20.10.16-32                                     20         460604176 ns/op
BenchmarkBuildxVersion/docker-22.06.0-20220512-84b86e23-32                    20         278913689 ns/op
BenchmarkBuildxVersion/docker-22.06.0-20220529-2b4ffb30-32                    20         356485298 ns/op
PASS
ok      test    122.798s
name                                               time/op
CliVersion/docker-18.09.9-32                       443ms ± 0%
CliVersion/docker-19.03.15-32                      482ms ± 0%
CliVersion/docker-20.10.0-32                       441ms ± 0%
CliVersion/docker-20.10.12-32                      469ms ± 0%
CliVersion/docker-20.10.16-32                      428ms ± 0%
CliVersion/docker-22.06.0-20220512-84b86e23-32     260ms ± 0%
CliVersion/docker-22.06.0-20220529-2b4ffb30-32     337ms ± 0%
BuildxVersion/docker-18.09.9-32                    421ms ± 0%
BuildxVersion/docker-19.03.15-32                   508ms ± 0%
BuildxVersion/docker-20.10.0-32                    467ms ± 0%
BuildxVersion/docker-20.10.12-32                   491ms ± 0%
BuildxVersion/docker-20.10.16-32                   461ms ± 0%
BuildxVersion/docker-22.06.0-20220512-84b86e23-32  279ms ± 0%
BuildxVersion/docker-22.06.0-20220529-2b4ffb30-32  356ms ± 0%

Note the regression after 84b86e2 linked to #3429

@crazy-max
Copy link
Member

crazy-max commented Jun 21, 2022

Not tested but I wonder if completion perf regression being fixed in cobra 1.5.0 could fix this issue: https://github.com/spf13/cobra/releases/tag/v1.5.0

@thaJeztah
Copy link
Member

This still needs to be looked into. I just compared (note: on different machines, so not a "straight" compare) between 23.0.0 and 20.10;

On 20.10:

time docker buildx version
github.com/docker/buildx v0.9.1-docker ed00243a0ce2a0aee75311b06e32d33b44729689

real	0m0.093s
user	0m0.067s
sys	0m0.024s

time docker --version
Docker version 20.10.22, build 3a2c30b

real	0m0.031s
user	0m0.014s
sys	0m0.014s

On 23.0.0:

time docker buildx version
github.com/docker/buildx v0.10.0 8764628

real	0m0.168s
user	0m0.118s
sys	0m0.049s

time docker --version
Docker version 23.0.0-rc.2, build 257ff41

real	0m0.106s
user	0m0.071s
sys	0m0.035s

Notice that --version also shows the issue, which should help debugging; this PR may be useful as a starting point; I recall when I was working on that PR that there's other code-paths that may be loading / checking for plugins (even though those results may not be needed at that point);

@thaJeztah thaJeztah modified the milestones: 23.0.0, 23.0.1 Feb 2, 2023
@tonistiigi
Copy link
Member Author

tonistiigi commented Feb 2, 2023

Results from hyperfine

20.10.23

 # hyperfine --runs 5 --warmup 2 './docker --version'
Benchmark 1: ./docker --version
  Time (mean ± σ):      17.5 ms ±   1.3 ms    [User: 11.4 ms, System: 4.6 ms]
  Range (min … max):    15.9 ms …  18.8 ms    5 runs

 # hyperfine --runs 5 --warmup 2 './docker buildx version'
Benchmark 1: ./docker buildx version
  Time (mean ± σ):      49.0 ms ±   1.5 ms    [User: 34.8 ms, System: 15.2 ms]
  Range (min … max):    46.9 ms …  50.7 ms    5 runs

23.0.0

 # hyperfine --runs 5 --warmup 2 './docker --version'
Benchmark 1: ./docker --version
  Time (mean ± σ):      76.3 ms ±   1.7 ms    [User: 50.5 ms, System: 28.4 ms]
  Range (min … max):    74.6 ms …  79.0 ms    5 runs

 # hyperfine --runs 5 --warmup 2 './docker buildx version'
Benchmark 1: ./docker buildx version
  Time (mean ± σ):     119.6 ms ±  26.9 ms    [User: 75.9 ms, System: 40.3 ms]
  Range (min … max):   105.4 ms … 167.5 ms    5 runs

@thaJeztah thaJeztah modified the milestones: 23.0.1, 23.0.2 Feb 10, 2023
@thaJeztah thaJeztah modified the milestones: 23.0.2, 23.0.3 Mar 10, 2023
@crazy-max
Copy link
Member

Updated bench results using hyperfine: https://github.com/crazy-max/docker-cli-bench/blob/main/bench.md

@crazy-max
Copy link
Member

Opened several PRs with different approaches to fix this issue:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment