Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fd performs worse than find on NFS #432

Closed
sbocq opened this issue Apr 22, 2019 · 7 comments
Closed

fd performs worse than find on NFS #432

sbocq opened this issue Apr 22, 2019 · 7 comments

Comments

@sbocq
Copy link

sbocq commented Apr 22, 2019

I have a script that scrubs broken transfer files on a live FTP stored on a NFS share using find like this:

find /mnt/ftp/home/redacted ! -newermt "1 day ago" -type f -name '.in.*' -or -name '*.part'

As this can take quite some time I decided to put fd to the test. Unfortunately, its performance is worse on my use case (I hope I didn't make any mistake in the translation) and I think it might be amplified by NFS, which makes some system calls more expensive than others.

Here is a first comparison on CentOS 7 using fd-v7.3.0-x86_64-unknown-linux-musl.tar.gz:

$ time find /mnt/ftp/home/redacted ! -newermt "1 day ago" -type f -name '.in.*' -or -name '*.part'

real	0m0.259s
user	0m0.202s
sys	0m0.032s

$ time ./fd -H '^\.in|\.part$' -s --type f --changed-before 1day /mnt/ftp/home/redacted

real	0m9.970s
user	0m3.297s
sys	0m9.209s

I tried with the options -j 1 a while later to be more equivalent to find, which is single threaded, but it didn't help:

$ time find /mnt/ftp/home/redacted ! -newermt "1 day ago" -type f -name '.in.*' -or -name '*.part'

real	2m59.994s
user	0m1.806s
sys	0m11.001s

$ time ./fd -H '^\.in|\.part$' -j 1 -s --changed-before 1day /mnt/ftp/home/redacted

real	6m16.614s
user	0m2.533s
sys	0m26.904s

As you can see, benchmarking on NFS is a bit wild depending on if metadata is cached or not on the client or server side. But I went on collecting more evidence using strace with and without tracing the calls to futex:

$ strace -c -f ./fd -H '^\.in|\.part$' -j 1 -s --changed-before 1day /mnt/ftp/home/redacted
strace: Process 9451 attached
strace: Process 9452 attached
strace: Process 9454 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 72.60   80.094044    20023511         4           futex
 17.18   18.957540          81    234691        55 stat
  7.56    8.343416        2526      3303           getdents64
  2.52    2.783272         113     24661           lstat
  0.11    0.116978          29      4072           brk
  0.02    0.026199         105       250       202 open
  0.00    0.002726         101        27           madvise
  0.00    0.001306          27        48           close
  0.00    0.001126          24        47           fcntl
  0.00    0.000906         227         4           rt_sigprocmask
  0.00    0.000471         236         2           readlink
  0.00    0.000383          35        11           sigaltstack
  0.00    0.000300          38         8           mmap
  0.00    0.000223          45         5           munmap
  0.00    0.000157          52         3           clone
  0.00    0.000105          26         4           rt_sigaction
  0.00    0.000080          27         3           mprotect
  0.00    0.000061          31         2           getrandom
  0.00    0.000034          34         1           arch_prctl
  0.00    0.000033          17         2           fstat
  0.00    0.000033          33         1           set_tid_address
  0.00    0.000026          26         1           execve
  0.00    0.000025          25         1           pipe2
  0.00    0.000021          21         1           ioctl
------ ----------- ----------- --------- --------- ----------------
100.00  110.329465                267152       257 total


$ strace -e 'trace=!futex' -c -f ./fd -H '^\.in|\.part$' -j 1 -s --changed-before 1day /mnt/ftp/home/redacted
strace: Process 82506 attached
strace: Process 82507 attached
strace: Process 82508 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.70   20.990692          89    234691        55 stat
 24.75    7.906821        2394      3303           getdents64
  9.02    2.880507         138     20801           lstat
  0.42    0.134364          33      4072           brk
  0.10    0.031981         128       250       202 open
  0.01    0.002145          98        22           madvise
  0.01    0.001654          34        48           close
  0.00    0.001260          27        47           fcntl
  0.00    0.000254          51         5           munmap
  0.00    0.000061           6        11           sigaltstack
  0.00    0.000047          12         4           rt_sigprocmask
  0.00    0.000026          13         2           getrandom
  0.00    0.000000           0         2           fstat
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         4           rt_sigaction
  0.00    0.000000           0         1           ioctl
  0.00    0.000000           0         3           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           readlink
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           pipe2
------ ----------- ----------- --------- --------- ----------------
100.00   31.949812                263283       257 total


$ strace -c -f find /mnt/ftp/home/redacted ! -newermt "1 day ago" -type f -name '.in.*' -or -name '*.part'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.94   13.922726         181     76956           newfstatat
  2.58    0.370231        1351       274           getdents
  0.21    0.030305          46       659           brk
  0.10    0.013801          34       405           fcntl
  0.07    0.010084         219        46           openat
  0.06    0.008223          44       187           close
  0.01    0.001921          33        58           fstat
  0.01    0.001275          80        16         4 open
  0.01    0.001146          46        25           mmap
  0.01    0.000731          52        14           mprotect
  0.00    0.000526          53        10           read
  0.00    0.000264          44         6           stat
  0.00    0.000194          65         3           munmap
  0.00    0.000181          91         2           statfs
  0.00    0.000132          66         2         1 access
  0.00    0.000111          56         2           ioctl
  0.00    0.000100         100         1           execve
  0.00    0.000056          28         2           rt_sigaction
  0.00    0.000040          40         1           uname
  0.00    0.000038          38         1           set_tid_address
  0.00    0.000033          33         1           lseek
  0.00    0.000033          33         1           futex
  0.00    0.000030          30         1           getrlimit
  0.00    0.000029          29         1           rt_sigprocmask
  0.00    0.000026          26         1           set_robust_list
  0.00    0.000022          22         1           arch_prctl
  0.00    0.000000           0         1           fchdir
  0.00    0.000000           0         1           fstatfs
------ ----------- ----------- --------- --------- ----------------
100.00   14.362258                 78678         5 total

The calls to futex were a bit unexpected since fd is invoked with -j 1 and it looks like it has quite some incidence on the timings. Otherwise, I think that find behaves smarter here 1) by eliminating some calls to stat while filtering first on the names returned by getdents and then 2) by using a better allocation strategy that reduces the calls to brk and getdents`.

@sbocq
Copy link
Author

sbocq commented Apr 23, 2019

It seems that even find is not optimizing as much after all, and indeed strace was still reporting 76956 calls to stat, while oldfind is implementing the algo that minimizes the calls to stat. This one runs consistently below half a minute in our benchmark.

$ time oldfind /mnt/ftp/home/redacted ! -newermt "1 day ago" -type f -name '.in.*' -or -name '*.part'

real	0m14.008s
user	0m0.317s
sys	0m0.395s

2 ~ $ strace -c oldfind /mnt/ftp/home/redacted ! -newermt "1 day ago" -type f -name '.in.*' -or -name '*.part'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 85.85    0.141244         514       275           getdents
  3.51    0.005771          53       109         4 open
  3.03    0.004979         108        46           openat
  2.24    0.003691          24       154           close
  1.22    0.002007          21        96           fchdir
  0.90    0.001483          29        52           stat
  0.89    0.001467          31        47           newfstatat
  0.64    0.001050          42        25           mmap
  0.41    0.000667          48        14           mprotect
  0.26    0.000434          36        12           fstat
  0.26    0.000428          43        10           read
  0.13    0.000213         107         2           statfs
  0.12    0.000193          97         2         1 access
  0.09    0.000142          47         3           munmap
  0.07    0.000120         120         1           rt_sigprocmask
  0.05    0.000088          44         2           rt_sigaction
  0.05    0.000087          44         2           ioctl
  0.05    0.000085          28         3           brk
  0.04    0.000066          66         1           lseek
  0.04    0.000063          32         2           fcntl
  0.03    0.000052          52         1           set_tid_address
  0.02    0.000041          41         1           getrlimit
  0.02    0.000037          37         1           set_robust_list
  0.02    0.000032          32         1           futex
  0.02    0.000031          31         1           execve
  0.02    0.000029          29         1           arch_prctl
  0.01    0.000024          24         1           uname
------ ----------- ----------- --------- --------- ----------------
100.00    0.164524                   865         5 total

@tavianator
Copy link
Collaborator

Your find equivalent is buggy, it should be:

find /mnt/ftp/home/redacted ! -newermt "1 day ago" -type f \( -name '.in.*' -or -name '*.part' \)

It may be worth putting -newermt last so the other predicates can run first (since they don't require stat()), but I suspect the optimizer has already done that for you.

The other thing that might explain the difference between find and oldfind is the leaf optimization. What version of find are you using? I believe it was missing from ftsfind for a while.

As far as fd, it does seem to be doing more stat() calls than it needs to. Perhaps #402 would help. Additionally, actually matching the path against the pattern is the last thing that happens, but doing it earlier would avoid needing metadata in many cases.

@hungptit
Copy link

@sbocq fd may not be faster than GNU find for small and midsize folders (~50K) or with -j1 option. fd will shine if you have a large folder hierarchy with >100K files and folders. See this benchmark results for more information https://github.com/hungptit/ioutils/blob/master/benchmark.md.

@sbocq
Copy link
Author

sbocq commented Apr 25, 2019

Your find equivalent is buggy, it should be:

Indeed, thanks for the correction.

It may be worth putting -newermt last so the other predicates can run first (since they don't require stat()), but I suspect the optimizer has already done that for you.

Yes, changing the order doesn't make a difference.

The other thing that might explain the difference between find and oldfind is the leaf optimization. What version of find are you using? I believe it was missing from ftsfind for a while.

$ find -version
find (GNU findutils) 4.5.11
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION SELINUX FTS(FTS_CWDFD) CBO(level=2) 

As far as fd, it does seem to be doing more stat() calls than it needs to. Perhaps #402 would help. Additionally, actually matching the path against the pattern is the last thing that happens, but doing it earlier would avoid needing metadata in many cases.

Exactly!

@sbocq
Copy link
Author

sbocq commented Apr 25, 2019

@hungptit Maybe it can improve it in other benchmarks as well. For example, here is another one in the same environment where oldfind runs faster.

$ time strace -c -f oldfind /mnt/ftp/home/redacted -type f|wc -l
116960
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 68.93    0.549301          50     11070           newfstatat
 22.46    0.178975         653       274           getdents
  6.30    0.050178          22      2299           write
  0.70    0.005568          52       107         4 open
  0.63    0.005012         109        46           openat
  0.40    0.003161          21       152           close
  0.25    0.002031          21        96           fchdir
  0.13    0.001068          23        47           stat
  0.04    0.000327          14        24           mmap
  0.04    0.000311         156         2           statfs
  0.04    0.000290          21        14           mprotect
  0.02    0.000146          15        10           fstat
  0.01    0.000117          39         3           munmap
  0.01    0.000095          12         8           read
  0.01    0.000046          23         2         1 ioctl
  0.01    0.000043          14         3           brk
  0.00    0.000038          19         2           fcntl
  0.00    0.000037          19         2           rt_sigaction
  0.00    0.000033          33         1           getrlimit
  0.00    0.000032          16         2         1 access
  0.00    0.000020          20         1           set_tid_address
  0.00    0.000019          19         1           uname
  0.00    0.000019          19         1           set_robust_list
  0.00    0.000018          18         1           rt_sigprocmask
  0.00    0.000018          18         1           arch_prctl
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.796903                 14170         6 total

real	0m3.845s
user	0m0.462s
sys	0m2.226s

$ time strace -c -f ./fd '' /mnt/ftp/home/redacted --type f -j 1|wc -l
strace: Process 65002 attached
strace: Process 65003 attached
116960
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 69.01    9.962759     2490690         4           futex
 26.17    3.777428          32    116960           write
  3.01    0.434237          38     11527           lstat
  1.01    0.145152          44      3297           getdents64
  0.59    0.085828          21      4056           brk
  0.13    0.018973          76       250       202 open
  0.03    0.003653         122        30           madvise
  0.02    0.002938        1469         2           readlink
  0.02    0.002185          36        60        55 stat
  0.01    0.001315          27        48           close
  0.01    0.000930          20        46           fcntl
  0.00    0.000234          39         6           mmap
  0.00    0.000217          24         9           sigaltstack
  0.00    0.000163          33         5           munmap
  0.00    0.000092          46         2           clone
  0.00    0.000088          22         4           rt_sigprocmask
  0.00    0.000062          21         3           rt_sigaction
  0.00    0.000049          25         2           fstat
  0.00    0.000045          23         2           mprotect
  0.00    0.000042          21         2           getrandom
  0.00    0.000017          17         1         1 ioctl
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
------ ----------- ----------- --------- --------- ----------------
100.00   14.436407                136319       258 total

real	0m19.465s
user	0m5.021s
sys	0m21.787s

@sharkdp
Copy link
Owner

sharkdp commented Apr 25, 2019

Thank you very much for the detailed report, analysis and all the answers. I didn't have time to look into this so far, but planning to do so in the next days.

tavianator added a commit to tavianator/fd that referenced this issue Apr 26, 2019
…data

This should partially address sharkdp#432 by decreasing the number of stat() calls:

    $ strace -c -f ./fd-before '.h$' -j1 /usr -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    3.700169           3    983938     38022 stat
    $ strace -c -f ./fd-after '.h$' -j1 /usr -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.671723           4    162052     38021 stat

Though it's not as good as possible:

    $ strace -c -f find /usr -name '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    18.75    0.449866           3    136199           newfstatat
    $ strace -c -f bfs /usr -name '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    15.01    0.216024           3     60211         1 statx

Performance is much better when metadata is required:

    $ hyperfine ./fd-{before,after}" '.h$' -j1 /usr -S +1k"
    Benchmark sharkdp#1: ./fd-before '.h$' -j1 /usr -S +1k
      Time (mean ± σ):      2.707 s ±  0.042 s    [User: 890.8 ms, System: 1939.7 ms]
      Range (min … max):    2.659 s …  2.786 s    10 runs

    Benchmark sharkdp#2: ./fd-after '.h$' -j1 /usr -S +1k
      Time (mean ± σ):      1.562 s ±  0.034 s    [User: 726.2 ms, System: 957.9 ms]
      Range (min … max):    1.536 s …  1.648 s    10 runs

    Summary
      './fd-after '.h$' -j1 /usr -S +1k' ran
        1.73 ± 0.05 times faster than './fd-before '.h$' -j1 /usr -S +1k'

While remaining the same when it's not:

    tavianator@graviton $ hyperfine ./fd-{before,after}" '.h$' -j1 /usr"
    Benchmark sharkdp#1: ./fd-before '.h$' -j1 /usr
      Time (mean ± σ):      1.341 s ±  0.016 s    [User: 664.3 ms, System: 761.2 ms]
      Range (min … max):    1.309 s …  1.361 s    10 runs

    Benchmark sharkdp#2: ./fd-after '.h$' -j1 /usr
      Time (mean ± σ):      1.338 s ±  0.012 s    [User: 684.1 ms, System: 741.1 ms]
      Range (min … max):    1.310 s …  1.350 s    10 runs

    Summary
      './fd-after '.h$' -j1 /usr' ran
        1.00 ± 0.02 times faster than './fd-before '.h$' -j1 /usr'
tavianator added a commit to tavianator/fd that referenced this issue Apr 29, 2019
…data

This should partially address sharkdp#432 by decreasing the number of stat() calls:

    $ strace -c -f ./fd-before '\.h$' /usr -j1 -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     15.71    8.831948           7   1192279     46059 stat
    $ strace -c -f ./fd-after '\.h$' /usr -j1 -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
      7.92    1.972474          10    183907     46046 stat

Though it's not as few as possible:

    $ strace -c -f find /usr -iname '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     19.01    0.946500           5    161649           newfstatat
    $ strace -c -f bfs /usr -iname '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     13.73    0.406565           5     69005           statx

Performance is much better when metadata is required:

    $ hyperfine ./fd-{before,after}" '\.h$' /usr -j1 -S +1k"
    Benchmark sharkdp#1: ./fd-before '\.h$' /usr -j1 -S +1k
      Time (mean ± σ):      4.623 s ±  0.154 s    [User: 1.465 s, System: 3.354 s]
      Range (min … max):    4.327 s …  4.815 s    10 runs

    Benchmark sharkdp#2: ./fd-after '\.h$' /usr -j1 -S +1k
      Time (mean ± σ):      2.650 s ±  0.058 s    [User: 1.258 s, System: 1.592 s]
      Range (min … max):    2.568 s …  2.723 s    10 runs

    Summary
      './fd-after '\.h$' /usr -j1 -S +1k' ran
        1.74 ± 0.07 times faster than './fd-before '\.h$' /usr -j1 -S +1k'

While remaining the same when it's not:

    $ hyperfine ./fd-{before,after}" '.h$' /usr -j1"
    Benchmark sharkdp#1: ./fd-before '.h$' /usr -j1
      Time (mean ± σ):      2.314 s ±  0.052 s    [User: 1.185 s, System: 1.291 s]
      Range (min … max):    2.260 s …  2.441 s    10 runs

    Benchmark sharkdp#2: ./fd-after '.h$' /usr -j1
      Time (mean ± σ):      2.316 s ±  0.040 s    [User: 1.162 s, System: 1.315 s]
      Range (min … max):    2.263 s …  2.375 s    10 runs

    Summary
      './fd-before '.h$' /usr -j1' ran
        1.00 ± 0.03 times faster than './fd-after '.h$' /usr -j1'
tavianator added a commit to tavianator/fd that referenced this issue Apr 30, 2019
…data

This should partially address sharkdp#432 by decreasing the number of stat() calls:

    $ strace -c -f ./fd-before '\.h$' /usr -j1 -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     15.71    8.831948           7   1192279     46059 stat
    $ strace -c -f ./fd-after '\.h$' /usr -j1 -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
      7.92    1.972474          10    183907     46046 stat

Though it's not as few as possible:

    $ strace -c -f find /usr -iname '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     19.01    0.946500           5    161649           newfstatat
    $ strace -c -f bfs /usr -iname '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     13.73    0.406565           5     69005           statx

Performance is much better when metadata is required:

    $ hyperfine ./fd-{before,after}" '\.h$' /usr -j1 -S +1k"
    Benchmark sharkdp#1: ./fd-before '\.h$' /usr -j1 -S +1k
      Time (mean ± σ):      4.623 s ±  0.154 s    [User: 1.465 s, System: 3.354 s]
      Range (min … max):    4.327 s …  4.815 s    10 runs

    Benchmark sharkdp#2: ./fd-after '\.h$' /usr -j1 -S +1k
      Time (mean ± σ):      2.650 s ±  0.058 s    [User: 1.258 s, System: 1.592 s]
      Range (min … max):    2.568 s …  2.723 s    10 runs

    Summary
      './fd-after '\.h$' /usr -j1 -S +1k' ran
        1.74 ± 0.07 times faster than './fd-before '\.h$' /usr -j1 -S +1k'

While remaining the same when it's not:

    $ hyperfine ./fd-{before,after}" '.h$' /usr -j1"
    Benchmark sharkdp#1: ./fd-before '.h$' /usr -j1
      Time (mean ± σ):      2.314 s ±  0.052 s    [User: 1.185 s, System: 1.291 s]
      Range (min … max):    2.260 s …  2.441 s    10 runs

    Benchmark sharkdp#2: ./fd-after '.h$' /usr -j1
      Time (mean ± σ):      2.316 s ±  0.040 s    [User: 1.162 s, System: 1.315 s]
      Range (min … max):    2.263 s …  2.375 s    10 runs

    Summary
      './fd-before '.h$' /usr -j1' ran
        1.00 ± 0.03 times faster than './fd-after '.h$' /usr -j1'
tavianator added a commit to tavianator/fd that referenced this issue May 3, 2019
…data

This should partially address sharkdp#432 by decreasing the number of stat() calls:

    $ strace -c -f ./fd-before '\.h$' /usr -j1 -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     15.71    8.831948           7   1192279     46059 stat
    $ strace -c -f ./fd-after '\.h$' /usr -j1 -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
      7.92    1.972474          10    183907     46046 stat

Though it's not as few as possible:

    $ strace -c -f find /usr -iname '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     19.01    0.946500           5    161649           newfstatat
    $ strace -c -f bfs /usr -iname '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     13.73    0.406565           5     69005           statx

Performance is much better when metadata is required:

    $ hyperfine ./fd-{before,after}" '\.h$' /usr -j1 -S +1k"
    Benchmark sharkdp#1: ./fd-before '\.h$' /usr -j1 -S +1k
      Time (mean ± σ):      4.623 s ±  0.154 s    [User: 1.465 s, System: 3.354 s]
      Range (min … max):    4.327 s …  4.815 s    10 runs

    Benchmark sharkdp#2: ./fd-after '\.h$' /usr -j1 -S +1k
      Time (mean ± σ):      2.650 s ±  0.058 s    [User: 1.258 s, System: 1.592 s]
      Range (min … max):    2.568 s …  2.723 s    10 runs

    Summary
      './fd-after '\.h$' /usr -j1 -S +1k' ran
        1.74 ± 0.07 times faster than './fd-before '\.h$' /usr -j1 -S +1k'

While remaining the same when it's not:

    $ hyperfine ./fd-{before,after}" '\.h$' /usr -j1"
    Benchmark sharkdp#1: ./fd-before '\.h$' /usr -j1
      Time (mean ± σ):      2.382 s ±  0.038 s    [User: 1.221 s, System: 1.286 s]
      Range (min … max):    2.325 s …  2.433 s    10 runs

    Benchmark sharkdp#2: ./fd-after '\.h$' /usr -j1
      Time (mean ± σ):      2.362 s ±  0.034 s    [User: 1.193 s, System: 1.294 s]
      Range (min … max):    2.307 s …  2.422 s    10 runs

    Summary
      './fd-after '\.h$' /usr -j1' ran
        1.01 ± 0.02 times faster than './fd-before '\.h$' /usr -j1'
sharkdp pushed a commit that referenced this issue May 8, 2019
…data

This should partially address #432 by decreasing the number of stat() calls:

    $ strace -c -f ./fd-before '\.h$' /usr -j1 -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     15.71    8.831948           7   1192279     46059 stat
    $ strace -c -f ./fd-after '\.h$' /usr -j1 -S +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
      7.92    1.972474          10    183907     46046 stat

Though it's not as few as possible:

    $ strace -c -f find /usr -iname '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     19.01    0.946500           5    161649           newfstatat
    $ strace -c -f bfs /usr -iname '*.h' -size +1k >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     13.73    0.406565           5     69005           statx

Performance is much better when metadata is required:

    $ hyperfine ./fd-{before,after}" '\.h$' /usr -j1 -S +1k"
    Benchmark #1: ./fd-before '\.h$' /usr -j1 -S +1k
      Time (mean ± σ):      4.623 s ±  0.154 s    [User: 1.465 s, System: 3.354 s]
      Range (min … max):    4.327 s …  4.815 s    10 runs

    Benchmark #2: ./fd-after '\.h$' /usr -j1 -S +1k
      Time (mean ± σ):      2.650 s ±  0.058 s    [User: 1.258 s, System: 1.592 s]
      Range (min … max):    2.568 s …  2.723 s    10 runs

    Summary
      './fd-after '\.h$' /usr -j1 -S +1k' ran
        1.74 ± 0.07 times faster than './fd-before '\.h$' /usr -j1 -S +1k'

While remaining the same when it's not:

    $ hyperfine ./fd-{before,after}" '\.h$' /usr -j1"
    Benchmark #1: ./fd-before '\.h$' /usr -j1
      Time (mean ± σ):      2.382 s ±  0.038 s    [User: 1.221 s, System: 1.286 s]
      Range (min … max):    2.325 s …  2.433 s    10 runs

    Benchmark #2: ./fd-after '\.h$' /usr -j1
      Time (mean ± σ):      2.362 s ±  0.034 s    [User: 1.193 s, System: 1.294 s]
      Range (min … max):    2.307 s …  2.422 s    10 runs

    Summary
      './fd-after '\.h$' /usr -j1' ran
        1.01 ± 0.02 times faster than './fd-before '\.h$' /usr -j1'
@sharkdp
Copy link
Owner

sharkdp commented Sep 15, 2019

There have been significant performance improvements in v7.4.0, notably the ones by @tavianator which should directly address this issue.

I'm going to close this ticket for now, but it would be great if we could get some updated benchmark results on this. I'm happy to re-open it if anyone thinks, that this is not yet resolved.

@sharkdp sharkdp closed this as completed Sep 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants