Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

walk: fd -L should include broken symlinks #497

Merged
merged 10 commits into from
Feb 28, 2020

Conversation

tommilligan
Copy link
Contributor

There is a dead PR #366 already submitted for issue #357. I thought I would propose a smaller change that's more likely to be accepted into the codebase.

Happy to take feedback and change the design if needed in the interest of getting this closed!

Closes #357

@tommilligan
Copy link
Contributor Author

Failing due to use of unstable library feature 'try_from' - if this is an issue I can rewrite to be a function not a trait, but the idea will be the same.

@sharkdp
Copy link
Owner

sharkdp commented Oct 9, 2019

From a first glance, the code looks great. Thank you for your contribution!

I would definitely like to see some benchmark results before merging this. I can do this on my own, but that might take some time.

The https://github.com/sharkdp/fd-benchmarks repository has a regression.sh script that can be used to perform comparison benchmarks. The script might need a review first before running.

@sharkdp
Copy link
Owner

sharkdp commented Oct 9, 2019

Failing due to use of unstable library feature 'try_from' - if this is an issue I can rewrite to be a function not a trait, but the idea will be the same.

I'm okay with bumping the minimum required Rust version. It needs to be changed in build.rs, .travis.yml and the README.md.

@tommilligan
Copy link
Contributor Author

@sharkdp ran ./regressions.sh as you suggested. Full results are here.

The high level summary is:

'./fd-master  1.11 ± 0.15 times faster than './fd-feature
'./fd-feature 1.01 ± 0.15 times faster than './fd-master
'./fd-feature 1.00 ± 0.04 times faster than './fd-master
'./fd-master  1.10 ± 0.16 times faster than './fd-feature
'./fd-master  1.04 ± 0.08 times faster than './fd-feature
'./fd-feature 1.00 ± 0.01 times faster than './fd-master

Which I would interpret as master being very slightly faster than this branch (which is what I'd expect), but not significantly so. I don't know what your threshold is for introducing features vs. performance hits.

@sharkdp
Copy link
Owner

sharkdp commented Oct 10, 2019

Thank you very much. The standard deviations are pretty high for some of your benchmark runs, around 15%. Are you sure that you were running the benchmarks on a "quiet" PC? No Dropbox, Spotify, etc. running in the background? To me, the deviations look like statistical noise.

I also ran the benchmark suite on my PC and get the following result:

No pattern

Command Mean [s] Min [s] Max [s] Relative
./fd-master --hidden --no-ignore '' '/home/shark' 1.127 ± 0.028 1.092 1.169 1.01
./fd-feature --hidden --no-ignore '' '/home/shark' 1.120 ± 0.029 1.070 1.158 1.00

Simple pattern

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master '.*[0-9]\.jpg$' '/home/shark' 199.0 ± 2.9 195.9 208.1 1.00
./fd-feature '.*[0-9]\.jpg$' '/home/shark' 199.5 ± 2.1 196.3 205.1 1.00

Simple pattern (-HI)

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI '.*[0-9]\.jpg$' '/home/shark' 527.7 ± 3.5 523.0 533.3 1.00
./fd-feature -HI '.*[0-9]\.jpg$' '/home/shark' 528.3 ± 3.5 524.1 534.1 1.00

File extension

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI --extension jpg '' '/home/shark' 545.1 ± 6.0 538.6 555.5 1.00
./fd-feature -HI --extension jpg '' '/home/shark' 546.9 ± 6.1 542.5 558.8 1.00

File type

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI --type l '' '/home/shark' 524.0 ± 2.4 520.3 529.4 1.00
./fd-feature -HI --type l '' '/home/shark' 524.4 ± 2.0 522.5 527.4 1.00

Cold cache

Command Mean [s] Min [s] Max [s] Relative
./fd-master -HI '.*[0-9]\.jpg$' '/home/shark' 5.597 ± 0.014 5.584 5.611 1.00
./fd-feature -HI '.*[0-9]\.jpg$' '/home/shark' 5.607 ± 0.011 5.600 5.620 1.00

So it looks like there is no significant difference between the two versions, which is great!

In fact, we recently added a script to the hyperfine repository that performs a statistical t-test to check whether or not two benchmark results follow the same distribution. Here are the results:

Command 1: ./fd-master --hidden --no-ignore '' '/home/shark'
Command 2: ./fd-feature --hidden --no-ignore '' '/home/shark'

t = 0.57, p = 0.576

The two benchmarks are almost the same (p >= 0.05).

Command 1: ./fd-master '.*[0-9]\.jpg$' '/home/shark'
Command 2: ./fd-feature '.*[0-9]\.jpg$' '/home/shark'

t = -0.472, p = 0.641

The two benchmarks are almost the same (p >= 0.05).

Command 1: ./fd-master -HI '.*[0-9]\.jpg$' '/home/shark'
Command 2: ./fd-feature -HI '.*[0-9]\.jpg$' '/home/shark'

t = -0.372, p = 0.715

The two benchmarks are almost the same (p >= 0.05).

Command 1: ./fd-master -HI --extension jpg '' '/home/shark'
Command 2: ./fd-feature -HI --extension jpg '' '/home/shark'

t = -0.66, p = 0.518

The two benchmarks are almost the same (p >= 0.05).

Command 1: ./fd-master -HI --type l '' '/home/shark'
Command 2: ./fd-feature -HI --type l '' '/home/shark'

t = -0.396, p = 0.697

The two benchmarks are almost the same (p >= 0.05).

Command 1: ./fd-master -HI '.*[0-9]\.jpg$' '/home/shark'
Command 2: ./fd-feature -HI '.*[0-9]\.jpg$' '/home/shark'

t = -0.969, p = 0.389

The two benchmarks are almost the same (p >= 0.05).

Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few in-line comments

tests/testenv/mod.rs Show resolved Hide resolved
tests/tests.rs Outdated Show resolved Hide resolved
src/walk.rs Outdated Show resolved Hide resolved
@sharkdp
Copy link
Owner

sharkdp commented Dec 19, 2019

@tommilligan There is no rush, but I'd like to know if you plan to finish this at some point. Otherwise, we could try to find someone else.

@sharkdp
Copy link
Owner

sharkdp commented Feb 28, 2020

This implementation does not really work. See BurntSushi/ripgrep#1503 for an upstream bug report.

@sharkdp
Copy link
Owner

sharkdp commented Feb 28, 2020

I wrote a completely new implementation, which fixes the outstanding unit test.

fd regression benchmark

No pattern

Command Mean [s] Min [s] Max [s] Relative
./fd-master --hidden --no-ignore '' '/home/shark' 1.059 ± 0.046 1.007 1.120 1.03 ± 0.05
./fd-feature --hidden --no-ignore '' '/home/shark' 1.023 ± 0.021 1.006 1.066 1.00

Simple pattern

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master '.*[0-9]\.jpg$' '/home/shark' 176.2 ± 4.8 169.8 185.7 1.02 ± 0.04
./fd-feature '.*[0-9]\.jpg$' '/home/shark' 173.0 ± 4.9 167.3 187.4 1.00

Simple pattern (-HI)

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI '.*[0-9]\.jpg$' '/home/shark' 501.7 ± 7.7 493.5 514.8 1.01 ± 0.02
./fd-feature -HI '.*[0-9]\.jpg$' '/home/shark' 497.1 ± 5.5 489.1 506.9 1.00

File extension

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI --extension jpg '' '/home/shark' 510.8 ± 6.2 506.3 526.8 1.00
./fd-feature -HI --extension jpg '' '/home/shark' 511.1 ± 2.6 507.4 514.8 1.00 ± 0.01

File type

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI --type l '' '/home/shark' 495.4 ± 12.5 484.8 525.4 1.00
./fd-feature -HI --type l '' '/home/shark' 496.5 ± 4.4 490.0 502.5 1.00 ± 0.03

Cold cache

Command Mean [s] Min [s] Max [s] Relative
./fd-master -HI '.*[0-9]\.jpg$' '/home/shark' 5.325 ± 0.025 5.297 5.340 1.01 ± 0.01
./fd-feature -HI '.*[0-9]\.jpg$' '/home/shark' 5.283 ± 0.027 5.252 5.299 1.00

@sharkdp
Copy link
Owner

sharkdp commented Feb 28, 2020

This can actually be used to search for broken symlinks:

fd -L -tl

@sharkdp sharkdp merged commit d05e717 into sharkdp:master Feb 28, 2020
This was referenced Feb 28, 2020
@lestephane
Copy link

fd -L -tl

This command returns working and non-working symlinks, and the only way to differentiate them is the color, which is hard to process using scripting. Is there a better way to find only broken links?

@tavianator
Copy link
Collaborator

@lestephane Not for me:

tavianator@graphene $ touch file
tavianator@graphene $ ln -s file link
tavianator@graphene $ ln -s nowhere broken
tavianator@graphene $ fd -L -tl
broken
tavianator@graphene $ fd -tl
broken
link

@sharkdp
Copy link
Owner

sharkdp commented Apr 3, 2023

See also: #1298

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fd -L omits broken symlinks
4 participants