Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readDir performance improvements (1/2) #809

Merged
merged 12 commits into from
Aug 5, 2019

Conversation

nellh
Copy link
Member

@nellh nellh commented Aug 2, 2019

This reworks utils.files.readDir to avoid the extra syscalls found in #807 and improve performance greatly for network filesystems.

The main improvement is using readdir(dir, { withFileTypes: true }) which allows us to skip the extra stat call for each file in the dataset since the fs.Dirent objects returned include the file type. On Linux + NFS this is a pretty big improvement since it results in one getdents64 call for each directory instead of a readdir call for each directory and a stat call for every file and directory.

The second improvement is avoiding transversal of ignored files entirely. If a directory is ignored, we used to scan the entire thing anyways and just prune it at the end. This prevents a lot of extra scans of the .git tree for DataLad datasets.

Some benchmarks - test suite on my workstation (ext4 + SSD):

master branch
real 0m31.568s 0m11.281s
user 0m49.635s 0m35.629s
sys 0m4.685s 0m3.266s

Validating a real dataset with 105k files on AWS (NFSv4 client + EFS):

master branch
real 3m27.462s 0m43.374s
user 0m9.932s 0m4.796s
sys 0m5.328s 0m1.252s

Copy link
Member

@rwblair rwblair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, tested locally with both PRs merged and am seeing the speed up.

Our docs state node 8 as a minimum so in order for fs.promises to work this will need to be increased to 10. I don't know how big of a shock this will be to users. Ubuntu 18.04 still defaults to node 8

Throwing a loud error about needing node 10 might be enough.

@nellh
Copy link
Member Author

nellh commented Aug 2, 2019

Looks great, tested locally and am seeing the speed up.

Our docs state node 8 as a minimum so in order for fs.promises to work this will need to be increased to 10. I don't know how big of a shock this will be to users. Ubuntu 18.04 still defaults to node 8

Throwing a loud error about needing node 10 might be enough.

It also requires at least v10.10 to expose the withFileTypes optimization. I've updated the engine restriction in package.json so that npm/yarn will throw an error if you try to update to a version of the validator without being on a new enough node release.

@rwblair
Copy link
Member

rwblair commented Aug 5, 2019

Looks like circle might be using cache with old nodejs version in it. I don't have permissions to trigger a build sans cache.

@nellh nellh force-pushed the fs-performance-improvements-1 branch from dd4f1a4 to 99a3307 Compare August 5, 2019 17:59
@nellh nellh merged commit fc10516 into bids-standard:master Aug 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants