Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.statSync is much faster than fs.promises.stat #38006

Open
fabiospampinato opened this issue Mar 31, 2021 · 5 comments
Open

fs.statSync is much faster than fs.promises.stat #38006

fabiospampinato opened this issue Mar 31, 2021 · 5 comments
Labels
fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js.

Comments

@fabiospampinato
Copy link

fabiospampinato commented Mar 31, 2021

  • Version: v14.11.0
  • Platform: Darwin fabiomac.local 18.7.0 Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64 x86_64
  • Subsystem: fs (?)

What steps will reproduce the bug?

I don't have a complete minimal repro to share, but essentially I'm trying to fetch stats data about files as fast as possible, and I discovered that this synchronous way of fetching stats data:

function faststat (path) {
  return binding.stat(toNamespacedPath(path), true, undefined, {path});
}

Is consistently about 2x faster on my system than the ~equivalent asynchronous version:

function faststat (path) {
  return binding.stat(toNamespacedPath(path), true, kUsePromises);
}

The synchronous version takes almost always 40ms ±2ms on my system for 10000 files, while the asynchronous version takes almost always 95ms ± 5ms for the same 10000 files.

The synchronous version is benchmarked like this:

  const list = new Array ( 10000 ).fill ( null );
  console.time('bench');
  list.map ( ( _, index ) => {
    const filePath = `/Users/fabio/Desktop/test_50k/notes/temp_${index}.md`;
    return faststat( filePath );
  });
  console.timeEnd('bench');

And the asynchronous version is benchmarked like this:

  const list = new Array ( 10000 ).fill ( null );
  console.time('bench');
  await Promise.all ( list.map ( ( _, index ) => {
    const filePath = `/Users/fabio/Desktop/test_50k/notes/temp_${index}.md`;
    return faststat( filePath );
  }));
  console.timeEnd('bench');

The undeclared variables in the snippets come from this:

const binding = process.binding('fs');
const {stat, open, close, fstat} = binding;
const {FSReqCallback} = binding;
const {toNamespacedPath} = require('path');
const {kUsePromises} = binding;

Also benchmarking the promise-part of this basically it just takes ~7ms on my system to run the following:

(async () => {
  const list = new Array ( 10000 ).fill ( null );
  console.time('bench');
  await Promise.all ( list.map ( () => {
    new Promise ( resolve => resolve ([]) );
  }));
  console.timeEnd('bench');
})()

So basically the issue is: the async low-level version to fetch stats data is more than 2x slower than the sync version, and that kind of code is used in the higher-level functions like fs.promises.stat and fs.statSync.

How is this possible when just the overhead for promises seems to account for ~7ms of the ~50ms of overhead and the sync version blocks the thread at each call which the async version doesn't do? Is there some sloppy code handling the async version of that on the native side or am I missing something?

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

The async version shouldn't be 2x slower than the sync version.

What do you see instead?

The opposite.

@Ayase-252 Ayase-252 added fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js. labels Mar 31, 2021
@mscdex
Copy link
Contributor

mscdex commented Mar 31, 2021

One thing to remember is that async filesystem calls (promise or callback-based) go through a thread pool (which is of a fixed size) whereas the sync filesystem calls do not, so that will have some impact.

@Linkgoron
Copy link
Member

Linkgoron commented Mar 31, 2021

I'm not sure Promises are the issue. Comparing promises to callbacks, showed that callbacks were faster but not in a significant way. tl;dr stat and promises.stat are much slower than the sync version, although they have relatively similar performance to eachother. However running them concurrently is much faster.

Here is the benchmark that I used:

const fs = require('fs');
const filePath = `${__dirname}\\tst\\doc`;

function sync() {
  console.time('bench');
  for (let index = 0; index < 10000; index++) {    
    fs.statSync(filePath+`_${index}.md`);
  };
  console.timeEnd('bench');
}

function callbackInternal(index, max, resolve) {
  return fs.stat(filePath+`_${index}.md`, () => {
    if (index < max) {
      return callbackInternal(index + 1, max, resolve);
    }
    resolve();
  });
}

function callback() {
  console.time('bench-2');
  return new Promise((res) => {
    callbackInternal(0, 10000, res);
  }).then(() => {
    console.timeEnd('bench-2');
  })
};

async function promise() {
  console.time('bench-3');

  for (let index = 0; index < 10000; index++) {
    await fs.promises.stat(filePath+`_${index}.md`);
  }
  console.timeEnd('bench-3');
};

async function promiseAll() {
  const list = new Array ( 10000 ).fill ( null );
  console.time('bench-4');
  await Promise.all ( list.map ( ( _, index ) => {
    return fs.promises.stat( filePath+`_${index}.md` );
  }));
  console.timeEnd('bench-4');
};

sync();
callback().then(() => {
  return promise();
}).then(()=>{
  promiseAll();
});

resulted in:

bench: 372.542ms
bench-2: 747.403ms
bench-3: 797.711ms
bench-4: 112.844ms

Doing concurrent fs.promises.stat was much faster than 10k statSync, while stat and promises.stat were more or less the same, I created 10k files which were copies of fs.md from Node source.

I'm not sure if this is unexpected, usually sync versions are faster than the non-sync version, but as previously stated they have different compromises.

@mscdex
Copy link
Contributor

mscdex commented Mar 31, 2021

Another thing to remember when benchmarking APIs that hit the filesystem is that there could be caching going on (at the OS level), which could skew results as well. So proper benchmarking would involve clearing any possible fs-related caches between runs.

@fabiospampinato
Copy link
Author

One thing to remember is that async filesystem calls (promise or callback-based) go through a thread pool (which is of a fixed size) whereas the sync filesystem calls do not, so that will have some impact.

Sure, do you think though that that's the cause of the 2x slowdown I'm seeing for the async version? Like is really the overhead for those async calls higher on my system than performing 10000 blocking fs.statSync calls?

Doing concurrent fs.promises.stat was much faster than 10k statSync, while stat and promises.stat were more or less the same, I created 10k files which were copies of fs.md from Node source.

I'm not sure if this is unexpected, usually sync versions are faster than the non-sync version, but as previously stated they have different compromises.

Thanks for making a usable repro first of all.

I would agree with what you are saying, but I think I'm seeing qualitatively different numbers on my system:

bench: 61.235ms
bench-2: 191.164ms
bench-3: 221.569ms
bench-4: 144.721ms

The sync version is consistently about 2x faster than the concurrent version for me, while you are kind of seeing the opposite.

I suppose some of it can be explained by assuming that stat syscalls are performed faster on my system, but still this illustrate the problem I was referring to in my initial post here, is it reasonable to see the concurrent version perform consistently 2x slower than the sync alternative? Like if it takes ~60ms to prod the filesystem (and even if some cache has an impact here both the sync and async calls would hit the cache in the end basically, I would think) and perform some checks in a blocking way is it reasonable to expect Node to be 2x slower on the concurrent version of the same thing?

@avivkeller
Copy link
Member

It's been a little bit since any activity on this issue, but here are my benchmarking results:

import fs from 'fs';
import { fileURLToPath } from 'url';

// Convert import.meta.url to a file path
const filePath = fileURLToPath(import.meta.url);

const list = new Array(10000).fill(filePath);

// Benchmark fs.statSync
console.time('statSync');
list.forEach((path) => {
    fs.statSync(path);
});
console.timeEnd('statSync');

// Benchmark fs.promises.stat
console.time('promises.stat');
await Promise.all(list.map(async (path) => {
    await fs.promises.stat(path);
}));
console.timeEnd('promises.stat');
$ node bench.mjs
statSync: 21.694ms
promises.stat: 93.769ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

No branches or pull requests

5 participants