fs: `readFile` in one syscall to avoid context switching #41436
base: main
Conversation
This PR makes `readFile` as fast as `readFileSync`. In fact, there is no point in reading the file in small chunks, as the buffer is already allocated. The `AbortController` does not justify such a huge performance hit.
Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1085/
My guess is that it was done that way to be fairer to other operations in the thread pool. EDIT: I think that's exactly what `benchmark/fs/readfile-partitioned.js` is testing, and why it is seeing such a large regression.
There is no reason for this test to be slower - it just reduces the number of context switches, meaning threads tend to stay longer on the CPU - provided the CPU has enough cores to accommodate all libuv threads.
One of the most viewed unanswered Node.js questions: https://stackoverflow.com/questions/52648229/i-o-performance-in-node-js-worker-threads
> both are completely independent
They both use the thread pool, so the tasks are competing for available threads.
As I said, that particular benchmark configuration is showing that large(r) reads can block other tasks in the thread pool, which is why the regression shows up: fewer total tasks are being completed.
I think the behavior being changed in this PR should be opt-in instead of being forced.
Additionally, regarding your answer on that StackOverflow question: there is no dynamic thread creation happening. It's a thread pool, so threads are created once at startup and then are reused during the life of the process.
@mscdex I was referring to the question I answered - in which
@addaleax @mscdex Isn't this the perfect moment to discuss increasing the default threadpool size from 4? It is a value inherited from another age, when high-end CPUs rarely had more than 4 cores and 2 MB was a lot of memory.
How about (this goes far beyond the scope of this PR, it's just an idea) having half of the threads marked for CPU work and half marked for I/O work, and when scheduling async work one specifies the type of load? If there are enough threads to cover all cores for every type of work, this arrangement would guarantee maximum performance in all cases. It is a huge change, I know, but it would bring an improvement across the board.
I think 4 is still a reasonable default. Besides, users can already change the threadpool size by setting the
I don't think that's going to be doable since addons can use the same threadpool and you would need to rely on them to do the right thing. Also, what if you have a task that uses both CPU and I/O (e.g. possibly via a third-party library function), which bucket would you put it in? |
You can do it like this: you still have a single pool, some tasks carry a hint, and the only rule is that tasks carrying the same hint never occupy more than half of the pool.
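The rule described above could be sketched in userland roughly as follows (a hypothetical design, not an existing libuv or Node.js API; `HintedPool`, the hint names and the half-pool cap are all illustrative):

```javascript
// Toy scheduler: one pool, optional per-task hint ('cpu' or 'io').
// Invariant: tasks with the same hint never hold more than half the slots,
// so one category of work cannot starve the other.
class HintedPool {
  constructor(size) {
    this.size = size;
    this.running = { cpu: 0, io: 0, none: 0 };
    this.queue = [];
  }
  submit(task, hint = 'none') {
    this.queue.push({ task, hint });
    this.pump();
  }
  pump() {
    const busy = this.running.cpu + this.running.io + this.running.none;
    if (busy >= this.size) return;
    // Pick the first queued task whose hint is still under its cap.
    const i = this.queue.findIndex(({ hint }) =>
      hint === 'none' || this.running[hint] < this.size / 2);
    if (i === -1) return;
    const { task, hint } = this.queue.splice(i, 1)[0];
    this.running[hint]++;
    Promise.resolve().then(task).finally(() => {
      this.running[hint]--;
      this.pump(); // a slot freed up, try to start the next task
    });
  }
}
```

With a pool of 4, a burst of long 'io' tasks would occupy at most 2 slots, leaving the rest available for 'cpu' or unhinted work.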
Besides, I learned about
It's already documented in the man page. If you think the current documentation is insufficient, submit a PR to improve it.
Also, when the documentation is updated to include the new config option, it should include a warning about the effect it can have on the thread pool.
Reading any file without a length limit is very risky, so I don't think it should be the default behavior. If it's really necessary then use `readFileSync` - and I think that's why we need to have both `readFile` and `readFileSync`, right?
@mawaregetsuka Both methods will read the whole file, as long as its size is below a predefined limit, since it has to fit in memory.
This has been extensively discussed before: #25741
so maybe we can let
```diff
@@ -99,7 +97,7 @@ class ReadFileContext {
   } else {
     buffer = this.buffer;
     offset = this.pos;
-    length = MathMin(kReadFileBufferLength, this.size - this.pos);
+    length = this.size - this.pos;
```
Suggested change:

```diff
-    length = this.size - this.pos;
+    length = this.signal ? MathMin(kReadFileBufferLength, this.size - this.pos) : this.size - this.pos;
```
@ronag I am not sure I understand the benefit of this? `this.signal` won't exist when starting the operation?
In fact, after reading the previous discussion I agree that the current implementation has some very important advantages that must be preserved. One of them is that it is fair and not prone to starvation - because after this PR, if you launch 5 `readFile` calls with the current 4 threads, you won't start reading the fifth one until one of the first four has completed.
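The fairness point can be illustrated with a toy simulation (this models a FIFO work queue in chunk-time units; it is not Node's actual implementation, and the 8-chunk file size is an arbitrary choice):

```javascript
// Simulate 5 files read through a 4-thread FIFO pool, where each file is
// 8 chunks long. readLenChunks is how many chunks one pool task reads:
// 8 = whole-file reads (this PR), 1 = chunked reads (current behavior).
// Returns the tick at which the 5th file first gets a thread.
function firstStart(readLenChunks) {
  const totalChunks = 8;
  const files = 5, pool = 4;
  const queue = [];
  for (let f = 0; f < files; f++) queue.push({ f, left: totalChunks });
  const threads = Array(pool).fill(null); // { job, busyUntil } or null
  const startedAt = Array(files).fill(-1);
  for (let t = 0; ; t++) {
    for (let i = 0; i < pool; i++) {
      const th = threads[i];
      if (th && th.busyUntil === t) {      // this read finished at tick t
        th.job.left -= readLenChunks;
        if (th.job.left > 0) queue.push(th.job); // re-queue the rest (FIFO)
        threads[i] = null;
      }
      if (!threads[i] && queue.length) {   // idle thread picks up next job
        const job = queue.shift();
        if (startedAt[job.f] === -1) startedAt[job.f] = t;
        threads[i] = { job, busyUntil: t + Math.min(readLenChunks, job.left) };
      }
    }
    if (startedAt.every(s => s !== -1)) return startedAt[4];
  }
}

console.log(firstStart(8)); // → 8: 5th file waits a full file-read time
console.log(firstStart(1)); // → 1: 5th file starts after one chunk time
```

So whole-file reads trade latency fairness for throughput, which is exactly the tension in this thread.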
Maybe this is the best option, yes
@mscdex @ronag @addaleax @mawaregetsuka I added an in-depth analysis of the performance loss. I agree that all that can be done as a quick hack at this point is to make the chunk size configurable.
I wonder if a quick hack - one that won't make it into a production version before Node 18 - is worth it. Now I see that this problem is very real with Node's own I/O. However, pushing this change through requires a concentrated effort and coordination with the libuv team. I would really like to draw your attention to this; it is very significant.
@mmomtchev I am glad to be working on solving this problem, but I am less and less sure what your goal is.
@mawaregetsuka If the underlying issue can be solved, it is better not to push this PR through - if one adds an additional parameter to
This I can attest is not true. We've been using Node at Dow Jones since 2011, and there are multiple services where we've benchmarked and tuned the threadpool size for the type of work and servers involved.
Refs: #41435