Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.readFile does not partition reads - threadpool exhaustion #17047

Closed
davisjam opened this issue Nov 15, 2017 · 4 comments
Closed

fs.readFile does not partition reads - threadpool exhaustion #17047

davisjam opened this issue Nov 15, 2017 · 4 comments
Labels
fs Issues and PRs related to the fs subsystem / file system. libuv Issues and PRs related to the libuv dependency or the uv binding. performance Issues and PRs related to the performance of Node.js.

Comments

@davisjam
Copy link
Contributor

davisjam commented Nov 15, 2017

Version: Everything
Platform: All
Subsystem: FS

Description
The node master, and all previous releases AFAIK, implement fs.readFile as a call to stat, followed by a request to read the entire file based on the size reported by stat.

Why is this bad?
The effect is to place on the libuv threadpool a "giant" read request, occupying the libuv thread until it completes. While readFile certainly requires buffering the entire file contents, it can partition the read into smaller buffers (as is done on other read paths) to avoid threadpool squatting.

If the file is relatively large or stored on a slow medium, reading the entire file in one shot seems particularly harmful, and seems to present a possible DoS vector.

Downsides to partitioning?
I don't think partitioning the read like this raises any additional risk of read-write races on the FS. If the application is concurrently readFile'ing and modifying the file, it will already see funny behavior. Though libuv uses preadv where available, this doesn't guarantee read atomicity in the presence of concurrent writes.

Related
It might be that writeFile has similar behavior. I didn't check.

PR
I have a patch that partitions readFile. I'm happy to submit a PR if there's interest.

@benjamingr
Copy link
Member

Hey @davisjam thanks for stopping by and making a contribution!

I recommend making a PR if you have a patch already since it's more easy to discuss concrete code changes.

I would love to see some benchmarks detailing the performance characteristics of readFile now vs. how it would look like.

If you'd like assistance setting up the PR or benchmarks please let us know.

@benjamingr benjamingr added fs Issues and PRs related to the fs subsystem / file system. libuv Issues and PRs related to the libuv dependency or the uv binding. performance Issues and PRs related to the performance of Node.js. labels Nov 15, 2017
@benjamingr
Copy link
Member

In particular - results for https://github.com/nodejs/node/blob/master/benchmark/fs/readfile.js with and without the patch would be interesting - it might be a good opportunity to add more benchmarks to readFile anyway.

@davisjam
Copy link
Contributor Author

@benjamingr Thanks for the pointer. I'll add some benchmarks to my PR and open it soon.

@benjamingr
Copy link
Member

@davisjam great, if you're stuck or not sure how to proceed please feel free to leave a comment and we'll do our best to help you. I'll also ping the libuv and fs teams on the actual PR.

Thanks again and good luck :)

davisjam added a commit to davisjam/node that referenced this issue Jan 24, 2018
Problem:

Node implements fs.readFile as:
- a call to stat, then
- a C++ -> libuv request to read the entire file using the stat size

Why is this bad?
The effect is to place on the libuv threadpool a potentially-large
read request, occupying the libuv thread until it completes.
While readFile certainly requires buffering the entire file contents,
it can partition the read into smaller buffers
(as is done on other read paths)
along the way to avoid threadpool exhaustion.

If the file is relatively large or stored on a slow medium, reading
the entire file in one shot seems particularly harmful,
and presents a possible DoS vector.

Solution:

Partition the read into multiple smaller requests.

Considerations:

1. Correctness

I don't think partitioning the read like this raises
any additional risk of read-write races on the FS.
If the application is concurrently readFile'ing and modifying the file,
it will already see funny behavior. Though libuv uses preadv where
available, this doesn't guarantee read atomicity in the presence of
concurrent writes.

2. Performance

Downside: Partitioning means that a single large readFile will
  require into many "out and back" requests to libuv,
  introducing overhead.
Upside: In between each "out and back", other work pending on the
  threadpool can take a turn.

In short, although partitioning will slow down a large request,
it will lead to better throughput if the threadpool is handling
more than one type of request.

Test:

I added test/parallel/test-fs-readfile.js.

Benchmark:

I introduced benchmark/fs/readfile-partitioned.js to characterize
the performance tradeoffs.
See PR for details.

Related:

It might be that writeFile has similar behavior.
The writeFile path is a bit more complex and I didn't
investigate carefully.

Fixes: nodejs#17047
MayaLekova pushed a commit to MayaLekova/node that referenced this issue May 8, 2018
Problem:

Node implements fs.readFile as:
- a call to stat, then
- a C++ -> libuv request to read the entire file using the stat size

Why is this bad?
The effect is to place on the libuv threadpool a potentially-large
read request, occupying the libuv thread until it completes.
While readFile certainly requires buffering the entire file contents,
it can partition the read into smaller buffers
(as is done on other read paths)
along the way to avoid threadpool exhaustion.

If the file is relatively large or stored on a slow medium, reading
the entire file in one shot seems particularly harmful,
and presents a possible DoS vector.

Solution:

Partition the read into multiple smaller requests.

Considerations:

1. Correctness

I don't think partitioning the read like this raises
any additional risk of read-write races on the FS.
If the application is concurrently readFile'ing and modifying the file,
it will already see funny behavior. Though libuv uses preadv where
available, this doesn't guarantee read atomicity in the presence of
concurrent writes.

2. Performance

Downside: Partitioning means that a single large readFile will
  require into many "out and back" requests to libuv,
  introducing overhead.
Upside: In between each "out and back", other work pending on the
  threadpool can take a turn.

In short, although partitioning will slow down a large request,
it will lead to better throughput if the threadpool is handling
more than one type of request.

Fixes: nodejs#17047

PR-URL: nodejs#17054
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. libuv Issues and PRs related to the libuv dependency or the uv binding. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

No branches or pull requests

2 participants