-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File: Try doing a non-blocking read before punting to the threadpool #3518
Conversation
Are we sure this works? I have encountered lots of APIs where files just pretend that they are not blocking and then proceed to block anyway. |
The man page says:
And here's a lwn.net article discussing the same: https://lwn.net/Articles/636967/ :
And if |
I guess that does sound like it should work. |
227421a
to
2b29d82
Compare
I skimmed the PR and looks promising. The overall detection strategy and static atomic seem fine as well. I'm not particularly worried about any overhead around checking an atomic. So, I guess apply any changes you still need to and flag the PR for review... a test somehow would be nice, but I'm not sure how that would work off the top of my head. |
It seems like the libc change has been released. |
23a3c31
to
e53eb67
Compare
I've added a test using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the PR looks pretty good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is a really good idea. I just corrected a few minor spelling/grammar things in the comments because why not.
I don't know why CI is failing now. I only changed some comments, and it seems to affect code that is unrelated to my change. |
It's because the new release of Rust added some new warnings. |
5bd8ea4
to
6ff4932
Compare
...on Linux. If the data is already available in cache this will avoid cross-thread interaction and remove a copy. It should help with latency too as reads that can be satisfied now won't need to wait in queue until other fs operations are complete.
...in an attempt to stimulate both the `preadv2` direct from cache codepath in addition to the punt to a threadpool uncached one.
This way we don't need to worry about compatiblity with old glibc versions and it will work on Android and musl too. The downside is that you can't use `LD_PRELOAD` tricks any more to intercept these calls.
6ff4932
to
0b7d90e
Compare
Ok, I've rebased to fix CI squashing in the various fixes. There's nothing left on the TODO list that I can think of now. |
I think it would be good to add a short comment that explains why we are calling the syscall directly rather than using the libc function. |
I was just looking through the list of PRs. Are there any things that needs to be done, or do we just need a final review? |
b9a8fcd
to
b0c98df
Compare
b0c98df
to
007d363
Compare
007d363
to
3a65b19
Compare
3a65b19
to
5874d63
Compare
Done
I've resolved all the outstanding issues now. Thanks for the review. I've included my changes since the last review as fixup commits. Let me know if you want me to squash them. |
I squash them on merge, so it is unnecessary in the PR branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this looks good now.
Ok, there will be conflicts, but if you're comfortable with that so am I :). |
Motivation
I saw this on the rust subreddit and I wondered if
preadv2(..., RWF_NOWAIT)
would help.Solution
Try doing a non-blocking read before punting to the threadpool on Linux.
If the data is already available in cache this will avoid cross-thread interaction and remove a copy. It should help with latency too as reads that can be satisfied now won't need to wait in queue until other fs operations are complete.
According to my basic testing it helps. The benchmark I ran was https://github.com/wmanley/tokio-fs-bench which is based on the gist from the reddit thread above. On my laptop with the frequency scaling disabled I get 32s with this change and 42s without. For comparison the rust_block_in_place implementation runs in 11s.
So it's faster in this micro-benchmark, but it would be good to see if it actually helps with representative workloads. Perhaps you can suggest something?
The implementation is a bit rough-and-ready. It requires an unreleased libc and currently it'll only work on glibc.I wanted to raise it early to get feedback as to whether the work is worthwhile in the first place.TODO:
preadv2
RWF_NOWAIT
toopreadv2_safe
, particularly around uninitialised data handling.