Implement parallel ARC eviction #16486
I've been casually testing this out (combined with the parallel_dbuf_evict PR) over the last couple of weeks (most recently, 5b070d1 ). I've not been hammering it hard or specifically, just letting it do its thing with my messing-around desktop system. Hit a probable regression today, though: while mv'ing a meager 8GB of files from one pool to another, all my zfs IO got really high-latency, and an iotop showed that the copy part of the move (this being a mv across pools, so in reality it's a copy-and-remove) was running at a painful few 100KB/sec, and the zfs arc_evict thread was taking a whole core... but just one core. In time it all cleared up and of course I can't conclusively blame this PR's changes, but I left with two fuzzy observations:
I have updated the patch with a different logic for picking the default maximum number of ARC eviction threads. The new logic aims to pick the number that is one-eighth of the available CPUs, with a minimum of 2 and a maximum of 16.
Why would we need two evict threads on a single-core system? In that case I would probably prefer to disable taskqs completely. If that is a way to make it more logarithmic, then I would think about
Right now, this is only enabled by a separate tunable, to enable multiple threads. So for the single CPU case, we don't expect it to be enabled. But for something like 4-12 core systems, we would want it to use at least 2 threads, and then grow from there, reaching 16 threads at 128 cores.
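The default described in these comments (one-eighth of the available CPUs, with a minimum of 2 and a maximum of 16) could be sketched in C roughly as follows. The function name `evict_threads_default_max` and its `ncpus` parameter are hypothetical and chosen only for illustration; this is not the code from the patch.

```c
/*
 * Illustrative sketch only, not the PR's implementation.
 * Hypothetical helper: default cap on the number of ARC evict threads,
 * computed as one-eighth of the CPUs, clamped to the range [2, 16].
 */
static unsigned int
evict_threads_default_max(unsigned int ncpus)
{
	unsigned int n = ncpus / 8;	/* one-eighth of the available CPUs */

	if (n < 2)
		n = 2;			/* lower bound from the comment */
	if (n > 16)
		n = 16;			/* upper bound from the comment */
	return (n);
}
```

Under this rule, systems with up to 23 CPUs get 2 threads, and the count grows linearly until it reaches 16 threads at 128 CPUs.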
Now that you mention it, I've noticed it's been disabled by default. I don't like the idea of tuning it manually in production depending on system size. I would prefer to have reasonable automatic defaults.
Hey! So, here's what changed in the patch:
Formula
There is now a different formula for automatically scaling the number of evict threads when the parameter is set to
[Plot: evict thread count (y axis) versus CPU count (x axis), with an accompanying table; neither is preserved here.]
Fewer parameters
This approach has been suggested by @tonyhutter in another PR (#16487 (comment)).
Stability improvements
It is no longer possible to modify the actual evict thread count during runtime. Since the evict taskqs are only created during arc_init(), the module saves the actual number of evict threads it is going to use and does not care about changes to
Thanks for automating it. A few comments on that part, and please take a look at my earlier comments.
I am not sure it is right, but it seems GCC does not like it:
I gave this another spin (not in isolation though, FYI: it was along with the parallel dbuf eviction PR) and got a repeat of the previously noted behavior. Seems to not be coincidence. In stress-testing the intended use-case (chugging through data when the arc is already full) this PR seems benign and probably even beneficial: multiple arc reaper threads are active and busy, and throughput is very healthy. However, later just puttering around in desktop usage under quite light reads I noticed that a reading app is blocked for several seconds at a time and the experience was quite unpleasant. Lo and behold, one or more
Thanks. We are getting closer. ;)
Is anything left unresolved?
@alex-stetsenko I'll take another look a bit later, but meanwhile it would be good to fix the style issue (some line is too long), squash it all into one commit, and rebase on top of master.
module/zfs/arc.c (outdated)
uint_t nthreads = zfs_arc_evict_threads == 1 ?
    zfs_arc_evict_threads_max :
    MIN(zfs_arc_evict_threads, zfs_arc_evict_threads_max);
We could limit nthreads to num_sublists. Multiple threads per sublist might not make much sense. It should normally be true, but zfs_multilist_num_sublists is configurable.
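The suggestion above amounts to clamping the thread count by the number of sublists. A minimal sketch in C, assuming a hypothetical helper `clamp_evict_threads` (the name, the parameters, and the choice to keep at least one thread are illustrative assumptions, not taken from the patch):

```c
/* Local MIN macro so the sketch is self-contained. */
#define	MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Illustrative sketch only, not the PR's implementation.
 * Hypothetical helper: never run more evict threads than there are
 * sublists, since extra threads would have no sublist of their own.
 * Keeping at least one thread is an assumption made for this sketch.
 */
static unsigned int
clamp_evict_threads(unsigned int nthreads, unsigned int num_sublists)
{
	unsigned int n = MIN(nthreads, num_sublists);

	return (n == 0 ? 1 : n);
}
```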
I think it is worth checking num_sublist when zfs_arc_evict_threads_max is computed. Updated.
While it may be somewhat better in practice by not creating extra threads, it dives into internal multilist details, which is not good. I won't insist, but I would not do it this way myself.
A couple more comments, but more importantly, please take a look at why no CI tests are passing. It seems they all time out for some reason. And since I saw it on at least several pushes of this PR but on no others, it makes me think it is not a CI glitch, but that something is wrong here.
@alex-stetsenko The CI still looks very unhappy, but now I have no guesses why.
Read and write performance can become limited by the arc_evict process being single threaded. Additional data cannot be added to the ARC until sufficient existing data is evicted. On many-core systems with TBs of RAM, a single thread becomes a significant bottleneck. With the change we see a 25% increase in read and write throughput.
Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude
Co-authored-by: Mateusz Piotrowski
Signed-off-by: Alexander Stetsenko
Signed-off-by: Allan Jude
Signed-off-by: Mateusz Piotrowski
Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Motivation and Context
Read and write performance can become limited by the arc_evict process being single threaded.
Additional data cannot be added to the ARC until sufficient existing data is evicted.
On many-core systems with TBs of RAM, a single thread becomes a significant bottleneck.
With the change we see a 25% increase in read and write throughput.
Description
Use a new taskq to run multiple arc_evict() threads at once, each given a fraction of the desired memory to reclaim.
How Has This Been Tested?
Benchmarking with a full ARC to measure the performance difference.
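The Description's idea of giving each evict thread a fraction of the desired memory to reclaim can be illustrated with a small C sketch. The helper `split_evict_target` and its even-split policy are assumptions made for illustration; the patch may divide the work differently (for example, per sublist).

```c
#include <stdint.h>

/*
 * Illustrative sketch only, not the PR's implementation.
 * Hypothetical helper: divide a reclaim target of `bytes` evenly among
 * `nthreads` workers, writing each worker's share into `share[]`.
 * Any remainder is handed out one byte at a time to the first workers,
 * so the shares always add up to `bytes`.
 * The caller must pass nthreads > 0 and an array of nthreads entries.
 */
static void
split_evict_target(uint64_t bytes, unsigned int nthreads, uint64_t *share)
{
	uint64_t base = bytes / nthreads;
	uint64_t rem = bytes % nthreads;

	for (unsigned int i = 0; i < nthreads; i++)
		share[i] = base + (i < rem ? 1 : 0);
}
```

Each share could then be handed to one taskq entry as that worker's eviction goal, so no single thread has to reclaim the whole amount.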