Concurrent page sweeping #48969
Can you rebase on top of #48600?
c2c1855 to 1b22c5d
Is this intentionally on top of #49644?
cf8d9fb to a9ad110
Can you disentangle this from #49644 so that we can study #48969 (comment) independently?
Should be independent of #49644 now.
Latest commit should allow part of the sweeping of object pools to run concurrently with mutator threads, independently of whether we have GC threads or not (e.g. a program running with [...]). The cost is, of course, more contention on [...].
I think the solution is to do away with that perm_lock. It doesn't seem too complicated to do that and switch to compare-and-swap.
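To illustrate the compare-and-swap idea, here is a minimal C11 sketch of a bump allocator that advances its pointer with a CAS retry loop instead of taking a lock. The names `bump_pool_t`, `bump_pool_init` and `bump_pool_alloc` are made up for this example and are not the runtime's actual permanent-allocation code; the real change would also have to handle refilling the pool, which is left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-pointer pool; NOT the Julia runtime's data structure. */
typedef struct {
    _Atomic uintptr_t cur; /* next free address */
    uintptr_t end;         /* one past the last usable byte */
} bump_pool_t;

static void bump_pool_init(bump_pool_t *p, void *base, size_t size)
{
    atomic_init(&p->cur, (uintptr_t)base);
    p->end = (uintptr_t)base + size;
}

/* Lock-free allocation: returns NULL when the pool is exhausted.
 * `align` must be a power of two. */
static void *bump_pool_alloc(bump_pool_t *p, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t old = atomic_load_explicit(&p->cur, memory_order_relaxed);
    for (;;) {
        /* Round the current pointer up to the requested alignment. */
        uintptr_t start = (old + align - 1) & ~(uintptr_t)(align - 1);
        if (start > p->end || size > p->end - start)
            return NULL;
        uintptr_t next = start + size;
        /* If another thread moved `cur` first, the CAS fails, `old` is
         * refreshed with the current value, and we retry. */
        if (atomic_compare_exchange_weak_explicit(&p->cur, &old, next,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
            return (void *)start;
    }
}
```

Under contention the loop may retry, but no thread ever blocks while holding a lock, which is the property that matters if sweeping runs concurrently with mutators.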
e060962 to 2f9f0ff
Both are good properties, so if there's (necessarily) a trade-off, can it still be merged with it off by default, and an ENV var to enable it for low GC pauses? While you want to avoid allocations, and the GC entirely, for real-time, that is hard to do fully, and shorter pauses are very valuable for soft real-time. It's just a question of what to call this ENV var: CONCURRENT_SWEEP_GC (or e.g. SOFT_REAL-TIME_GC)?
This PR brings real and tangible benefits for multi-threaded code that is allocating, by significantly shortening the STW phase, therefore improving scalability (Amdahl's law says hi).
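As a rough illustration of the Amdahl's law point (the numbers below are made up for the arithmetic, not measurements from this PR): if a fraction $p$ of the runtime scales across $N$ threads and the remaining $1 - p$ is serial, for example because all mutators are stopped, the speedup is bounded by

$$
S(N) = \frac{1}{(1 - p) + \frac{p}{N}} .
$$

With $N = 16$ and a serial share of $10\%$ ($p = 0.9$), $S = 1 / (0.1 + 0.9/16) = 6.4$. Halving the serial share to $5\%$ ($p = 0.95$) gives $S = 1 / (0.05 + 0.95/16) \approx 9.14$. So moving work out of the STW phase helps scaling even when the total amount of GC work stays the same.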
I believe we should add an environment and runtime flag for this feature.
On systems vulnerable to Meltdown and Spectre, KPTI can cause iTLB flushes. With concurrent page sweeping, instead of paying this cost "once", we will concurrently invalidate the iTLB, leading to runtime performance loss.
In particular, for the GCBenchmark tree_mutable I saw an increase in CPU time being spent in __madvise and in asm_sysvec_call_function on the threads that are not running concurrent GC.
@kpamnany also voiced discomfort with the system being oversubscribed.
I also found it counter-intuitive that --gcthreads=1 would disable concurrent page sweeping.
In the long term, the open questions for me are:
- Could we implement this with the tasking system, e.g. schedule a task that will do some cleanup work?
- We could try out io_uring for batching the madvise calls, but that would be significant work.
- If concurrent sweeping is disabled, we could run this after the STW phase has ended, but before the finalizers. This would alleviate some of @kpamnany's oversubscription concerns, while still moving the cost out of the STW phase.
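The last option could look roughly like the sketch below: during the STW phase the sweeper only records which empty pages should be returned to the OS, and the `madvise` calls are issued once mutators are running again. This is a sketch under assumptions, not the PR's implementation; `deferred_free_t`, `defer_page_release`, `release_deferred_pages` and the fixed `MAX_DEFERRED` cap are hypothetical names chosen for the example, and it assumes Linux semantics for `MADV_DONTNEED`.

```c
#define _DEFAULT_SOURCE /* for MADV_DONTNEED / MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define MAX_DEFERRED 64 /* arbitrary cap for this sketch */

/* Hypothetical queue of pages whose release is postponed. */
typedef struct {
    void *pages[MAX_DEFERRED];
    size_t lens[MAX_DEFERRED];
    size_t n;
} deferred_free_t;

/* Called while the world is stopped: records the page, makes no syscall.
 * Returns 0 on success, -1 if the queue is full. */
static int defer_page_release(deferred_free_t *q, void *page, size_t len)
{
    assert(page != NULL && len > 0);
    if (q->n == MAX_DEFERRED)
        return -1;
    q->pages[q->n] = page;
    q->lens[q->n] = len;
    q->n++;
    return 0;
}

/* Called after the STW phase has ended (e.g. before running finalizers):
 * issues the madvise calls and empties the queue.
 * Returns the number of pages successfully released. */
static size_t release_deferred_pages(deferred_free_t *q)
{
    size_t released = 0;
    for (size_t i = 0; i < q->n; i++) {
        if (madvise(q->pages[i], q->lens[i], MADV_DONTNEED) == 0)
            released++;
    }
    q->n = 0;
    return released;
}
```

The syscall cost and its TLB side effects are still paid, but on a single thread outside the pause, so no extra thread is needed and the STW phase itself stays short.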
To keep things consistent with [...] Open to suggestions on that.
cee0701 to 4fedfda
Bump.
Implements concurrent sweeping of fully empty pages. Concurrent sweeping is disabled by default and may be enabled through the --gcthreads flag. Co-authored-by: Valentin Churavy <[email protected]>
Extends #48600 by making sweeping of object pools concurrent.