Excessive calls to `poll` and cloning of `Waker`s.
#4990
Comments
After some more research... I found out the bottleneck is where it should be. I mistook the instruction for something else, so excessive polling most likely does not happen. Consider this issue irrelevant.
Thank you for the research @jakubDoka!
This is further discussed in #4284. Help much appreciated @jakubDoka.
Agreed. I would assume this to be especially powerful within the |
I'd like to see some benchmarks that moving to It is an unfortunate penalty that we pay here as this scales with N (the number of protocols you plug into |
The storage should be an implementation detail. Why is that a breaking change? |
@thomaseizinger because we use |
I discovered quite the opposite problem when experimenting with optimized polling. Without proper waking in my handler, the whole swarm gets stuck because a connection event no longer causes a repoll (with scoped waking). This behaviour is a valid one, as documented in the standard library. A future may be polled more often than needed, though that should not be relied upon (wake if you want to be polled), but, e.g., |
Summary
I `perf`ed my protocol and discovered major hot spots within the repo code, mostly ending up in `libp2p_swarm::connection::Connection<THandler>::poll` and the yamux implementation. From the looks of it, the "nothing to do" paths are constantly being hit, and subsequently the `Arc`s are dropped and `Waker`s cloned. I quickly noticed that the code usually tries to poll everything anytime one central `Waker` triggers.
Expected behavior
Ideally, code selecting many futures in parallel should branch the main `Waker` for each future to collect information on what actually needs polling.
Actual behavior
Most of the hot instructions are doing atomic operations within `Waker::clone`, and the code almost always looks like this:
The most significant hot spot, though, is something different. Here we unconditionally create an `Arc` for each available protocol as the last thing before breaking the loop. The crux of the problem is that most of the time we call `poll` only to drop some `Arc`s and clone `Waker`s, with no real work getting done.
Relevant log output
No response
Possible Solution
The simplest fix for updating cached wakers is refactoring the function to:
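As a rough sketch of this kind of refactor (not the exact code from the report): assuming a hypothetical `CachedWaker` wrapper around a handler's `Option<Waker>` field, the cached waker is re-cloned only when `Waker::will_wake` reports that it would wake a different task. The `CachedWaker` name and its methods are illustrative and do not exist in libp2p.

```rust
use std::task::{Context, Waker};

/// Hypothetical stand-in for the waker field a handler caches between polls.
#[derive(Default)]
pub struct CachedWaker {
    waker: Option<Waker>,
}

impl CachedWaker {
    /// Remembers the current task's waker. The atomic ref-count increment in
    /// `Waker::clone` is skipped when the cached waker already targets the
    /// same task. Returns `true` if a clone actually happened (for
    /// illustration and testing only).
    pub fn update(&mut self, cx: &Context<'_>) -> bool {
        let up_to_date = matches!(&self.waker, Some(w) if w.will_wake(cx.waker()));
        if !up_to_date {
            self.waker = Some(cx.waker().clone());
        }
        !up_to_date
    }

    /// Wakes the remembered task, if any.
    pub fn wake(&mut self) {
        if let Some(waker) = self.waker.take() {
            waker.wake();
        }
    }
}
```

`will_wake` is documented as best-effort: it may return `false` for two wakers that would wake the same task, in which case this sketch simply pays for one extra clone.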
Although this only fixes the symptom, it's still a substantial improvement considering how easy it is to implement.
Another simple fix is avoiding the `Arc` creation in the mentioned section of the code. I managed to pull it off, but not without breaking the API (set intersection and difference iterators getting replaced with `SmallVec`s in `ProtocolChange` events).
Now for the main cause of all this...
Excessive Polling
All the mentioned code is almost exclusively placed at spots where handwritten futures ran out of things to do, and this is also the code that is executed all the time. My conclusion is that we poll when we don't need to. Solving this is hard, though: it requires a lot of changes and careful testing, since bugs introduced while eliminating excessive polls show up as the program getting blocked forever.
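To make the idea of branching the main `Waker` concrete, here is a minimal sketch using only the standard library. It is an illustration under stated assumptions, not how libp2p is implemented: `ScopedWaker` and `Slot` are hypothetical names, and the parent waker is assumed to stay the same for the lifetime of the slot (a real implementation would need to refresh it, for example with something like `futures::task::AtomicWaker`).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Hypothetical per-child waker: records that its child was woken, then
/// forwards the wake-up to the task's main waker.
pub struct ScopedWaker {
    woken: AtomicBool,
    parent: Waker,
}

impl Wake for ScopedWaker {
    fn wake(self: Arc<Self>) {
        self.wake_by_ref();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        self.woken.store(true, Ordering::Release);
        self.parent.wake_by_ref();
    }
}

/// One slot per child future; the child's waker is created once and reused,
/// so no waker is cloned on the "nothing to do" path.
pub struct Slot {
    flag: Arc<ScopedWaker>,
    waker: Waker,
}

impl Slot {
    pub fn new(parent: Waker) -> Self {
        // Start as "woken" so the child is polled at least once.
        let flag = Arc::new(ScopedWaker {
            woken: AtomicBool::new(true),
            parent,
        });
        let waker = Waker::from(flag.clone());
        Slot { flag, waker }
    }

    /// Returns the waker to poll the child with if the child was woken since
    /// the last call, clearing the flag. `None` means the child can be skipped.
    pub fn take_if_woken(&self) -> Option<&Waker> {
        self.flag
            .woken
            .swap(false, Ordering::AcqRel)
            .then_some(&self.waker)
    }
}
```

A parent `poll` would call `take_if_woken` on each slot and poll only the children for which it returns `Some`, building the child's `Context` from the returned waker. The catch is the one described above: a child that returns `Poll::Pending` without arranging a wake-up is never polled again.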
Version
b7914e4
Would you like to work on fixing this bug ?
Yes