Correct a technical flaw with the spinlock locking: #4201
I'm glad you found the bug and fixed it.
I audited the `std::memory_order*` stuff the best I could using information I found on the web. It looks like what you've done aligns with the best practices that I found. That said, using anything other than `memory_order_seq_cst` makes me nervous, since I've seen world-renowned experts get this wrong. And, as far as I can tell, the "best practices" I found were not written by world-renowned experts. All I can do is hope you got it right.

I made some suggestions that I hope you'll consider. But I'm not confident enough about any of them to mandate a change. Let me know what you think. Thanks.
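To make the ordering question concrete, here is a minimal sketch (not the PR's code) of the pattern being audited. The read-modify-write that takes a lock bit uses `std::memory_order_acquire`, and the one that clears it uses `std::memory_order_release`. The class `bit_lock` is hypothetical and fixed to a 32-bit word for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical illustration: a lock that owns one bit of a shared 32-bit word.
class bit_lock
{
    std::atomic<std::uint32_t>& bits_;
    std::uint32_t const mask_;

    static std::uint32_t make_mask(int index)
    {
        assert(index >= 0 && index < 32);  // checked before shifting
        return std::uint32_t{1} << index;
    }

public:
    bit_lock(std::atomic<std::uint32_t>& bits, int index)
        : bits_(bits), mask_(make_mask(index))
    {
    }

    // Acquire: nothing in the critical section can be reordered before the
    // fetch_or that observes the bit as previously clear.
    bool try_lock()
    {
        return (bits_.fetch_or(mask_, std::memory_order_acquire) & mask_) == 0;
    }

    // Release: writes made while holding the bit become visible to the next
    // thread whose acquiring fetch_or finds the bit clear.
    void unlock()
    {
        bits_.fetch_and(~mask_, std::memory_order_release);
    }
};
```

The acquiring `fetch_or` that succeeds pairs with the previous holder's releasing `fetch_and`. Using `memory_order_seq_cst` on both would also be correct, just potentially more expensive on some architectures.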
```cpp
    std::void_t<
        decltype(std::declval<std::atomic<T>&>().fetch_and(0)),
        decltype(std::declval<std::atomic<T>&>().fetch_or(0))>>>
class packed_spinlock
```
If we follow current naming conventions, this class should be named `PackedSpinlock`, right? And the file should be renamed to `PackedSpinlock.h`?
I'm fine with this rename, although the capitalized types "grind" at my eyes a bit. Still, the code style is the code style.
src/ripple/basics/spinlock.h

```cpp
        std::is_unsigned_v<T> && std::atomic<T>::is_always_lock_free,
        std::void_t<
            decltype(std::declval<std::atomic<T>&>().fetch_and(0)),
            decltype(std::declval<std::atomic<T>&>().fetch_or(0))>>>
```
See the `enable_if` comments on `packed_spinlock`.
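The constraint above admits `T` only when it is unsigned, `std::atomic<T>` is always lock-free, and `std::atomic<T>` provides `fetch_and`/`fetch_or`. As a sketch, the same condition can be pulled out into a standalone trait (the name `is_packable_v` is hypothetical) so it can be checked directly with `static_assert`. Note that `std::atomic<bool>` is rejected by the `void_t` detection even though `std::is_unsigned_v<bool>` is true.

```cpp
#include <atomic>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical trait mirroring the packed_spinlock constraint.
template <class T, class = void>
struct is_packable : std::false_type
{
};

template <class T>
struct is_packable<
    T,
    std::enable_if_t<
        std::is_unsigned_v<T> && std::atomic<T>::is_always_lock_free,
        std::void_t<
            decltype(std::declval<std::atomic<T>&>().fetch_and(0)),
            decltype(std::declval<std::atomic<T>&>().fetch_or(0))>>>
    : std::true_type
{
};

template <class T>
inline constexpr bool is_packable_v = is_packable<T>::value;

static_assert(is_packable_v<std::uint32_t>);
static_assert(!is_packable_v<int>);   // signed
static_assert(!is_packable_v<bool>);  // unsigned, but no fetch_and/fetch_or
```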
src/ripple/basics/spinlock.h

```cpp
while (!try_lock())
    detail::spin_pause();
```
I'm a bit surprised that you switched the implementation to a pure busy wait and lost the `yield` that was in the previous implementation. The lack of a `yield` seems especially suspect since you've made `packed_spinlock` a general facility rather than a tool used only in one specific place.

I'd expect the `lock` implementation to look more like this:

```cpp
// A spin lock is only likely to be helpful for a small number of tries.
for (int i = 0; i < 100; ++i)
{
    if (try_lock())
        return;
    detail::spin_pause();
}

// If the spinlock didn't work then yield.
while (!try_lock())
{
    std::this_thread::yield();
}
```

If you don't include a `yield` in `lock`, it seems like you should at least include a comment with some measurements justifying why falling back to a `yield` is never appropriate.
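Made self-contained, the spin-then-yield policy suggested above could look roughly like this. The free functions `lock_bit`/`unlock_bit`, the `sketch` namespace, and the empty `spin_pause` stand-in are hypothetical; the limit of 100 comes from the suggestion, not from measurements.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

namespace sketch {

// Stand-in for detail::spin_pause(); a real version would issue a CPU
// pause hint (for example, the x86 PAUSE instruction).
inline void spin_pause() noexcept {}

// True if this call flipped the bit in `mask` from 0 to 1.
inline bool try_lock_bit(std::atomic<std::uint32_t>& bits, std::uint32_t mask)
{
    return (bits.fetch_or(mask, std::memory_order_acquire) & mask) == 0;
}

// Spin briefly, then fall back to yielding the time slice.
inline void lock_bit(std::atomic<std::uint32_t>& bits, std::uint32_t mask)
{
    // A spin lock is only likely to help for a small number of tries.
    for (int i = 0; i < 100; ++i)
    {
        if (try_lock_bit(bits, mask))
            return;
        spin_pause();
    }

    // The holder is taking a while; let other threads run.
    while (!try_lock_bit(bits, mask))
        std::this_thread::yield();
}

inline void unlock_bit(std::atomic<std::uint32_t>& bits, std::uint32_t mask)
{
    assert((bits.load(std::memory_order_relaxed) & mask) != 0);  // must be held
    bits.fetch_and(~mask, std::memory_order_release);
}

}  // namespace sketch
```

The yield fallback bounds how long a waiter burns CPU when the holder has been descheduled, at the cost of an extra branch on the contended path.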
The primary justification was to simplify the codepath. There are also some weird (very low-level) interactions between `spin_pause` and comparisons that might actually increase latency, which I was hoping to avoid. I suspect that the `yield` isn't really going to help, but at the very least I ought to try to collect some additional metrics.
FWIW, I also reviewed this and will leave a similar comment. I believe you have it right, not because I know what I'm doing, but because I've seen similar uses elsewhere. This is tricky stuff.
I'm not a world-renowned expert in memory ordering (although I did read a lot to try to get this right). I would normally agree with you, but the intention here is to squeeze every last bit of performance out of the hardware, and I think that the memory ordering will have a (likely small) impact on that. @HowardHinnant, @scottschurr: who would be a renowned expert that we could ask to review this code?
👍 nice fix; @scottschurr had some good comments; I'm fine with however those are resolved.
src/ripple/basics/spinlock.h

```diff
 packed_spinlock(std::atomic<T>& lock, int index)
-    : bits_(lock), mask_(1 << index)
+    : bits_(lock), mask_(detail::masks<T>[index])
```
I'm skeptical this makes a difference. I don't object, but I prefer the bitshift (unless you have a benchmark that shows this matters).
It doesn't really matter, but it is cool, isn't it? 😀
I'm with @seelabs on this one. I think an explicit bit shift will make the code easier for the bulk of developers to read. Even though, yes, the `constexpr` stuff is both fun and cool.
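One subtlety if the code goes back to an explicit bit shift: in `mask_(1 << index)` the literal `1` is an `int`. For a 64-bit `T`, `index == 31` yields a negative `int` that sign-extends into a wrong mask when converted to `T`, and `index >= 32` is undefined behavior. Shifting a value of type `T` avoids both. A minimal sketch, with the hypothetical helper name `single_bit`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: a T with only bit `index` set.
// The shift operand is T{1} (promoted as usual), not an int literal, so
// every bit position of T is reachable without overflow.
template <class T>
constexpr T single_bit(int index)
{
    assert(index >= 0 && index < std::numeric_limits<T>::digits);
    return static_cast<T>(T{1} << index);
}
```

Usage would be `mask_(single_bit<T>(index))`, which keeps the intent as readable as the original shift.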
Today I stumbled across this article on spinlocks from 2019: … 2019 is recent enough that the information is probably still relevant. The author appears to be knowledgeable, and the timings seem useful and instructive. Notice in particular the implementation of … Just FYI.
I've tried but failed to get an outside review on the memory ordering. I still believe it is correct.
This was a great article. The biggest takeaway is that …

So yes, they are meant for very specific situations, with a good example being the …

Beyond that, I should point out that my implementation does use the "test-and-test-and-set" mechanism described here (right down to using the right …

Ultimately, if you feel strongly enough that this code should just be purely contained within …
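For readers unfamiliar with the term, test-and-test-and-set means checking the bit with a plain load first and attempting the atomic read-modify-write only when the bit looks clear, so waiting threads mostly read a shared cache line instead of repeatedly demanding exclusive ownership of it. A minimal sketch on one bit, independent of the PR's code; `ttas_lock_bit` and `ttas_unlock_bit` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical test-and-test-and-set acquisition of one bit of `bits`.
inline void ttas_lock_bit(std::atomic<std::uint32_t>& bits, std::uint32_t mask)
{
    for (;;)
    {
        // Test: a relaxed load is cheap and lets waiters share the cache line.
        if ((bits.load(std::memory_order_relaxed) & mask) == 0)
        {
            // Test-and-set: one atomic RMW; acquire ordering on success.
            if ((bits.fetch_or(mask, std::memory_order_acquire) & mask) == 0)
                return;
        }
        // A production spinlock would issue a CPU pause hint here; yielding
        // keeps this sketch well-behaved on oversubscribed machines.
        std::this_thread::yield();
    }
}

inline void ttas_unlock_bit(std::atomic<std::uint32_t>& bits, std::uint32_t mask)
{
    assert((bits.load(std::memory_order_relaxed) & mask) != 0);  // must be held
    bits.fetch_and(~mask, std::memory_order_release);
}
```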
FWIW, I wrote a smallish test to exercise … However, when I use eight threads, about half of the time the test hangs when it attempts to join the threads. The hang suggests to me that this implementation of …

Ultimately, I'd like to see this test (or something like it) run for 24 consecutive hours and shut down with no problems before we call the …

The test is kind of big (and hacked together), but I'm pasting it here for completeness' sake.
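As a rough illustration of the kind of stress test being discussed (not the reviewer's test), here is a sketch that hammers four bits of one shared word from a configurable number of threads and checks both mutual exclusion and clean shutdown. Every name in it is hypothetical, and it uses a simplified bit lock rather than `packed_spinlock`.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Simplified bit lock used only by this test sketch.
struct test_bit_lock
{
    std::atomic<std::uint32_t>& bits;
    std::uint32_t mask;

    void lock()
    {
        while (bits.fetch_or(mask, std::memory_order_acquire) & mask)
            std::this_thread::yield();
    }
    void unlock() { bits.fetch_and(~mask, std::memory_order_release); }
};

// Each thread increments plain (non-atomic) counters, each guarded by its
// own bit; all four bits share one atomic word. Returns the final counters.
inline std::array<long, 4> run_stress(int threads, int iterations)
{
    std::atomic<std::uint32_t> word{0};
    std::array<long, 4> counters{};
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([&, t] {
            for (int i = 0; i < iterations; ++i)
            {
                int const slot = (t + i) % 4;
                test_bit_lock l{word, std::uint32_t{1} << slot};
                l.lock();
                ++counters[slot];  // a data race if mutual exclusion fails
                l.unlock();
            }
        });
    }

    for (auto& th : pool)
        th.join();  // a hang here means some thread never acquired its bit

    assert(word.load() == 0u);  // every bit was released
    return counters;
}
```

When `iterations` is a multiple of 4, each counter should end at `threads * iterations / 4`; a long-running soak test would simply call this in a loop.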
I left a few more notes for consideration. But I think the bigger question is about the test sometimes hanging on shutdown with 8 threads.
src/ripple/basics/spinlock.h

```cpp
/** Array representing all possible N-bit values with a single bit set. */
template <class T>
constexpr inline auto const masks = [] constexpr
```
FWIW, clang tells me that it wants a `()` before the trailing `constexpr` declaration. But lambdas are implicitly `constexpr` when they can be. So I think the trailing `constexpr` is not necessary.
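Since C++17 a lambda's call operator is implicitly `constexpr` whenever it satisfies the requirements, and before C++23 the parameter list `()` may be omitted only when no specifier such as `constexpr` follows the introducer, which is why clang complains. A sketch of the table without the trailing specifier (the name `single_bit_masks` is hypothetical):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical name; mirrors the masks table in the PR.
template <class T>
inline constexpr auto single_bit_masks = [] {
    std::array<T, sizeof(T) * 8> ret{1};
    for (std::size_t i = 1; i != ret.size(); ++i)
        ret[i] = static_cast<T>(ret[i - 1] << 1);  // cast back after promotion
    return ret;
}();  // no trailing constexpr needed: the lambda is implicitly constexpr

static_assert(single_bit_masks<std::uint8_t>[7] == 0x80);
```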
src/ripple/basics/spinlock.h

```cpp
/** Array representing all possible N-bit values with a single bit set. */
template <class T>
constexpr inline auto const masks = [] constexpr
{
    std::array<T, sizeof(T) * 8> ret{1};

    for (std::size_t i = 1; i != ret.size(); ++i)
        ret[i] = ret[i - 1] << 1;

    return ret;
}
();
```
The `masks` array is in a `detail` namespace, which is good. But, as far as I can tell, `masks` is really a detail that belongs only to the `packed_spinlock` class. So I would prefer to see `masks` as a `static constexpr` member of `packed_spinlock`. It might look like this...

```cpp
private:
    std::atomic<T>& bits_;
    T mask_;

    // Create a compile-time array of valid masks for the spinlock.
    static constexpr auto const masks_ = []()
    {
        std::array<T, sizeof(T) * 8> ret{1};
        for (std::size_t i = 1; i != ret.size(); ++i)
            ret[i] = ret[i - 1] << 1;
        return ret;
    }();
```

Then the `packed_spinlock` constructor would look like this:

```cpp
packed_spinlock(std::atomic<T>& lock, int index)
    : bits_(lock), mask_(masks_[index])
{
    assert(index >= 0 && (mask_ != 0));
}
```
src/ripple/basics/spinlock.h

```cpp
private:
    std::atomic<T>& bits_;
    T mask_;
```
I'm guessing that your intention is that `packed_spinlock` is immovable? I can't currently see any reason for moving it, but I could be missing something.

At any rate, if it is okay for `packed_spinlock` to be immovable, then consider making it `T const mask_`. It may improve some of the generated code if the optimizer knows the value of `mask_` can't change.
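If immovability is the intent, it can be stated explicitly. A `const` data member (like the existing reference member) only deletes assignment; copy construction still works. Declaring the copy operations as deleted also suppresses the implicit move operations, so the type can be neither copied nor moved. A sketch with the hypothetical class name `pinned_bit_lock`:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>

template <class T>
class pinned_bit_lock
{
    std::atomic<T>& bits_;
    T const mask_;  // never changes after construction

public:
    pinned_bit_lock(std::atomic<T>& bits, T mask) : bits_(bits), mask_(mask)
    {
        assert(mask_ != 0 && (mask_ & (mask_ - 1)) == 0);  // exactly one bit
    }

    // User-declared (even deleted) copy operations suppress the implicit
    // move operations, so the lock stays bound to one place.
    pinned_bit_lock(pinned_bit_lock const&) = delete;
    pinned_bit_lock& operator=(pinned_bit_lock const&) = delete;

    bool try_lock()
    {
        return (bits_.fetch_or(mask_, std::memory_order_acquire) & mask_) == 0;
    }

    void unlock()
    {
        bits_.fetch_and(static_cast<T>(~mask_), std::memory_order_release);
    }
};
```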
Well, there was a bug in the test that I pasted above. I had misremembered that an … I understand that @nbougalis is running more extensive tests now. Thanks! There are a few other comments I have left on this pull request, and those still stand. But the primary thing I was concerned about is now resolved. Sorry for writing a buggy test, and thanks for figuring out the problem!
The existing spinlock code, used to protect `SHAMapInnerNode` child lists, has a mistake that can allow the same child to be repeatedly locked under some circumstances. The bug was in the `SpinBitLock::lock` loop condition check and would result in the loop terminating early. This commit fixes this and further simplifies the lock loop, making the correctness of the code easier to verify without sacrificing performance. It also promotes the spinlock class from an implementation detail to a more general-purpose, easier-to-use lock class with clearer semantics. Two different lock types now allow developers to easily grab either a single spinlock from a group of spinlocks (packed in an unsigned integer) or all of the spinlocks at once. While this commit makes spinlocks more widely available to developers, they are rarely the best tool for the job. Use them judiciously and only after careful consideration.
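The second lock type, grabbing all of the spinlocks at once, can be pictured as moving the whole word from 0 to all ones in a single atomic step, which succeeds only when no individual bit is held. This is a sketch under that assumption; the name `whole_word_lock` and the compare-exchange approach are illustrative and not necessarily how the merged code does it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>
#include <thread>

template <class T>
class whole_word_lock
{
    std::atomic<T>& bits_;

public:
    explicit whole_word_lock(std::atomic<T>& bits) : bits_(bits) {}

    // Succeeds only if no bit is currently held, and then holds all of them.
    bool try_lock()
    {
        T expected = 0;
        return bits_.compare_exchange_strong(
            expected,
            std::numeric_limits<T>::max(),
            std::memory_order_acquire,
            std::memory_order_relaxed);
    }

    void lock()
    {
        while (!try_lock())
            std::this_thread::yield();
    }

    // While every bit is held, single-bit fetch_or attempts change nothing,
    // so a plain release store of 0 hands all bits back at once.
    void unlock()
    {
        assert(bits_.load(std::memory_order_relaxed) ==
               std::numeric_limits<T>::max());
        bits_.store(0, std::memory_order_release);
    }
};
```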
Looks like all of my comments have been addressed other than the naming issues. For naming compatibility with this code base, the new classes should probably be named …
Merged as 7e46f53