-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validations::flush() doesn't use JobQueue (RIPD-1356): #2083
Validations::flush() doesn't use JobQueue (RIPD-1356): #2083
Conversation
In general, using the JobQueue during shutdown is risk prone. Validations::flush() no longer uses the JobQueue for its database write at shutdown. It performs the write directly without changing threads. Currently the only place that calls Validations::flush() is in ApplicationImp::onStop() during shutdown.
Codecov Report
@@ Coverage Diff @@
## develop #2083 +/- ##
==========================================
- Coverage 68.88% 68.8% -0.08%
==========================================
Files 676 680 +4
Lines 49495 49600 +105
==========================================
+ Hits 34095 34129 +34
- Misses 15400 15471 +71
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
src/ripple/app/misc/Validations.cpp
Outdated
doWrite (sl); | ||
} | ||
|
||
void doWrite (ScopedLockType& sl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the purpose of passing in the scoped lock to doWrite
? I'm not able to determine why its needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing in the reference to the lock is not required by the functionality. It's a way of signaling to maintainers that the lock must be held when doWrite()
is called. I suppose a comment to that effect would be useful as well 😃.
I ran an additional 1100 start / stay-synced-for-60-seconds / stop loops on this code over the weekend. Everything looked good to me, so I'm feeling pretty confident about the gist of this pull request. |
src/ripple/app/misc/Validations.cpp
Outdated
} | ||
|
||
void doWrite () | ||
void queuedWrite () | ||
{ | ||
auto event = app_.getJobQueue ().getLoadEventAP (jtDISK, "ValidationWrite"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of event
? It appears unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it saves load/timing information for the ValidationWrite
job. getLoadEventAP
returns a pointer to https://github.com/ripple/rippled/blob/develop/src/ripple/core/LoadEvent.h. I think you need to hold onto the event
for the duration of this block to capture that timing info. Definitely a non-intuitive design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! Ok. I haven't delved into the LoadEvent
s much (at all).
{ | ||
ScopedUnlockType sul (mLock); | ||
std::this_thread::sleep_for (std::chrono::milliseconds (100)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears the mWriting
is safe to read while not under lock, so unless I'm missing something, you can end the scope of sl
after doWrite
, and avoid the overhead of taking and releasing the lock in this loop (not to mention fighting with the other thread for the lock).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the standard is that any unsynchronized access (read or write) of a non-atomic by two different threads is undefined behavior. If my understanding is correct, then since mWriting
is a standard bool
it needs the lock. Undefined behavior is nobody's friend (despite what the optimizer folks will tell you).
However I could convert mWriting
into an std::atomic<bool>
and we could release the lock after the call to doWrite
. I'm not convinced this would be a big win, since flush()
only occurs once during the life of the program. Making mWriting
an atomic
has the disadvantage that it constrains the optimizer in other parts of the code. But it could certainly be done. Shall I convert mWriting
to an atomic
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My vote is to leave the code as Scott has written it. Scott is correct that the read needs to be protected, and this seems better than the atomic<bool>
solution considering the interaction with the other parts of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds legit to me. The only reason I brought it up was that I noticed yesterday that there are several other instances of mWriting
being accessed when not under lock. Looking at it again, it appears that the only time it's not explicitly locked is in in condWrite
, and condWrite
appears to always be called under lock.
Soooooooo... Maybe do the same trick in doWrite
in condWrite
so that we don't make this same confusing mistake again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS. I'll note that my original comment had the disclaimer "unless I'm missing something". I missed something. 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll pass a ScopedLockReference
into condWrite()
from it's callers. I thought about doing that when made created doWrite()
. At the time I was leaning toward minimal changes, but I think this is a good change to make.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I think it's better to ask the question and miss something than to not ask the question. So thanks for asking.
if (it.second) | ||
{ | ||
mStaleValidations.push_back (it.second); | ||
anyNew = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good fix!
src/ripple/app/misc/Validations.cpp
Outdated
} | ||
|
||
void doWrite () | ||
void queuedWrite () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can queuedWrite
be inlined to the lambda body on Line 471?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's an excellent suggestion. Thanks.
Latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay! 👍
Incorporated into #2084. |
In general, using the
JobQueue
during shutdown is risk prone.Validations::flush()
no longer uses theJobQueue
for its database write at shutdown. It performs the write directly without changing threads.Currently the only place that calls
Validations::flush()
isApplicationImp::onStop()
during shutdown.I've tested this change by doing start / sync / wait-60-seconds / stop in a loop 50 times. It's held up so far. I intend to do several hundred more loops over the weekend.