Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validations::flush() doesn't use JobQueue (RIPD-1356): #2083

Closed

Conversation

scottschurr
Copy link
Collaborator

In general, using the JobQueue during shutdown is risk prone. Validations::flush() no longer uses the JobQueue for its database write at shutdown. It performs the write directly without changing threads.

Currently the only place that calls Validations::flush() is ApplicationImp::onStop() during shutdown.

I've tested this change by doing start / sync / wait-60-seconds / stop in a loop 50 times. It's held up so far. I intend to do several hundred more loops over the weekend.

In general, using the JobQueue during shutdown is risk prone.
Validations::flush() no longer uses the JobQueue for its database
write at shutdown.  It performs the write directly without
changing threads.

Currently the only place that calls Validations::flush()
is in ApplicationImp::onStop() during shutdown.
@codecov-io
Copy link

codecov-io commented Apr 8, 2017

Codecov Report

Merging #2083 into develop will decrease coverage by 0.07%.
The diff coverage is 84.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop   #2083      +/-   ##
==========================================
- Coverage    68.88%   68.8%   -0.08%     
==========================================
  Files          676     680       +4     
  Lines        49495   49600     +105     
==========================================
+ Hits         34095   34129      +34     
- Misses       15400   15471      +71
Impacted Files Coverage Δ
src/ripple/app/misc/Validations.cpp 67.81% <84.61%> (-0.28%) ⬇️
src/ripple/peerfinder/impl/Logic.h 41.41% <0%> (-7.94%) ⬇️
src/ripple/app/misc/SHAMapStoreImp.cpp 78.86% <0%> (-1.04%) ⬇️
src/ripple/basics/impl/Time.cpp 100% <0%> (ø) ⬆️
src/ripple/app/tx/impl/PayChan.h 100% <0%> (ø) ⬆️
src/ripple/json/Writer.h 100% <0%> (ø) ⬆️
src/beast/test/http/message.cpp
src/protobuf/vsprojects/config.h
src/protobuf/src/google/protobuf/message.h
src/lz4/lib/xxhash.c
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c6b6d82...8534e1b. Read the comment docs.

Copy link
Collaborator

@bachase bachase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

doWrite (sl);
}

void doWrite (ScopedLockType& sl)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the purpose of passing in the scoped lock to doWrite? I'm not able to determine why its needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing in the reference to the lock is not required by the functionality. It's a way of signaling to maintainers that the lock must be held when doWrite() is called. I suppose a comment to that effect would be useful as well 😃.

@scottschurr
Copy link
Collaborator Author

I ran an additional 1100 start / stay-synced-for-60-seconds / stop loops on this code over the weekend. Everything looked good to me, so I'm feeling pretty confident about the gist of this pull request.

}

void doWrite ()
void queuedWrite ()
{
auto event = app_.getJobQueue ().getLoadEventAP (jtDISK, "ValidationWrite");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of event? It appears unused.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it saves load/timing information for the ValidationWrite job. getLoadEventAP returns a pointer to https://github.com/ripple/rippled/blob/develop/src/ripple/core/LoadEvent.h. I think you need to hold onto the event for the duration of this block to capture that timing info. Definitely a non-intuitive design.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Ok. I haven't delved into the LoadEvents much (at all).

{
ScopedUnlockType sul (mLock);
std::this_thread::sleep_for (std::chrono::milliseconds (100));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears the mWriting is safe to read while not under lock, so unless I'm missing something, you can end the scope of sl after doWrite, and avoid the overhead of taking and releasing the lock in this loop (not to mention fighting with the other thread for the lock).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the standard is that any unsynchronized access (read or write) of a non-atomic by two different threads is undefined behavior. If my understanding is correct, then since mWriting is a standard bool it needs the lock. Undefined behavior is nobody's friend (despite what the optimizer folks will tell you).

However I could convert mWriting into an std::atomic<bool>and we could release the lock after the call to doWrite. I'm not convinced this would be a big win, since flush() only occurs once during the life of the program. Making mWriting an atomic has the disadvantage that it constrains the optimizer in other parts of the code. But it could certainly be done. Shall I convert mWriting to an atomic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to leave the code as Scott has written it. Scott is correct that the read needs to be protected, and this seems better than the atomic<bool> solution considering the interaction with the other parts of the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds legit to me. The only reason I brought it up was that I noticed yesterday that there are several other instances of mWriting being accessed when not under lock. Looking at it again, it appears that the only time it's not explicitly locked is in in condWrite, and condWrite appears to always be called under lock.

Soooooooo... Maybe do the same trick in doWrite in condWrite so that we don't make this same confusing mistake again?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS. I'll note that my original comment had the disclaimer "unless I'm missing something". I missed something. 😆

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll pass a ScopedLockReference into condWrite() from it's callers. I thought about doing that when made created doWrite(). At the time I was leaning toward minimal changes, but I think this is a good change to make.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think it's better to ask the question and miss something than to not ask the question. So thanks for asking.

if (it.second)
{
mStaleValidations.push_back (it.second);
anyNew = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix!

}

void doWrite ()
void queuedWrite ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can queuedWrite be inlined to the lambda body on Line 471?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's an excellent suggestion. Thanks.

@scottschurr
Copy link
Collaborator Author

Latest [FOLD] commit addresses most recent review comments.

Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! 👍

@scottschurr scottschurr added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Apr 19, 2017
@scottschurr
Copy link
Collaborator Author

Incorporated into #2084.

@scottschurr scottschurr deleted the synchronous-validations-flush branch April 25, 2017 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants