-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
only forward to p2p nodes that are in sync #4028
Conversation
{ | ||
increment(); | ||
continue; | ||
} | ||
res = sources_[sourceIdx]->forwardToP2p(context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change this to src->forwardToP2p(context);
std::optional<uint32_t> | ||
tryGetMostRecent() | ||
{ | ||
return max_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to read max_
with m_
locked. Otherwise you can get a torn optional
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should probably be const
(as should getMostRecent()
which is probably why this method is not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make these methods const
, std::condition_variable cv_
has to be made mutable
. I'll make that change too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to read
max_
withm_
locked. Otherwise you can get a tornoptional
.
Good catch
This branch is not building for me locally. Apparently ETLHelpers.h was changed in an earlier commit which is not part of this stack? I'll give a best effort at reviewing the code without building. But it means when the next build is assembled this commit cannot be placed arbitrarily. |
@scottschurr This branch is building for me and passing unittests. |
@natenichols, sorry, the mistake was mine. I'd slipped up with my text editor. |
@@ -760,13 +760,24 @@ ETLLoadBalancer::forwardToP2p(RPC::JsonContext& context) const | |||
srand((unsigned)time(0)); | |||
auto sourceIdx = rand() % sources_.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't fix this right now, but I think there's a better way to get a uniform int distribution post C++11: https://en.cppreference.com/w/cpp/numeric/random/uniform_int_distribution This may not be a critical place for it, but we should still model good coding practices.
That's a good change to make some time in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me. 👍
This commit was made by @cjcobb23 and has been running on our reporting servers.
High Level Overview of Change
Previously, reporting mode servers blindly forwarded a request to one of the p2p nodes at random. If the p2p node is totally down, reporting mode would try a different server. But if the p2p node was up but not synced, the reporting mode server considered the response valid, and returned
noNetwork
. This affected requests such assubmit
andfee
.With this change, the reporting mode server only forwards requests to p2p mode servers that have the latest ledger. This would cover the situation where servers are not synced, as well as amendment blocked servers.
Context of Change
In s2, one of the p2p mode servers lost sync, and reporting mode was returning errors for
submit
.Test Plan
I brought up a reporting mode server with two ETL sources. I restarted one of the ETL sources, and then sent
fee
requests to the reporting mode server. Before this change, roughly half of the requests would returnnoNetwork
, since the restarted ETL source was not yet synced. After this change, every request succeeded, since at least one ETL source was in sync.