Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden validations, report server version & deprecate TTL fields #3291

Merged
merged 3 commits into from
May 6, 2020
Merged

Harden validations, report server version & deprecate TTL fields #3291

merged 3 commits into from
May 6, 2020

Conversation

nbougalis
Copy link
Contributor

This PR contains three separate commits:

Remove obsolete TTL fields

Proposals and validations include a TTL field. This was part of an earlier idea which never really materialized. The fields were never really used in practice (a TTL value is not set by default). This change simply makes this official: the fields are deprecated, should not be used and are not coming back.

Ripple engineers will be implementing the system outlined by @JoelKatz in this post.

Include version in validations

Currently, it is impossible to know what version of the software a given validator is running. This information is useful for server operators, for developing who are putting together dashboards, and the community at large.

This change adds a "packed" version in a validation message. The message will, by default, only be sent every 256 ledgers.

Harden validations

Several changes are included in this PR.

  1. Track how many validations a given server publishes for a given ledger. This should be precisely one. When such validations are detected, the server will now log a message to notify server operators of this issue.
  2. The commit adds a 64-bit "cookie" in every validation which can be used to attempt to decipher whether multiple validations are the result of accidental misconfiguration vs. outright byzantine behavior. It is important to note that a malicious validator operator can manipulate the cookie.
  3. Adds the hash of the last ledger a validator considers to be fully validated in the validation message.

@nbougalis
Copy link
Contributor Author

nbougalis commented Mar 9, 2020

(an unofficial lookover of 330f35b by @bachase would be awesome!)

@MarkusTeufelberger
Copy link
Collaborator

When such validations are detected, the server will now log a message to notify server operators of this issue.

Could we optionally also get a full log (or sqlite database, if you feel fancy) of all validation messages received? As far as I know, this data is currently only available on WebSocket streams.

@nbougalis
Copy link
Contributor Author

When such validations are detected, the server will now log a message to notify server operators of this issue.

Could we optionally also get a full log (or sqlite database, if you feel fancy) of all validation messages received? As far as I know, this data is currently only available on WebSocket streams.

We used to record validations in an SQL table, and we removed that functionality (see c5a95f1), because it was not really used by anyone/anything.

I think (and this is just my personal position) that the best option is to use the subscription and to index the received data separately. But would we, for example, report conflicting validations in the subscription stream?

Happy to discuss what to do here.

Copy link
Collaborator

@bachase bachase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only focused on 330f35b. I'm not convinced the new checks are reachable as implemented.

I'm generally supportive of them, although I'm not sure the best way to make them actionable within the current rippled reporting mechanisms. But that can always be something we build towards.

@@ -167,7 +167,11 @@ enum class ValStatus {
/// Not current or was older than current from this node
stale,
/// A validation violates the increasing seq requirement
badSeq
badSeq,
/// Multiple validations for the same sequence for same from multiple validators
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo, extra for same?


// If the ledger hash matches, it must be a validation for
// the same ledger:
bool const same = (it->second.ledgerID() == val.ledgerID());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear how these conditions wouldn't already be blocked by the badSeq one above. That is, the enforcer requires a strictly increasing sequence number seen by a validator. So I think it already filters these alternate outcomes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do want to check these, an option would be to move them above before badSeq returns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting comment here @bachase . I would argue that this is a valid check due to malicious/byzantine behavior. An attacker can increment the seq without changing the content. So this check can help us catch those byzantine behaviors is what I am thinking. I was thinking maybe the signature check would block the logic here but it seems to me that sfLedgerHash is just one field within the digest so a byzantine node can still sign the same validation with a different sequence and this logic here would catch it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlhua but doesn't this check presume the two validations being compared have the same sequence number? They both come from the bySequence_ map in Line 614.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bachase you are right!! These checks would still guard against the scenarios depicted in the code right?

  1. Same sequence, hashs match but cookies don't so we say its multiple
  2. Same sequence, hashes don't match, we say its conflicting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sorted this out already. I this I misfolded some of my work before pushing. See updated commit.

std::max(it->second.signTime(), val.signTime()) -
std::min(it->second.signTime(), val.signTime());

if (diff > std::chrono::minutes{10} &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using one of the other parameterized durations over this hardcoded 10 minutes.

auto validationTime = app_.timeKeeper().closeTime();
if (validationTime <= lastValidationTime_)
validationTime = lastValidationTime_ + 1s;
lastValidationTime_ = validationTime;

STValidation::FeeSettings fees;
std::vector<uint256> amendments;
auto v = std::make_shared<STValidation>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Much cleaner

@@ -1080,6 +1080,14 @@ class Validations_test : public beast::unit_test::suite
}
}

void
testDifferentCookies()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think testing this would be useful; I'm a little rusty on the test harness here but I think it should be feasible and can try to help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the offer. I'll work with someone else to try and get these tests so that more people learn about consensus instead of having all the information locked up in the heads of three or four people.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is important. I'm not sure it needs to be addressed before this is merged (that's up to Brad and Nik), but it would be good to assign this task to someone before this is merged, just so we don't lose track of the task.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be happy to work on it, probably after the negative UNL network tests, if that is OK.

Copy link
Contributor

@carlhua carlhua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR, I am generally a huge fan of improving consensus. I left some comments down below. Aside from those comments, I think this PR will have API, documentation and Explorer impact so I will tag these to have downstream impacts via tags. @intelliot @mDuo13


// If the ledger hash matches, it must be a validation for
// the same ledger:
bool const same = (it->second.ledgerID() == val.ledgerID());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting comment here @bachase . I would argue that this is a valid check due to malicious/byzantine behavior. An attacker can increment the seq without changing the content. So this check can help us catch those byzantine behaviors is what I am thinking. I was thinking maybe the signature check would block the logic here but it seems to me that sfLedgerHash is just one field within the digest so a byzantine node can still sign the same validation with a different sequence and this logic here would catch it

if(j.debug())
dmp(j.debug(), to_string(outcome));

if (outcome == ValStatus::conflicting && j.fatal())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple thoughts here:

  1. Consider this condition to be one of the criteria to be added to nUNL @pwang200 (future, but if we agree we should open an issue to track)
  2. Consider this condition to be folded into peer selection/overlay enhancement - this is a byzantine behavior and we should consider disconnecting from this node or at least not connect to this node. Currently, this might not be a good idea but in the future when we are actively relaying validations from untrusted peers, we should consider keep records and potentially not connecting to these bad nodes. My argument here is that this allows us to eliminate bad nodes from the network.

void
testDifferentCookies()
{
testcase("Multiple Validations");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to have more unit tests here. In addition, I would ask @manojsdoshi to consider to have automated tests for this PR to force the logic to execute on QE network. This should never happen naturally on the real network but it would be useful to see it in action in test.

@carlhua carlhua added API Change Documentation README changes, code comments, etc. labels Mar 19, 2020
@mDuo13
Copy link
Collaborator

mDuo13 commented Mar 26, 2020

I don't see any documentation impacts beyond the new amendment. Am I missing anything? Are there changes to the fields provided in any of the API subscription streams for example?

HowardHinnant
HowardHinnant previously approved these changes Mar 27, 2020
gregtatcam
gregtatcam previously approved these changes Apr 3, 2020
if(j.debug())
dmp(j.debug(), to_string(outcome));

if (outcome == ValStatus::badSeq && j.warn())
if (outcome == ValStatus::conflicting && j.fatal())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this checks j.fatal(), but logs below with j.warn(). Same for line 208

src/ripple/consensus/Validations.h Outdated Show resolved Hide resolved
// Two validations for the same sequence but for different
// ledgers. This could be the result of misconfiguration
// but it can also mean a Byzantine validator.
if (!same)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they have different ledger IDs, what guarantees they still have the same sequence number at this spot in the code? Do you also have to check seqInserted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double-check this. Thanks.

optional bool checkedSignature = 2 [deprecated = true];

// Number of hops traveled
optional uint32 hops = 3 [deprecated = true];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could hops still be useful for the reduce relaying task? We can infer the preferred peer to receive TMValidation and TMPropose from as the ones with the smallest number of hops.

@nbougalis nbougalis dismissed stale reviews from gregtatcam and HowardHinnant via 4ea4837 April 17, 2020 04:07
@nbougalis
Copy link
Contributor Author

I think I've addressed all your concerns, except for improved unit test @bachase. I've avoided rebasing this to make reviewing easier; once you sign-off, I'll do it and force-push the branch.

In the meantime, I'm running a rebased version against the network.

@carlhua carlhua requested a review from bachase April 17, 2020 14:03
bachase
bachase previously approved these changes Apr 17, 2020
Copy link
Collaborator

@bachase bachase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nbougalis -- logic looks good now.

carlhua
carlhua previously approved these changes Apr 21, 2020
Copy link
Contributor

@carlhua carlhua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@carlhua carlhua added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Apr 21, 2020
@carlhua carlhua removed the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Apr 28, 2020
@carlhua carlhua added Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. and removed Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. labels Apr 28, 2020
@nbougalis
Copy link
Contributor Author

I'm going to rebase this against b4 and squash.

@nbougalis nbougalis dismissed stale reviews from carlhua and bachase via 117c724 April 29, 2020 04:33
@nbougalis nbougalis mentioned this pull request Apr 29, 2020
@carlhua carlhua removed the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label May 1, 2020
nbougalis added 2 commits May 1, 2020 12:55
This commit introduces the "HardenedValidations" amendment which,
if enabled, allows validators to include additional information in
their validations that can increase the robustness of consensus.

Specifically, the commit introduces a new optional field that can
be set in validation messages can be used to attest to the hash of
the latest ledger that a validator considers to be fully validated.

Additionally, the commit leverages the previously introduced "cookie"
field to improve the robustness of the network by making it possible
for servers to automatically detect accidental misconfiguration which
results in two or more validators using the same validation key.
Currently there is no mechanism for a validator to report the
version of the software it is currently running. Such reports
can be useful for those who are developing network monitoring
dashboards and server operators in general.

This commit, if merged, defines an encoding scheme to encode
a version string into a 64-bit unsigned integer and adds an
additional optional field to validations.

This commit piggybacks on "HardenedValidations" amendment to
determine whether version information should be propagated
or not.

The general encoding scheme is:

XXXXXXXX-XXXXXXXX-YYYYYYYY-YYYYYYYY-YYYYYYYY-YYYYYYYY-YYYYYYYY-YYYYYYYY

X: 16 bits identifying the particular implementation
Y: 48 bits of data specific to the implementation

The rippled-specific format (implementation ID is: 0x18 0x3B) is:

00011000-00111011-MMMMMMMM-mmmmmmmm-pppppppp-TTNNNNNN-00000000-00000000

    M: 8-bit major version (0-255)
    m: 8-bit minor version (0-255)
    p: 8-bit patch version (0-255)
    T: 11 if neither an RC nor a beta
       10 if an RC
       01 if a beta
    N: 6-bit rc/beta number (1-63)
@nbougalis nbougalis mentioned this pull request May 3, 2020
@nbougalis nbougalis merged commit 2827de4 into XRPLF:develop May 6, 2020
@nbougalis nbougalis deleted the hardv branch October 16, 2023 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation README changes, code comments, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants