Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow second sync committee message to propagate if first one is stale #4015

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

etan-status
Copy link
Contributor

The sync committee message signature does not sign over the slot, making it possible for stale messages to be adjusted and replayed, possibly preventing new sync committee messages to propagate. If such a message for an older head is received, allow a second message to propagate that refers to the current head as selected by fork choice, overriding the previous one.

Also add a couple sanity checks, as in, disallowing messages referring to unknown or invalid blocks to spread, in line with existing rules for blob sidecars, parent block roots and attestations.

The sync committee message signature does not sign over the slot, making
it possible for stale messages to be adjusted and replayed, possibly
preventing new sync committee messages to propagate. If such a message
for an older head is received, allow a second message to propagate that
refers to the current head as selected by fork choice, overriding the
previous one.

Also add a couple sanity checks, as in, disallowing messages referring
to unknown or invalid blocks to spread, in line with existing rules for
blob sidecars, parent block roots and attestations.
@etan-status
Copy link
Contributor Author

Implemented in Nimbus: status-im/nimbus-eth2#4953

- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index`
- _[IGNORE]_ The block being signed (`sync_committee_message.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee messages for processing once block is retrieved).
- _[REJECT]_ The block being signed (`sync_committee_message.beacon_block_root`) passes validation.
- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index`, unless the block being signed (`beacon_block_root`) matches the local head as selected by fork choice, and the earlier valid sync committee message does not match
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about slightly different rule? Unless the beacon_block_root refers to a block with a slot greater than the one that has already been seen. This could remove the dependency on the local view to the head of the chain and make it more resilient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that allow spamming, i.e., by sending a message with slot 5, then 6, then 7, then 8.

And in some reorgs, the parent block of the next block may have a lower slot than the highest one observed across all branches.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that allow spamming

There is the block.slot that should increase to avoid discarding the message, not the message.slot. I don’t think it has any room for spamming as the beacon_block_root is signed.

And in some reorgs, the parent block of the next block may have a lower slot than the highest one observed across all branches.

Right, but in this case the next block should have a greater slot. There are edge cases when re-org happens to a block with a lower slot than the slot of the previous head, but this should be a rare case on mainnet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, block.slot would work there as the spam protection as well. The edge cases with reorg to a lower parent slot would get worse, though, compared to proposed solution. It's also still depending on the local view of the chain as a whole, as in, one still needs to have the block available for the second message to propagate. From a reward perspective, the only message that has any value is one that signs the correct parent root of the next slot's block.

I guess to decide which strategy is best, maybe @potuz ? Maybe there's an even better way :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no, the spam problem is still there. Someone could go extreme and sign beacon block from 2 weeks ago with message.slot set to the current slot. and then send another message that signs the beacon block 1 after that. and so on, until the present. and all of those messages would propagate, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The split view is not really prevented though, as long as the fix incorporates that the block header being signed has to be known. The honest signatures should only come ~4s into the slot (or at the very least after the block, similar to attestations, there are incentives to vote for the most likely head as selected by forkchoice); so at that stage fork choice is already done.

btw, as your suggestion doesn't support the reorg to lower slot case, couldn't it be simplified to, instead of requiring monotonical increase, just allow a second message to sign a block header for the current slot? that would fix the spam potential but keep the edge case problem open.

If one wants to get rid of fork choice dependency, the rules could also be changed to allow a vote for a single additional unknown/invalid block root to propagate. i.e., allow an initial message to spread (whatever), potentially allow a second message to spread if it refers to an unknown/invalid block root (to cover the split view case), and allow a final message to spread if it pertains to the head root (as selected by forkchoice) as proposed (to cover reorg to lower slot etc).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose, there is a block A with two children B and C, C.slot = B.slot + 1. C is timely only for a half of nodes and weights are distributed in a way that C is the winner for those who has received it in time (proposer_boost_weight > get_weight(B)), so a half of sync committee participants sends committee message with C as the head and another half with B. Both block are eventually seen and correct but nodes with head=B can’t propagate the messages with beacon_block_root = htr(C) and vice versa.

doesn't support the reorg to lower slot case, couldn't it be simplified to, instead of requiring monotonical increase, just allow a second message to sign a block header for the current slot

It’s not strictly the same as validator could send a previous message for a block in slot N-2 and now the head is the block from slot N-1. In this case monotonically increasing constraint would allow the second message while same slot rule would not.

I think ideally there should be a SignedSyncCommitteeMessage to prevent slot tampering. So the original problem is resolved without a workaround, as it seems any workaround has its own corner cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. btw the workaround only triggers if there is active slot tampering going on. In any case, the initial message per validator propagates regardless of restrictions.

Your example could be mitigated by allowing in each slot for (1) the first message to spread, (2) a message pertaining to an unknown root to spread, (3) a message pertaining to the local head to spread.

Your case is essentially triggering (2), as the local node does not know the slot of the signed root at the time it receives the sync message in that scenario. So at that stage, it would result in IGNORE and the message is already stopped before the blocks end up being seen. And IGNORE is cached on the libp2p message-id, so the message still won't propagate lateron once the block is seen.

The problem with (2) is, that one has to also cache a list of all unknown roots that have been signed. Otherwise, if there is an old orphaned block somewhere that is not available in local DB and also is no longer available on the network, an attacker could just replay that outdated message with tampered slot. So, the attacker could waste both extra messages with slot tampering (the initial one for free, and the second one by tampering a sig for an unknown orphaned root) and then the correct message may still not propagate (as it does not pertain to the local head).

Another interesting case is checkpoint sync, where the old data may be locally unavailable even if canonical; but that's at least not affecting a large count of validators.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. There seem to be edge cases in whatever solution, but the proposed solution is a strict improvement regardless. So if we can’t have validator to sign an envelope to prevent slot tampering then verifying a block against the local head is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I mean, we can fix it in a future fork, but it still needs to be fixed / documented for current mainnet forks regardless, and it's probably good enough with one of the proposed fixes as it sufficiently disincentivizes the slot tampering attack, at least until sync committees are removed with beam anyway.

Copy link
Collaborator

@mkalinin mkalinin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! But I’d like to see approves from other CL devs before merging

@@ -161,7 +163,9 @@ The following validations MUST pass before forwarding the `sync_committee_messag
- _[IGNORE]_ The message's slot is for the current slot (with a `MAXIMUM_GOSSIP_CLOCK_DISPARITY` allowance), i.e. `sync_committee_message.slot == current_slot`.
- _[REJECT]_ The `subnet_id` is valid for the given validator, i.e. `subnet_id in compute_subnets_for_sync_committee(state, sync_committee_message.validator_index)`.
Note this validation implies the validator is part of the broader current sync committee along with the correct subcommittee.
- _[IGNORE]_ There has been no other valid sync committee message for the declared `slot` for the validator referenced by `sync_committee_message.validator_index`
- _[IGNORE]_ The block being signed (`sync_committee_message.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee messages for processing once block is retrieved).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably pedantic, but via both gossip and non-gossip i think more accurately would be reflected as via gossip or non-gossip sources
I can't imagine a case where we'd ever want fetching every block that we receive over gossip also over RPC which this currently kind of implies...
This is also repeated above, I think we just change it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not new wording from this PR, but instead consistent with existing wording from everywhere else. The semantics here are meant to be identical to the existing one, and wording changes should probably be synchronized across all mentions.

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i'll raise a separate PR for that, I still think it's misleading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed the other instances in #4049

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, have synchronized the wording to match the one from dev.

@@ -140,6 +140,8 @@ def get_sync_subcommittee_pubkeys(state: BeaconState, subcommittee_index: uint64
- _[REJECT]_ `contribution_and_proof.selection_proof` selects the validator as an aggregator for the slot -- i.e. `is_sync_committee_aggregator(contribution_and_proof.selection_proof)` returns `True`.
- _[REJECT]_ The aggregator's validator index is in the declared subcommittee of the current sync committee --
i.e. `state.validators[contribution_and_proof.aggregator_index].pubkey in get_sync_subcommittee_pubkeys(state, contribution.subcommittee_index)`.
- _[IGNORE]_ The block being signed (`contribution_and_proof.contribution.beacon_block_root`) has been seen (via both gossip and non-gossip sources) (a client MAY queue sync committee contributions for processing once block is received)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about adding this in as a condition. It's not uncommon to see a fair bit of spread around the time that a block is received across different beacon nodes on the same server, let alone across multiple servers. Generating and signing sync committee messages is a pretty quick process, so if a beacon node discards these messages because it has yet to see the relevant block it could well be failing to hold on to a correct message.

How about if the beacon node will only ignore the message only if it has seen a block for the current slot? That gives protection against this common situation. If there is a concern about flooding the network with bogus messages by slower nodes then limit it to a single unknown beacon block root per beacon node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same condition as for attestations, aggregates, and EIP-7732 payloads / attestations.

In practice, sync messages and attestations are sent around ~4s into the slot, and sync contributions and aggregates are sent around ~8s into the slot.

If the condition is a problem here, it is also a problem for attestations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is not the specific time that sync messages show up, it's that there can be a fair spread of time over which some nodes have the current slot's block and others do not.

From my understanding, at least some of the beacon nodes queue attestations when they do not have the block for the current slot rather than discard them (not sure if they forward them or not, that's another question). But having nodes ignore sync committee messages for the current slot when the node doesn't yet have the current slot's block doesn't seem to be an optimal solution to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is only that the processing should be done consistently for attestations as well as sync messages. If this is a problem for sync messages, it is also a problem for attestations.

For attestations, the current spec wording has been in place since phase0. Implementations for example ensure that attestations are delayed a bit to ensure block propagation; and they use unknown roots as a trigger to req/resp the unknown block. Sync messages follow the same code path, as in, are practically sent with a delay after processing the block, or at 4s into the slot. It could be that the wording for attestations also does not properly reflect the reality; but the PR here solely ensures consistent handling of the two messages (attestations / sync messages) that sign over the head, it does not introduce new problems that may or may not be present with the existing logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants