gossipsub v1.1: introduce message signing policy #294

Merged: 9 commits, Sep 30, 2020
51 changes: 41 additions & 10 deletions pubsub/README.md
@@ -32,6 +32,7 @@ and spec status.
- [The RPC](#the-rpc)
- [The Message](#the-message)
- [Message Signing](#message-signing)
- [Message Identification](#message-identification)
- [The Topic Descriptor](#the-topic-descriptor)
- [AuthOpts](#authopts)
- [AuthMode 'NONE'](#authmode-none)
@@ -112,6 +113,9 @@ message Message {
}
```

The `optional` fields may be omitted, depending on the
[signature policy](#message-signing) and [message ID function](#message-identification).

The `from` field denotes the author of the message, note that this is not
necessarily the peer who sent the RPC this message is contained in. This is
done to allow content to be routed through a swarm of pubsubbing peers.
@@ -123,14 +127,7 @@ The `seqno` field is a 64-bit big-endian uint that is a linearly increasing
number that is unique among messages originating from each given peer. No two
messages on a pubsub topic from the same peer should have the same `seqno`
value, however messages from different peers may have the same sequence number,
so this number alone cannot be used to address messages. Notably the
'timecache' in use by the go implementation contains a `message_id`,
which is constructed from the concatenation of the `seqno` and the `from`
fields. This `message_id` is then unique among messages. It was also proposed
in [#116](https://github.com/libp2p/specs/issues/116) to use a `message_hash`,
however, it was noted: "a potential caveat with using hashes instead of seqnos:
the peer won't be able to send identical messages (e.g. keepalives) within the
timecache interval, as they will get rejected as duplicates."
so this number alone cannot be used to address messages by origin-stamping.

The `topicIDs` field specifies a set of topics that this message is being
published to.
@@ -149,17 +146,51 @@ economics (see e.g.
and
[here](https://ethresear.ch/t/improving-the-ux-of-rent-with-a-sleeping-waking-mechanism/1480)).

## Message Identification

To uniquely identify a message in a set of topics, a `message_id` is computed based on the message.
This can be configured on the application layer, as `message_id_fn(*Message) => message_id`.
A `message_id_fn` may conditionally call different `message_id_fn` implementations per topic (or group thereof).

Message ID approaches generally fall into two flavors:
- **origin-stamped** messaging: the combination of the `seqno` and `from` fields
uniquely identifies a message based on the *author*.
- **content-stamped** messaging: a message ID derived from the `data` field
uniquely identifies a message based on the *data*.

The default `message_id_fn` is origin-stamped, and defined as the string concatenation of `from` and `seqno`.
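
As an illustration only, the two flavors might be sketched as follows in Python. The `Message` dataclass, the function names, and the use of SHA-256 for the content-stamped ID are assumptions made for this sketch; the text above only fixes the default as the concatenation of `from` and `seqno`.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical in-memory mirror of the protobuf `Message`.
    # `from` is a Python keyword, so the field is named `from_` here.
    data: bytes
    from_: Optional[bytes] = None
    seqno: Optional[bytes] = None


def origin_stamped_id(msg: Message) -> bytes:
    """Default flavor: concatenation of `from` and `seqno`."""
    if msg.from_ is None or msg.seqno is None:
        raise ValueError("origin-stamped IDs need both `from` and `seqno`")
    return msg.from_ + msg.seqno


def content_stamped_id(msg: Message) -> bytes:
    """Content flavor: derived only from `data` (SHA-256 is an assumption)."""
    return hashlib.sha256(msg.data).digest()


# Two messages with the same payload from the same author, differing in seqno.
a = Message(data=b"hello", from_=b"peerA", seqno=(1).to_bytes(8, "big"))
b = Message(data=b"hello", from_=b"peerA", seqno=(2).to_bytes(8, "big"))
```

Under these assumptions, `a` and `b` get different origin-stamped IDs but the same content-stamped ID, which is the distinction the two flavors are meant to capture.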

If fabricated collisions are not a concern, or are difficult enough to produce within the window in which the message is relevant,
a `message_id` based on a short digest of inputs may benefit performance.
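
A minimal sketch of a short-digest ID, combined with the per-topic selection of `message_id_fn` mentioned earlier in this section. The 8-byte truncation, the topic names, and all helper names are hypothetical choices for illustration, not values taken from the spec.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    # Hypothetical in-memory mirror of the protobuf `Message`.
    data: bytes
    topic_ids: List[str] = field(default_factory=list)


MessageIdFn = Callable[[Message], bytes]


def full_digest_id(msg: Message) -> bytes:
    """Full 32-byte SHA-256 digest of the payload."""
    return hashlib.sha256(msg.data).digest()


def short_digest_id(msg: Message, size: int = 8) -> bytes:
    """Truncated digest: cheaper to store and compare, weaker against collisions."""
    return hashlib.sha256(msg.data).digest()[:size]


def per_topic_id_fn(by_topic: Dict[str, MessageIdFn],
                    default: MessageIdFn) -> MessageIdFn:
    """Build a single message_id_fn that picks an implementation by topic."""
    def message_id_fn(msg: Message) -> bytes:
        for topic in msg.topic_ids:
            if topic in by_topic:
                return by_topic[topic](msg)
        return default(msg)
    return message_id_fn


# Short IDs on a low-stakes topic, full digests everywhere else.
message_id_fn = per_topic_id_fn({"chat": short_digest_id}, full_digest_id)
```

Whether a truncated digest is acceptable depends on how costly a fabricated collision would be for the application during the window in which the message is cached.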

Note that different specialized pubsub components, such as the 'timecache' used in the Go implementation,
may use the `message_id` to key messages.

It was also proposed in [#116](https://github.com/libp2p/specs/issues/116)
to use a `message_hash`, however, it was noted:
> a potential caveat with using hashes instead of seqnos:
the peer won't be able to send identical messages (e.g. keepalives) within the
timecache interval, as they will get rejected as duplicates.

Some applications may not need keepalives, or choose to implement something more specific than a message hash.
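
The keepalive caveat quoted above can be reproduced with a toy duplicate filter. This is a sketch under stated assumptions: `SeenCache` is a hypothetical stand-in for a timecache (it never expires entries), and both ID functions are illustrative rather than taken from any implementation.

```python
import hashlib
from typing import Callable, Set


class SeenCache:
    """Toy stand-in for a timecache: remembers message IDs, with no expiry."""

    def __init__(self, message_id_fn: Callable[[bytes, int, bytes], bytes]):
        self._id_fn = message_id_fn
        self._seen: Set[bytes] = set()

    def accept(self, from_: bytes, seqno: int, data: bytes) -> bool:
        """Return True for a first sighting, False for a duplicate."""
        mid = self._id_fn(from_, seqno, data)
        if mid in self._seen:
            return False
        self._seen.add(mid)
        return True


def origin_id(from_: bytes, seqno: int, data: bytes) -> bytes:
    # Origin-stamped: author plus 64-bit big-endian sequence number.
    return from_ + seqno.to_bytes(8, "big")


def hash_id(from_: bytes, seqno: int, data: bytes) -> bytes:
    # Content-based: the payload alone decides the ID.
    return hashlib.sha256(data).digest()


by_origin = SeenCache(origin_id)
by_hash = SeenCache(hash_id)

# The same peer sends two identical keepalives with increasing seqno.
origin_results = [by_origin.accept(b"peerA", n, b"keepalive") for n in (1, 2)]
hash_results = [by_hash.accept(b"peerA", n, b"keepalive") for n in (1, 2)]
```

With origin-stamped IDs both keepalives are accepted; with the hash-based ID the second one is dropped as a duplicate, which is exactly the behavior an application has to decide it wants.
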
Member

Suggested change
Some applications may not need keepalives, or choose to implement something more specific than a message hash.
Some applications may not need keepalives, or choose to implement something more specific than a message hash. In those cases where duplicate payloads are not desirable, a `content-based` message ID function may be more appropriate.

Contributor Author

I think you mean content-addressed here, to keep it consistent with other changes.


## Message Signing

Messages can be optionally signed, and it is up to the peer whether to accept and forward
unsigned messages.
With the default choice of origin-stamped messaging, the receiver should enforce signatures strictly (`StrictSign`).
When the receiver expects unsigned content-stamped messages, and thus does not expect
the `from`, `seqno`, `signature`, or `key` fields, it may reject messages that carry them (`StrictNoSign`).

This optionality is configurable with the signature policy options starting from gossipsub v1.1.

For signing purposes, the `signature` and `key` fields are used:
- The `signature` field contains the signature.
- The `key` field contains the signing key when it cannot be inlined in the source peer ID.
- The `key` field contains the signing key when it cannot be inlined in the source peer ID (`from`).
When present, it must match the peer ID.

The signature is computed over the marshalled message protobuf _excluding_ the key field.
The signature is computed over the marshalled message protobuf _excluding_ the `signature` field itself.
This includes any fields that are not recognized, but still included in the marshalled data.
Contributor

this is problematic because protobuf is non-deterministic in its marshalling - the way to write this specification would be to remove mentions of protobuf and instead explicitly outline the marshalling in terms of key-value pairs and field types - it would have to be done carefully such that the result is readable by a compliant protobuf parser - it might also happen by chance that some protobuf encoders produce a valid byte stream, but with unrecognised fields in particular, the likelihood is low.

Member

Let's hoist the protobuf deterministic serialisation discussion to a new issue.

Member

It's been recurrent enough that it needs to be a first-class discussion with a solid outcome.

Contributor

I think we should specify canonical encoding here to remove all ambiguity.

The protobuf blob is prefixed by the string `libp2p-pubsub:` before signing.

When signature validation fails for a signed message, the implementation must
46 changes: 46 additions & 0 deletions pubsub/gossipsub/gossipsub-v1.1.md
@@ -37,6 +37,8 @@ See the [lifecycle document][lifecycle-spec] for context about maturity level an
- [Explicit Peering Agreements](#explicit-peering-agreements)
- [PRUNE Backoff and Peer Exchange](#prune-backoff-and-peer-exchange)
- [Protobuf](#protobuf)
- [Signature Policy](#signature-policy)
- [Signature Policy Options](#signature-policy-options)
- [Flood Publishing](#flood-publishing)
- [Adaptive Gossip Dissemination](#adaptive-gossip-dissemination)
- [Outbound Mesh Quotas](#outbound-mesh-quotas)
@@ -134,6 +136,50 @@ message PeerInfo {
}
```

### Signature Policy
Contributor

I am not sure that this should be in the gossipsub spec -- it is a generic pubsub policy that applies to all routing algorithms.

Member

Agreed, will move it.

Contributor Author

Was looking for a place to version it, but since it's not breaking anything necessarily, and applies to pubsub, moving it to the pubsub doc sounds good.


The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable.
Initially this could be configured globally, however, because the policies are mutually incompatible, configuration on a per-topic basis will facilitate mixed protocols better.

In the default origin-stamped messaging, the fields need to be strictly enforced:
the `seqno` and `from` fields form the `message_id`, and should be verified to avoid `message_id` collisions.

In content-stamped messaging, the fields may negatively affect privacy
by revealing the relationship between `data` and `from`/`seqno`.
Comment on lines +144 to +148
Member

I'm not sure what the goal of this is. This might be better moved to after "Signature Policy Options", and put under a "Usage indications/notes" section.

I would reword to something like this:

  • Origin-stamped messaging: choose StrictSign, and use in combination with the default message_id_fn.
  • Content-addressed messaging: enable StrictNoSign, with a custom hash-based message_id_fn. Any signatures would now be part of the payload, and would need to be validated through a custom message validator.

Contributor Author

The idea was to give some context/intuition why the fields are handled like they are. It could be worded directly as message id approach -> signing policy directions, like you suggest. Either option works for us.


#### Signature Policy Options

In gossipsub v1.1, these fields are either strictly present and verified, or omitted altogether:
- `StrictSign`:
  - On the producing side:
    - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields.
  - On the consuming side:
    - Enforce the fields to be present, reject otherwise.
    - Propagate only if the fields are valid and signature can be verified, reject otherwise.
- `StrictNoSign`:
  - On the producing side:
    - Build messages without the `signature`, `key`, `from` and `seqno` fields.
    - The corresponding protobuf key-value pairs are absent from the marshalled message, not just empty.
  - On the consuming side:
    - Enforce the fields to be absent, reject otherwise.
    - Propagate only if the fields are absent, reject otherwise.
    - A `message_id` function will not be able to use the above fields, and may instead rely on the `data` field.
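
The consuming-side rules for the two strict policies can be sketched as a pair of checks. This is a non-authoritative illustration: the `Message` dataclass, the function names, and the `verify` callback (standing in for real signature verification) are all hypothetical. `None` models a field that is absent from the marshalled message, as opposed to present but empty.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    # Hypothetical in-memory mirror of the protobuf `Message`.
    # None means "absent from the wire", b"" means "present but empty".
    data: bytes
    from_: Optional[bytes] = None
    seqno: Optional[bytes] = None
    signature: Optional[bytes] = None
    key: Optional[bytes] = None


def accept_strict_sign(msg: Message, verify: Callable[[Message], bool]) -> bool:
    """StrictSign: required fields must be present and the signature must verify."""
    # `key` is not required here because it may be inlined in `from`.
    if msg.from_ is None or msg.seqno is None or msg.signature is None:
        return False
    return verify(msg)


def accept_strict_no_sign(msg: Message) -> bool:
    """StrictNoSign: all four fields must be absent, not merely empty."""
    return all(
        f is None for f in (msg.from_, msg.seqno, msg.signature, msg.key)
    )
```

A message that passes one check necessarily fails the other, which illustrates why the two policies cannot be mixed on the same topic.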

In gossipsub v1.0, a legacy "lax" signing policy could be configured, to not verify signatures except when present:
- `LaxSign`: *Defined for completeness, insecure*. Also known as authoring but not verifying.
  - On the producing side:
    - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields.
  - On the consuming side:
    - `signature` may be absent, and not verified.
    - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid.
- `LaxNoSign`: *Previous default for no-verification*
  - On the producing side:
    - Build messages without the `signature`, `key`, `from` and `seqno` fields.
  - On the consuming side:
    - Accept and propagate messages with above fields.
    - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid.
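
Both legacy policies share the same consuming-side rule: verify only when a signature is present. A minimal sketch, assuming a hypothetical `verify` callback in place of real signature verification:

```python
from typing import Callable, Optional


def accept_lax(signature: Optional[bytes],
               verify: Callable[[bytes], bool]) -> bool:
    """Lax policies: unsigned messages pass; signed ones must verify."""
    if signature is None:
        # Nothing to check, so the message is accepted unverified.
        return True
    return verify(signature)
```

The first branch is what makes these policies insecure: anyone can strip the signature from a message, or never add one, and the message is still accepted.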


### Flood Publishing

In gossipsub v1.0, peers publish new messages to the members of their mesh if they are subscribed to