gossipsub v1.1: introduce message signing policy #294

Merged
merged 9 commits into from
Sep 30, 2020
52 changes: 42 additions & 10 deletions pubsub/README.md
@@ -32,6 +32,7 @@ and spec status.
- [The RPC](#the-rpc)
- [The Message](#the-message)
- [Message Signing](#message-signing)
- [Message Identification](#message-identification)
- [The Topic Descriptor](#the-topic-descriptor)
- [AuthOpts](#authopts)
- [AuthMode 'NONE'](#authmode-none)
@@ -112,6 +113,9 @@ message Message {
}
```

The `optional` fields may be omitted, depending on the
[signature policy](#message-signing) and [message ID function](#message-identification).

The `from` field denotes the author of the message. Note that this is not
necessarily the peer who sent the RPC this message is contained in. This is
done to allow content to be routed through a swarm of pubsubbing peers.
@@ -123,14 +127,7 @@ The `seqno` field is a 64-bit big-endian uint that is a linearly increasing
number that is unique among messages originating from each given peer. No two
messages on a pubsub topic from the same peer should have the same `seqno`
value, however messages from different peers may have the same sequence number,
-so this number alone cannot be used to address messages. Notably the
-'timecache' in use by the go implementation contains a `message_id`,
-which is constructed from the concatenation of the `seqno` and the `from`
-fields. This `message_id` is then unique among messages. It was also proposed
-in [#116](https://github.com/libp2p/specs/issues/116) to use a `message_hash`,
-however, it was noted: "a potential caveat with using hashes instead of seqnos:
-the peer won't be able to send identical messages (e.g. keepalives) within the
-timecache interval, as they will get rejected as duplicates."
+so this number alone cannot be used to address messages by origin-stamping.
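
As a minimal sketch of the `seqno` encoding described above, the following Python snippet packs a counter value as a 64-bit big-endian unsigned integer. The `SeqnoCounter` class and its starting value are hypothetical and only serve the illustration; the spec text only asks for a linearly increasing number that is unique per originating peer.

```python
import struct


class SeqnoCounter:
    """Hypothetical per-peer counter; not defined by the spec."""

    def __init__(self, start: int = 0) -> None:
        self._next = start

    def next_seqno(self) -> bytes:
        # ">Q" packs a big-endian unsigned 64-bit integer (8 bytes).
        encoded = struct.pack(">Q", self._next)
        self._next += 1
        return encoded


counter = SeqnoCounter(start=1)
first = counter.next_seqno()
second = counter.next_seqno()
```

Two peers running this counter independently would produce the same byte strings, which is why `seqno` on its own cannot identify a message.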

The `topicIDs` field specifies a set of topics that this message is being
published to.
@@ -149,17 +146,52 @@ economics (see e.g.
and
[here](https://ethresear.ch/t/improving-the-ux-of-rent-with-a-sleeping-waking-mechanism/1480)).

## Message Identification

To uniquely identify a message in a set of topics (for de-duplication, tracking, scoring and other purposes), a `message_id` is calculated based on the message.
How the calculation happens can be configured at the application layer by supplying a function `message_id_fn`, such that `message_id_fn(*Message) => message_id`.

> [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) only allows configuring a single top-level `message_id_fn`. This function may, however, vary its behaviour based on the topic (contained inside its `*Message`) argument. Thus, it's feasible to implement a per-topic policy using branch selection control flow logic. go-libp2p-pubsub plans to push down the configuration of the `message_id_fn` to the topic level. Other implementations are encouraged to do the same.

The message ID calculation approach generally comes in one of two flavors:
- **origin-stamped** messaging: the combination of the `seqno` and `from` fields
uniquely identifies a message based on the *author*.
- **content-addressed** messaging: a message ID derived from the `data` field
uniquely identifies a message based on the *data*.

**The default `message_id_fn` is origin-stamped,** and defined as the string concatenation of `from` and `seqno`.
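
The two flavors can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation: the `Message` dataclass is a simplified stand-in for the protobuf, `from_peer` stands for the `from` field (which is a reserved word in Python), and SHA-256 is an arbitrary digest choice for the content-addressed variant.

```python
from dataclasses import dataclass
from hashlib import sha256
from typing import Optional


@dataclass
class Message:
    """Simplified stand-in for the pubsub `Message` protobuf (assumed shape)."""
    data: bytes
    from_peer: Optional[bytes] = None  # the `from` field
    seqno: Optional[bytes] = None      # 8-byte big-endian sequence number


def origin_stamped_id(msg: Message) -> bytes:
    # Default flavor: concatenation of `from` and `seqno`.
    if msg.from_peer is None or msg.seqno is None:
        raise ValueError("origin-stamped IDs need both `from` and `seqno`")
    return msg.from_peer + msg.seqno


def content_addressed_id(msg: Message) -> bytes:
    # Content-addressed flavor: derived from `data` only.
    # SHA-256 is an assumption; the spec does not prescribe a digest here.
    return sha256(msg.data).digest()


a = Message(data=b"hello", from_peer=b"peerA", seqno=(1).to_bytes(8, "big"))
b = Message(data=b"hello", from_peer=b"peerB", seqno=(1).to_bytes(8, "big"))
```

Messages `a` and `b` carry the same payload from different authors: they get distinct origin-stamped IDs but the same content-addressed ID.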

If fabricated collisions are not a concern, or are difficult enough to produce within the window in which the message is relevant,
a `message_id` based on a short digest of inputs may benefit performance. Whatever the choice, it is crucial that **all peers** participating in a topic implement the same message ID calculation logic, or the topic may function suboptimally.
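
The implementation note above describes a single top-level `message_id_fn` that varies its behaviour by topic. A minimal sketch of that branching, assuming a simplified `Message` shape, a hypothetical `per_topic_id_fn` helper, and placeholder ID functions (none of these names come from go-libp2p-pubsub):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    """Simplified stand-in for the pubsub `Message` protobuf (assumed shape)."""
    topic_ids: List[str]
    data: bytes
    from_peer: bytes = b""
    seqno: bytes = b""


MessageIdFn = Callable[[Message], bytes]


def per_topic_id_fn(overrides: Dict[str, MessageIdFn], default: MessageIdFn) -> MessageIdFn:
    """Build one top-level function that dispatches on the message's topic."""

    def message_id_fn(msg: Message) -> bytes:
        # Assumption: branch on the first listed topic only.
        topic = msg.topic_ids[0] if msg.topic_ids else ""
        return overrides.get(topic, default)(msg)

    return message_id_fn


message_id_fn = per_topic_id_fn(
    # Placeholder content-derived ID for the hypothetical "blocks" topic.
    overrides={"blocks": lambda m: b"data:" + m.data},
    # Default origin-stamped ID for every other topic.
    default=lambda m: m.from_peer + m.seqno,
)
```

Every peer on a given topic would need the same table entry for that topic, in line with the requirement that all participants compute IDs identically.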

Note that different specialized pubsub components, such as the 'timecache' used in the Go implementation, scoring functions or circuit-breakers
may use the `message_id` to key and track messages.

It was also proposed in [#116](https://github.com/libp2p/specs/issues/116)
to use a `message_hash`, however, it was noted:
> a potential caveat with using hashes instead of seqnos:
the peer won't be able to send identical messages (e.g. keepalives) within the
timecache interval, as they will get treated as duplicates.

Some applications may not need keepalives, or choose to implement something more specific than a message hash. In those cases where duplicate payloads are not desirable, a `content-based` message ID function may be more appropriate.
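
The keepalive caveat quoted above can be reproduced with a toy duplicate filter. The `TimeCache` class below is a hypothetical, simplified stand-in for the Go 'timecache' (its API is not taken from go-libp2p-pubsub), and SHA-256 is an assumed digest; timestamps are passed in explicitly to keep the example deterministic.

```python
import hashlib
import struct
from typing import Dict


class TimeCache:
    """Toy duplicate filter keyed by message ID, with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._seen: Dict[bytes, float] = {}

    def is_duplicate(self, message_id: bytes, now: float) -> bool:
        # Forget entries older than the TTL, then check and record the ID.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        return False


def origin_id(from_peer: bytes, seqno: int) -> bytes:
    return from_peer + struct.pack(">Q", seqno)


def content_id(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


keepalive = b"keepalive"

# Origin-stamped: each keepalive has a fresh seqno, so none is a duplicate.
by_origin = TimeCache(ttl_seconds=120.0)
origin_results = [
    by_origin.is_duplicate(origin_id(b"peerA", 1), now=0.0),
    by_origin.is_duplicate(origin_id(b"peerA", 2), now=10.0),
]

# Content-addressed: the identical payload is a duplicate until the entry expires.
by_content = TimeCache(ttl_seconds=120.0)
content_results = [
    by_content.is_duplicate(content_id(keepalive), now=0.0),
    by_content.is_duplicate(content_id(keepalive), now=10.0),
    by_content.is_duplicate(content_id(keepalive), now=200.0),
]
```

Under these assumptions the second keepalive is dropped only in the content-addressed case, and it is accepted again once the cache interval has elapsed.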

## Message Signing

Messages can be optionally signed, and it is up to the peer whether to accept and forward
unsigned messages.
With the default choice of origin-stamped messaging, the receiver should enforce signatures strictly (`StrictSign`).
When the receiver expects unsigned content-addressed messages, and thus does not expect
the `from`, `seqno`, `signature`, or `key` fields, it may reject messages that carry them (`StrictNoSign`).

This optionality is configurable with the signature policy options starting from gossipsub v1.1.

For signing purposes, the `signature` and `key` fields are used:
- The `signature` field contains the signature.
-- The `key` field contains the signing key when it cannot be inlined in the source peer ID.
+- The `key` field contains the signing key when it cannot be inlined in the source peer ID (`from`).
When present, it must match the peer ID.

-The signature is computed over the marshalled message protobuf _excluding_ the key field.
+The signature is computed over the marshalled message protobuf _excluding_ the `signature` field itself.
This includes any fields that are not recognized, but still included in the marshalled data.
Contributor

this is problematic because protobuf is non-deterministic in its marshalling - the way to write this specification would be to remove mentions of protobuf and instead explicitly outline the marshalling in terms of key-value pairs and field types - it would have to be done carefully such that the result is readable by a compliant protobuf parser - it might also happen by chance that some protobuf encoders produce a valid byte stream, but with unrecognised fields in particular, the likelihood is low.

Member

Let's hoist the protobuf deterministic serialisation discussion to a new issue.

Member

It's been recurrent enough that it needs to be a first-class discussion with a solid outcome.

Contributor

I think we should specify canonical encoding here to remove all ambiguity.

The protobuf blob is prefixed by the string `libp2p-pubsub:` before signing.
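
A minimal sketch of how the signed bytes are assembled, assuming the caller already holds the marshalled message without its `signature` field. The function names and the `verify` callback are hypothetical; marshalling and the actual signature scheme are outside this example.

```python
from typing import Callable

SIGN_PREFIX = b"libp2p-pubsub:"


def signing_payload(marshalled_without_signature: bytes) -> bytes:
    # The protobuf blob is prefixed with the fixed string before signing.
    return SIGN_PREFIX + marshalled_without_signature


def check_signature(
    marshalled_without_signature: bytes,
    signature: bytes,
    verify: Callable[[bytes, bytes], bool],
) -> bool:
    # `verify(payload, signature)` is a caller-supplied stand-in for the
    # public-key verification step, which is not shown here.
    return verify(signing_payload(marshalled_without_signature), signature)


payload = signing_payload(b"\x12\x02hi")
```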

When signature validation fails for a signed message, the implementation must
46 changes: 46 additions & 0 deletions pubsub/gossipsub/gossipsub-v1.1.md
@@ -37,6 +37,8 @@ See the [lifecycle document][lifecycle-spec] for context about maturity level an
- [Explicit Peering Agreements](#explicit-peering-agreements)
- [PRUNE Backoff and Peer Exchange](#prune-backoff-and-peer-exchange)
- [Protobuf](#protobuf)
- [Signature Policy](#signature-policy)
- [Signature Policy Options](#signature-policy-options)
- [Flood Publishing](#flood-publishing)
- [Adaptive Gossip Dissemination](#adaptive-gossip-dissemination)
- [Outbound Mesh Quotas](#outbound-mesh-quotas)
@@ -134,6 +136,50 @@ message PeerInfo {
}
```

### Signature Policy
Contributor

I am not sure that this should be in the gossipsub spec -- it is a generic pubsub policy that applies to all routing algorithms.

Member

Agreed, will move it.

Contributor Author

Was looking for a place to version it, but since it's not breaking anything necessarily, and applies to pubsub, moving it to the pubsub doc sounds good.


The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable per topic, as specified in this section.
> [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) allows for configuring the signature policy at a global pubsub instance level. This needs to be pushed down to topic-level configuration. Other implementations are encouraged to support topic-level configuration, as the spec mandates.

In the default origin-stamped messaging, the fields need to be strictly enforced:
the `seqno` and `from` fields form the `message_id`, and should be verified to avoid `message_id` collisions.

In content-addressed messaging, the fields may negatively affect privacy
by revealing the relationship between `data` and `from`/`seqno`.
Comment on lines +144 to +148
Member

I'm not sure what the goal of this is. This might be better moved to after "Signature Policy Options", and put under a "Usage indications/notes" section.

I would reword to something like this:

  • Origin-stamped messaging: choose StrictSign, and use in combination with the default message_id_fn.
  • Content-addressed messaging: enable StrictNoSign, with a custom hash-based message_id_fn. Any signatures would now be part of the payload, and would need to be validated through a custom message validator.

Contributor Author

The idea was to give some context/intuition why the fields are handled like they are. It could be worded directly as message id approach -> signing policy directions, like you suggest. Either option works for us.


#### Signature Policy Options

In gossipsub v1.1, these fields are either strictly present and verified, or omitted altogether:
- `StrictSign`:
  - On the producing side:
    - Build messages with the `signature`, `key` (`from` may be enough for certain inlineable public key types), `from` and `seqno` fields.
  - On the consuming side:
    - Enforce the fields to be present, reject otherwise.
    - Propagate only if the fields are valid and signature can be verified, reject otherwise.
- `StrictNoSign`:
  - On the producing side:
    - Build messages without the `signature`, `key`, `from` and `seqno` fields.
    - The corresponding protobuf key-value pairs are absent from the marshalled message, not just empty.
  - On the consuming side:
    - Enforce the fields to be absent, reject otherwise.
    - Propagate only if the fields are absent, reject otherwise.
  - A `message_id` function will not be able to use the above fields, and should instead rely on the `data` field. A commonplace strategy is to calculate a hash.
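
The consuming-side rules of the two strict policies can be sketched as a single acceptance check. This is an illustration under assumptions: `Message` is a simplified stand-in for the protobuf in which `None` means the field is absent from the wire, `accept` is a hypothetical name, and `verify` stands in for real signature verification. The `key` field is not required under `StrictSign` here because it may be inlined in `from`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Policy(Enum):
    STRICT_SIGN = "StrictSign"
    STRICT_NO_SIGN = "StrictNoSign"


@dataclass
class Message:
    """Simplified stand-in for the protobuf; None means absent on the wire."""
    data: bytes
    from_peer: Optional[bytes] = None
    seqno: Optional[bytes] = None
    signature: Optional[bytes] = None
    key: Optional[bytes] = None


def accept(msg: Message, policy: Policy, verify: Callable[[Message], bool]) -> bool:
    if policy is Policy.STRICT_SIGN:
        # Fields must be present, and the signature must verify.
        if msg.from_peer is None or msg.seqno is None or msg.signature is None:
            return False
        return verify(msg)
    # StrictNoSign: all four fields must be absent, not merely empty.
    return all(f is None for f in (msg.from_peer, msg.seqno, msg.signature, msg.key))


signed = Message(b"hi", b"peerA", b"\x00" * 8, b"sig")
bare = Message(b"hi")
```

With this check, `signed` passes only under `StrictSign` (and only if `verify` succeeds), while `bare` passes only under `StrictNoSign`.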

In gossipsub v1.0, a legacy "lax" signing policy could be configured, to only verify signatures when present. For security reasons, this strategy is discarded in subsequent versions, but MAY still be supported for backwards-compatibility. If so, its use should be discouraged through prominent deprecation warnings. These strategies will be entirely dropped in the future.
- `LaxSign`: *this was never an original gossipsub 1.0 option, but it's defined here for completeness, and considered insecure*. Always sign and verify incoming signatures, but accept unsigned messages.
  - On the producing side:
    - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields.
  - On the consuming side:
    - `signature` may be absent, and not verified.
    - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid.
- `LaxNoSign`: *Previous default for no-verification*. Do not sign nor origin-stamp, but verify incoming signatures, and accept unsigned messages.
  - On the producing side:
    - Build messages without the `signature`, `key`, `from` and `seqno` fields.
  - On the consuming side:
    - Accept and propagate messages with above fields.
    - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid.
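
For contrast with the strict policies, the consuming-side rule shared by both legacy lax policies can be sketched as below. The function name and the `verify` callback are hypothetical, and the sketch is shown only to illustrate why the policy is considered insecure: an unsigned message is accepted without any check.

```python
from typing import Callable, Optional


def lax_accept(signature: Optional[bytes], verify: Callable[[bytes], bool]) -> bool:
    # Verify iff a signature is present; unsigned messages pass unchecked.
    if signature is None:
        return True
    return verify(signature)


unsigned_ok = lax_accept(None, lambda sig: False)
bad_signature_ok = lax_accept(b"forged", lambda sig: False)
good_signature_ok = lax_accept(b"valid", lambda sig: True)
```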


### Flood Publishing

In gossipsub v1.0, peers publish new messages to the members of their mesh if they are subscribed to