Gossipsub extension for Epidemic Meshes (v1.2.0) #413
Hey, nice idea with CHOKE/UNCHOKE! Duplicates are a real pain and will become even more so with larger degrees and/or larger messages. Let me use this thread to say that I was playing with two different ideas to reduce duplicates. Let me know if you think they make sense and can somehow be combined with yours.
It should be added that this is based on a sync session we had with Andrian, so it is not coming out of the blue.
Hi, nice addition! A few remarks:
- If it's backward compatible, why advertise another Protocol Id? I don't think it's super necessary to know whether a peer handles the choking or not, so we could just send it optimistically, no?
- I'm not super convinced about the unchoking mechanism: if we have to rely on IHAVEs, it might already be "too late". Maybe we could do something similar to BitTorrent, with period-based choking, or optimistic unchoking?
- Regarding simulation, it seems easy enough to try this algorithm on real networks by "virtually choking" a peer and computing the bandwidth gain / latency loss. It would also help us tune the unchoking mechanism.
- Last word regarding the spec format: this is starting to get a bit messy imo. There is gossipsub 1.1, which references gossipsub 1.0, which references pubsub. To have the full spec, I will soon need to open 4 specs. The versioning is also surprising: if it's backward compatible, why call it 2.0.0? It's also not consistent with v1.1.
we need the protocol id change to signify feature support. |
Perhaps it could be called v1.2.0, however.
Hey all. Thanks for the responses. Sorry, I wasn't expecting this to get much attention. This is based on a call with @vyzo, and this PR was mainly just a format to share a draft of the ideas @vyzo and I discussed that could be easily commented on. The format, structure, versioning etc. can all change; I don't imagine the PR as described here will resemble the final result. I'm indifferent about the versioning, but agree the 2.0.0 that I suggested isn't consistent.
If I understand this, the application chooses (maybe dynamically) which messages nodes propagate on the mesh? It sounds like depending on the message you're segregating the mesh overlay into smaller overlays? I don't directly see the benefit to this vs just making the mesh size smaller? I may have misunderstood however.
Interesting idea. As you've pointed out the privacy issues in this model would not work for my particular use-case but it may work for others.
Good point. I'm also unsure how this will behave if we choke too aggressively. Ideally I was trying to hit a set of parameters that make it not so network dependent, such that standard default values could benefit most networks. There are still a number of levers we can use to adjust the speed of unchoking based on gossip, however. For example, we could have very fast heartbeats (lots of gossip), and maybe have a different time scale at which we decide on unchoking based on the gossip. I put the thresholds in there as a placeholder for simulations, but potentially lowering those could also behave like optimistic unchoking. Also happy to put explicit optimistic unchoking in if we think it's required. I was hoping simulations could give us more insights on things lacking with this proposal.
pubsub/gossipsub/gossipsub-v2.0.md
The proposed extensions are backwards-compatible and aim to enhance the efficiency (minimize amplification/duplicates and decrease message latency) of the gossip mesh networks by dynamically adjusting the messages sent by mesh peers based on a local view of message duplication and latency.

In more specific terms, two new control messages are introduced, `CHOKE` and `UNCHOKE`. When a Gossipsub router is receiving many duplicates on a particular mesh, it can send a `CHOKE` message to its mesh peers that are sending duplicates slower than its fellow mesh peers. Upon receiving a `CHOKE` message, a peer is informed to no longer propagate mesh messages to the sender of the `CHOKE` message, rather lazily (in every heartbeat) send its gossip.
We should make the conditions more explicit once we have more data.
Also, messages originating in the choked node will still have to be directly sent, we should probably mention this.
Agreed
pubsub/gossipsub/gossipsub-v2.0.md
| `D_max_add` | The maximum number of peers to add into the mesh (from fanout) if they are performing well per `choke_heartbeat_interval`. | 1 |
| `choke_duplicates_threshold` | The minimum number of duplicates as a percentage of received messages a peer must send before being eligible of being `CHOKE`'d. | 60 |
| `choke_churn` | The maximum number of peers that can be `CHOKE`'d or `UNCHOKE`'d in any `choke_heartbeat_interval`. | 2 |
| `unchoke_threshold` | Determines how aggressively we unchoke peers. The percentage of messages that we receive in the `choke_heartbeat_interval` that were received by gossip from a choked peer. | 50 |
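For readers following along, the four draft parameters in the rows above could be carried around as a small config object. This is only a sketch: the class and function names are made up for illustration, and it assumes the duplicate percentage is measured against the messages received from that one peer, which the draft text does not pin down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChokeParams:
    """Draft defaults copied from the table rows; all are placeholders pending simulation."""
    d_max_add: int = 1                    # peers added from fanout per choke_heartbeat_interval
    choke_duplicates_threshold: int = 60  # minimum % of duplicates before a peer may be choked
    choke_churn: int = 2                  # max CHOKE/UNCHOKE actions per choke_heartbeat_interval
    unchoke_threshold: int = 50           # % of messages obtained via gossip from a choked peer


def choke_eligible(duplicates: int, received: int, params: ChokeParams) -> bool:
    """True if the share of duplicates among a peer's messages meets the threshold.

    Assumption: `received` counts messages from this peer only.
    """
    if received == 0:
        return False
    return 100 * duplicates / received >= params.choke_duplicates_threshold
```

With the defaults, a peer whose messages were 6 duplicates out of 10 would be eligible, while 5 out of 10 would not.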
Shouldn't this be higher than choke_threshold?
I don't have a feel for this.
They are measuring different things. I think the duplicates we get from a specific node is fairly dependent on the topology of the network. A triangle-like network would see more duplicates from specific peers (I think).
Once choked, there is a race condition between how fast we get messages from the mesh (which is like a single round trip) vs the request/response from an IWANT (which I think would be more correlated with latency of the node). I don't see a direct relationship between these thresholds or a feel for which should be higher than the other.
You probably have a better intuition than I do. Is there something obvious I'm missing that should make this higher than `choke_threshold`?
Hrm, you are right, it is not clear what these values should be. Let's look at some data.
I hope you don’t mind if I add some considerations.
In general, and without choking, each message travels at least once on each mesh link. It would be exactly once if there were no propagation delay, but there is, and it will become even bigger with large messages.
If node A receives a duplicate from B, that also means that:
- A has already received from someone else (otherwise this wouldn’t be a duplicate) and was queuing the message for sending to B
- B did receive from someone else and queued for sending to A before getting the above from A (otherwise there would be no duplicate)
In other words, this relation is symmetric. If A is receiving a duplicate from B, then B will also receive a duplicate from A. As far as I know, no implementation aborts the sending of queued messages once a message arrives from the other side. They mostly can't even do it, because the queuing is in-network. This might be an issue for the absolute percentage-based `choke_duplicates_threshold`, because the peers would race to choke each other. But you are speaking of “largest average latency” in the selection. What latency do you mean here?
As for the ‘unchoke_threshold’: There are I think two things to look at when evaluating who to unchoke:
- whether they would provide chunks we would otherwise miss
- whether they would improve our latency.
The difficulty is that IHAVE messages provide a largely delayed view compared to actual message reception. Maybe looking at the freshest messages only in IHAVEs could be an indicator for the second. For the first, one could look at the IHAVEs received, or the messages received as a consequence of IWANTs.
@cskiraly - Thanks for the extra considerations.
In regards to the sources of duplicates, I think it's a little tricky and I'm not sure the relation is symmetric. I think there are three main reasons we see duplicates:
- Propagation delay (as you've mentioned) - Two nodes could send to each other because of propagation delay.
- Validation delay - Depending on the application, a node could receive a message, and take a while to validate it before propagating it in which time others could send it duplicates before it gets a chance to forward it.
- Topology - This one is the more tricky one to reason about imo, because there's lots of ways nodes can be connected and depending on the network the publisher can be random. But let me try come up with a simple example:
Consider 4 nodes:
      A
     / \
    B   C
     \ /
      D
A is connected to B and C, who in turn are connected to D. Let's say A constantly publishes, so it sends to B and C. B and C will then send to D. Let's say on average one path is faster, A->B->D. D will receive a message from B before C. Depending on propagation speed, validation time etc., it will then send to C. If this timing is consistently faster than A->C and C's validation time, then C will be constantly receiving duplicates from D (and should not respond by sending a message back, making this not symmetric).
The aim in this proposal is for C to choke D (in this example).
> What latency do you mean here?
I was thinking that, for every duplicate we see, we measure how long it took to reach us relative to the first copy of that message. I assume this will create a somewhat normal distribution of timing amongst our peers. If a few peers are eligible for choking, we prioritize them by the average delay of the duplicates they sent us, i.e. if one peer mostly sends us duplicates 700ms after the first copy while another sends them 100ms after, we choke the one that is 700ms late on average.
> The difficulty is that IHAVE messages provide a largely delayed view compared to actual message reception. Maybe looking at the freshest messages only in IHAVEs could be an indicator for the second. For the first, one could look at the IHAVEs received, or the messages received as a consequence of IWANTs
Yes. I agree. IHAVE's are very slow. They are stochastic also, in that we randomly select peers to send them to in the heartbeat. The speed is very much related to heartbeat timing and gossip_lazy parameters.
When we get choked by a peer, the idea is to always send IHAVE messages to the peer that choked us in the heartbeat. That way, if the peer that choked us has malicious or slow mesh peers, and its messages end up coming from IWANTs based on our IHAVEs, then it knows the choke was a mistake and must unchoke us. The issue, as you've pointed out, is whether the gossip (IHAVE/IWANT) system is going to be fast enough to beat the mesh in a statistically meaningful way, so that we know if we should unchoke.
The complication is that the speed of the gossip is very much parameter dependent. We could make the heartbeats very fast, like tens of ms, and I would imagine it would be fine, but I don't have a good feel for whether second-level heartbeats can also achieve appropriate unchoking.
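The selection rule described above (among eligible peers, choke the ones whose duplicates arrive latest on average) can be sketched roughly as follows. The function name and data layout are hypothetical; `choke_churn` is taken from the draft parameter table, and eligibility is assumed to have been decided elsewhere.

```python
from statistics import mean


def pick_peers_to_choke(dup_delays_ms, eligible, choke_churn=2):
    """Return up to `choke_churn` eligible peers, largest average duplicate delay first.

    dup_delays_ms maps a peer id to the list of delays (in ms) between the
    first copy of a message and the duplicate that peer sent us.
    """
    # Peers with no recorded duplicates cannot be ranked, so skip them.
    candidates = [p for p in eligible if dup_delays_ms.get(p)]
    candidates.sort(key=lambda p: mean(dup_delays_ms[p]), reverse=True)
    return candidates[:choke_churn]
```

For example, with one peer averaging 700ms and another averaging 100ms, a churn limit of 1 selects only the 700ms peer.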
> @cskiraly - Thanks for the extra considerations.
> In regards to the sources of duplicates, I think its a little tricky and I'm not sure the relation is symmetric. I think there are three main reasons we see duplicates:
> - Propagation delay (as you've mentioned) - Two nodes could send to each other because of propagation delay.
> - Validation delay - Depending on the application, a node could receive a message, and take a while to validate it before propagating it in which time others could send it duplicates before it gets a chance to forward it.
> - Topology - This one is the more tricky one to reason about imo, because there's lots of ways nodes can be connected and depending on the network the publisher can be random. But let me try come up with a simple example:
Right, I had forgotten about validation delay. It is indeed there, but I had a reason to forget about it. We have added duplicate metrics in nim-libp2p and thus in Nimbus, and they show that in that use case, the duplicates during validation are negligible compared to all duplicates. Here is an example.
I'm happy to provide more data on this.
Now this is just one use case of gossipsub, and in general we should consider the validation delay as you've rightly pointed out.
> Consider 4 nodes:
>       A
>      / \
>     B   C
>      \ /
>       D
> A is connected to B and C who in turn are connected to D. Lets say A constantly publishes, it sends to B and C. B and C will then send to D. Lets say on average one path is faster, A->B->D. D will receive a message from B before C. Depending on propagation speed and validation time etc it will then send to C. If this timing is consistently faster than A-> C and C's validation time, then C will be constantly receiving duplicates from D (and should not respond by sending a message back, making this not symmetric).
I see, but this really depends on the assumption that validation time is long enough to create such a time window. And data as above tells me it is much smaller than network induced delays, so you mostly end up with symmetry.
> The aim in this proposal is for C to choke D (in this example).
Right. I'm not sure how this symmetry plays out, just raised it as something that might modify the actual effect compared to what we expect.
> What latency do you mean here?
> I was thinking that for all the duplicates we see, we measure the time it took for it to reach us measured relative to the first message sent to us. I assume this will create a somewhat normal distribution of timing amongst our peers. If a few peers are eligible for choking we preference them by the average time of messages they sent to us. i.e if most of the time one peer is sending us a duplicate 700ms later than the first message, when another is sending us duplicates 100ms later, we choke the one sending 700ms later on average.
That's nice! I wasn't thinking of that delay distribution. Now say your 700ms peer is sending you first copies on 30% of chunks, and has long delay on 70%. The 100ms peer is sending you only duplicates, but it sends those with a 100ms average. Which one would you choke? I think I would go with the 100ms one. In effect I think we need another distribution, one that also adds with negative weight the cases when we receive first. This weight could be the time between the first and the 2nd receive. Or even that boosted, as 1st receive is very important compared to all the duplicates.
> The difficulty is that IHAVE messages provide a largely delayed view compared to actual message reception. Maybe looking at the freshest messages only in IHAVEs could be an indicator for the second. For the first, one could look at the IHAVEs received, or the messages received as a consequence of IWANTs
>
> Yes. I agree. IHAVE's are very slow. They are stochastic also, in that we randomly select peers to send them to in the heartbeat. The speed is very much related to heartbeat timing and gossip_lazy parameters. When we get choked a peer, the idea is to always send IHAVE messages to the peer that choked us in the heartbeat. That way, if the peer that choked us, has malicious or slow nodes, and its source of messages are coming from IWANT based on our IHAVE's, then they know this is an error and must unchoke us. The issue, as you've pointed out, is that if the gossip (IHAVE/IWANT) system is going to be fast enough to beat the mesh in a statistically meaningful way to know if we should unchoke. The complications are that the speed of the gossip is very much parameter dependent. We could make the heartbeats very fast, like 10's of ms and I would imagine it would be fine, but I dont have a good feel if we can use second-level heartbeats to also achieve appropriate unchoking.
The speed, overhead, and usefulness of gossip are also very much dependent on message size. With very small messages it is almost useless; with large messages it saves a lot of bandwidth. Having said that, we also have live metrics on IHAVE/IWANT hit ratios that can help. Happy to share those.
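One way to read the "negative weight for first deliveries" idea from this comment is a single per-peer score, sketched below. Everything here is an assumption for illustration: the function name, the event encoding, the `first_boost` knob, and in particular the 100ms lead time given to the first deliveries, which the thread does not specify.

```python
def choke_score(events, first_boost=1.0):
    """Average per-message score for one peer; a higher score is a better choke candidate.

    events is a list of (kind, ms) pairs:
      ("dup", d)   -> the peer's copy arrived d ms after the first copy
      ("first", g) -> the peer delivered first, g ms ahead of the second copy
    """
    if not events:
        return 0.0
    total = 0.0
    for kind, ms in events:
        if kind == "dup":
            total += ms                # late duplicates push the score up
        elif kind == "first":
            total -= first_boost * ms  # first deliveries pull it down
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return total / len(events)


# The two peers from the comment: 30% first deliveries but slow duplicates,
# versus only duplicates that arrive 100 ms late.
slow_but_useful = [("first", 100)] * 3 + [("dup", 700)] * 7
fast_but_redundant = [("dup", 100)] * 10
```

Under these made-up numbers, an unboosted weight still ranks the 700ms peer as the better choke candidate (460 vs 100). Only a fairly large boost, e.g. `first_boost=20`, flips the ranking towards choking the 100ms peer, which matches the remark that first receives may need to be boosted.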
I agree with all this.
I think the current plan is to build some generic functions that decide whether to choke/unchoke peers based on some message timing data. So we can play with some arbitrary distributions and decisions about choking/unchoking and see how they play out in real networks.
Once we have something concrete, I'll run some ideas past you. :)
OK, I think I wasn't clear on the idea. This would be just a local downlink filter, valid for only that specific link. Therefore, it is not splitting the mesh, it just "specializes" some links to a subset of the traffic. You could have a mesh link on which you send everything and asked to receive only id==1 mod 2, and another link where you were asked to send only id==0 mod 3. You might be right that you can look at this as having 6 different meshes (mod 6 in the above example), but I think you are amortizing a large part of the maintenance cost while you can achieve some dynamic behaviour that allows going for larger D. This could be useful because it provides more fine-grained control than pure choking/unchoking. It could also fight one of the main reasons for duplicates and thus choking: propagation latency due to bottlenecks and thus queues. I hope it is clear now. The filter is just the protocol primitive; how it is regulated would indeed need lots of work.
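A rough illustration of such a per-link filter, assuming message ids can be mapped to integers (real gossipsub message ids are opaque bytes, so that mapping is itself an assumption). `make_filter` and the link names are hypothetical.

```python
def make_filter(modulus: int, remainder: int):
    """Build a per-link predicate: forward only ids with id % modulus == remainder."""
    def allow(msg_id: int) -> bool:
        return msg_id % modulus == remainder
    return allow


link_a = make_filter(2, 1)  # this link carries only ids congruent to 1 mod 2
link_b = make_filter(3, 0)  # this link carries only ids congruent to 0 mod 3

# Which links carry each id; the pattern repeats with period 6 (lcm of 2 and 3).
pattern = [(link_a(i), link_b(i)) for i in range(12)]
```

The repetition with period 6 is what the "mod 6" remark refers to: the two filters together partition ids into residue classes modulo 6, without maintaining six separate meshes.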
On this I've actually started simulations some time ago using the code from @vyzo. My changes are very rudimentary, but add the hopcount and timestamps: https://github.com/cskiraly/gerbil-simsub/commits/hopcount
So there has been some progress in the self-appointed Episub Working Group (if you are maintaining a gossipsub implementation and you are interested in implementing episub, you are welcome to join us). The simsub simulator now supports preliminary episub, with a number of choking strategies. I have run a number of simulations, with the results here: https://gist.github.com/vyzo/77251c89950ef0d9ac422cfce8e5a09e Note that the simulations are 100-node ones, because my computer is too slow for larger ones and begins to lag; feel free to run your own sims and post results.
An update: I have added a virtual time scheduler to simsub in vyzo/gerbil-simsub#7, which should allow it to scale simulations to a large number of nodes, bound only by available memory.
Here are the results from 250 and 500 node runs:
Update:
pubsub/gossipsub/gossipsub-v1.2.md
Upon receiving a `CHOKE` message, the router MUST no longer forward messages to
the peer that sent the `CHOKE` message, while it is still in the mesh. Instead
it MUST always send an IHAVE message (provided it does not hit the IHAVE
message limit) in the next gossipsub heartbeat to the peer.
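The sender-side behaviour in this paragraph could look roughly like the toy model below. It is not taken from any implementation: the class, method names and the `max_ihave` placeholder are invented for illustration. It also folds in the earlier review remark that messages originating at the choked node still have to be sent directly.

```python
class Router:
    """Toy model of how a router reacts to CHOKE; not a real gossipsub router."""

    def __init__(self, mesh_peers):
        self.mesh = set(mesh_peers)
        self.choked_by = set()    # mesh peers that have sent us CHOKE
        self.pending_ihave = {}   # peer -> message ids to advertise at the next heartbeat

    def on_choke(self, peer):
        if peer in self.mesh:
            self.choked_by.add(peer)

    def on_unchoke(self, peer):
        self.choked_by.discard(peer)

    def forward(self, msg_id, received_from=None):
        """Return the peers that get the full message now.

        received_from=None means we published the message ourselves; those
        are still sent in full, even to peers that choked us.
        """
        eager = []
        for peer in sorted(self.mesh):
            if peer == received_from:
                continue
            if received_from is not None and peer in self.choked_by:
                # Choked: no eager push, advertise lazily instead.
                self.pending_ihave.setdefault(peer, []).append(msg_id)
            else:
                eager.append(peer)
        return eager

    def heartbeat(self, max_ihave=100):
        """Return {peer: ids} IHAVEs to send, truncated to a placeholder limit."""
        out = {p: ids[:max_ihave] for p, ids in self.pending_ihave.items()}
        self.pending_ihave.clear()
        return out
```

Batching in `heartbeat` is the variant described in the spec text; sending the IHAVE immediately inside `forward` would be the more aggressive alternative raised in the review comments.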
Maybe we could be more aggressive, and send IHAVE (or a similar message) instantly?
This way, the worst delay induced by episub would be 1 RTT, instead of 700ms + 1 RTT (in the case of eth2).
It would also be easier to judge if a choked peer is faster than unchoked ones.
Thinking about it, we could even let the "topic score" handle choking / unchoking:
- When you are fast, you will get "first deliveries" points, and have a big score
- So we choke peers with the lowest scores for this topic (`n` peers, or peers with less than `n` points)
- If one starts to catch up, it will get "first deliveries" points thanks to the IWANT replies
- Up until the point where it isn't amongst the slowest ones, and gets unchoked
I would rather not conflate scoring with this; scoring is a defense mechanism, while this is a bandwidth optimization mechanism.
Also, I've tried to leave the scoring out, because it's optional. I'd like 1.2 to work for those users who have opted out of scoring.
In regards to instantly sending the IHAVE, I'm not sure about the best approach.
I guess if we do it in the heartbeat we can group messages into a single IHAVE, rather than sending an IHAVE per message, but the downside is we can't identify peers that are sending messages faster than the mesh but within this timeframe.
Happy to go with either path here. Maybe we try simulating both and see how it goes?
The basic overhead is quite easy to compute:
Here is a RPC message with an IHAVE:
1a0e0a0c0a04626262621204deadbeef
"deadbeef" is the message id, "62626262" is the topic
So we have 8 bytes of useful data, 8 bytes of headers & co
Even if we add the muxer headers etc, it's still <32 bytes overhead per message, which depending on the use case, might be a lot, or nothing :)
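As a sanity check, the 16 bytes can be split apart with a few lines of Python. The parser below is a deliberately minimal sketch that only handles short length-delimited protobuf fields, which is all this example contains; the field numbers in the comments follow the gossipsub RPC protobuf (RPC.control = 3, ControlMessage.ihave = 1, ControlIHave.topicID = 1, ControlIHave.messageIDs = 2).

```python
def read_fields(buf: bytes):
    """Split a protobuf buffer into (field_number, payload) pairs.

    Only supports wire type 2 (length-delimited) with single-byte tags and
    lengths, which is enough for this 16-byte example.
    """
    fields, i = [], 0
    while i < len(buf):
        tag, length = buf[i], buf[i + 1]
        if tag & 0x07 != 2 or tag >= 0x80 or length >= 0x80:
            raise ValueError("only short length-delimited fields are supported")
        fields.append((tag >> 3, buf[i + 2 : i + 2 + length]))
        i += 2 + length
    return fields


rpc = bytes.fromhex("1a0e0a0c0a04626262621204deadbeef")
[(_, control)] = read_fields(rpc)               # RPC.control (field 3)
[(_, ihave)] = read_fields(control)             # ControlMessage.ihave (field 1)
(_, topic), (_, msg_id) = read_fields(ihave)    # topicID (1), messageIDs (2)

payload = len(topic) + len(msg_id)   # useful bytes
overhead = len(rpc) - payload        # tag and length bytes
```

The topic decodes to the ASCII string `bbbb` (hex 62626262), and the split comes out as 8 bytes of payload plus 8 bytes of framing, as stated above.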
For instance, for eth2 block topic, that's nothing (at most 32 bytes every 12 seconds)
For an attestation topic, choked peers will send:
- Currently (not choked): attestation (233 bytes) + topic (~43 bytes) + headers (8) = 284 bytes
- Instant IHAVE: message id (20 bytes) + topic (~43 bytes) + headers (8) = 71 bytes = 75% reduction
- Batched IHAVE: message id (20 bytes) = 20 bytes = 93% reduction
(not counting topic & headers in the batched IHAVE, as there are enough attestations in one heartbeat that it doesn't matter)
So we lose 18% reduction on attestations, and about 0% on blocks
(sorry, I'm being eth2 centric here, but I think it highlights two very different use cases)
And of course, if we receive an IHAVE with an unknown block, we will request it, which will cost more bandwidth. That seems harder to quantify, and even simulate, since it depends heavily on networking jitter, mesh topology, etc.
It remains to be seen if these 18% are worth it :) I would say yes, as it limits the latency cost quite drastically
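The attestation arithmetic above is easy to reproduce. The byte counts are the ones quoted in this comment (the topic length is approximate); the constant and function names are just for illustration.

```python
ATTESTATION = 233  # bytes, attestation payload
TOPIC = 43         # bytes, approximate topic length
HEADERS = 8        # bytes, protobuf framing
MSG_ID = 20        # bytes, message id

full = ATTESTATION + TOPIC + HEADERS   # forwarded in full (not choked)
instant = MSG_ID + TOPIC + HEADERS     # one IHAVE per message
batched = MSG_ID                       # id only; topic and headers amortised over the batch


def reduction(cost: int) -> int:
    """Bandwidth saved relative to forwarding the full message, in whole percent."""
    return round(100 * (1 - cost / full))


lost_by_instant = reduction(batched) - reduction(instant)
```

This gives 284 bytes for the full message, 75% and 93% reductions for instant and batched IHAVEs respectively, and the 18 percentage points given up by sending instantly.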
Malicious nodes may then intentionally attempt to game various choking
strategies in order to get choked by the router.

Choked peers are inherently less valuable mesh peers than unchoked peers. As
yeah, this needs some more thought.
Yeah
}

message ControlChoke {
	optional string topicID = 1;
Why is the field `optional`? What does the absence of this value mean?
We should target this to #560
Yep. I have been following these changes. Although I have a working rust version of this, it seems we should hold off until the other changes are added first which have better modelling done on them at the moment.
This is a minimal extension aiming to minimize duplicates and latency on the mesh networks in gossipsub v1.1.0.
This is a working draft at the moment. Simulations and implementations will be added as we go.
All feedback welcome.