
delayed ipam consensus in partially connected topology with non-ipam nodes #1118

Closed
rade opened this issue Jul 11, 2015 · 0 comments
rade commented Jul 11, 2015

I observe this when running the test I added in #1117.

The setup is host1 <-> host2 (no-ipam) <-> host3, established like this:

host1:~$ weave launch-router --no-discovery
host2:~$ weave launch-router --no-discovery --ip-alloc-range="" host1
host3:~$ weave launch-router --no-discovery host2

And then we start a container requiring ipam on host3:

host3:~$ weave run -ti gliderlabs/alpine /bin/sh

This takes ~30 seconds to complete.

Enabling debug logging on host3 shows:

DEBU: 2015/07/11 15:13:43.544408 [allocator 46:db:ad:5c:8c:89] Paxos proposing
DEBU: 2015/07/11 15:13:48.543790 [allocator 46:db:ad:5c:8c:89] Paxos proposing
DEBU: 2015/07/11 15:13:53.543764 [allocator 46:db:ad:5c:8c:89] Paxos proposing
DEBU: 2015/07/11 15:13:58.543680 [allocator 46:db:ad:5c:8c:89] Paxos proposing
DEBU: 2015/07/11 15:14:02.973963 [allocator 46:db:ad:5c:8c:89]: Allocator.OnGossip: 567 bytes
DEBU: 2015/07/11 15:14:02.977816 [allocator 46:db:ad:5c:8c:89]: Decided to ask peer f6:0a:27:c5:a9:98 for space in range [10.32.0.1-10.47.255.255)
DEBU: 2015/07/11 15:14:02.978697 [allocator 46:db:ad:5c:8c:89]: OnGossipUnicast from f6:0a:27:c5:a9:98 :  607 bytes
DEBU: 2015/07/11 15:14:02.979114 [allocator 46:db:ad:5c:8c:89]: Allocated 10.40.0.0 for d60e20ae5373d901af9a5995102c0a0ca3827cc68d5751b42e0f0bd8c62c0dac in [10.32.0.1-10.47.255.255)

So it looks like we only establish consensus when the periodic ipam gossip takes place.

Note that the commands as shown above don't actually reproduce the problem for me. Instead I have to run the full test from #1117, which first launches the three routers with normal discovery, starts two non-ipam containers (on host1 and host3), and then stops all routers. I reckon the difference is probably just down to timing and possibly PRNG seeding.

Thinking about it and looking at the ipam paxos code, I believe what is happening here is that, due to the partially connected topology with a non-ipam node in the middle, when peer3 starts it does not receive any IPAM gossip, since it connects to peer2, which doesn't run IPAM. And the paxos code on peer1 only broadcasts gossip in some very narrowly defined circumstances, which probably do not hold here. In particular, peer1 has a quorum of one, so it can just create the ring, at which point it will no longer broadcast the ring state when receiving a paxos message (i.e. from peer3). Hence peer3 only finds out about the ring when the periodic gossip on peer3 takes place.

Perhaps the conditions under which ipam paxos broadcasts the ring need to be relaxed a bit.
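For illustration, here is a minimal Go sketch of the kind of relaxation I have in mind. All type and method names here are made up for the sketch and are not the actual weave ipam API: the idea is simply that a peer which already has a ring responds to an incoming paxos message by sending its ring state back to the sender, rather than staying silent until the next periodic gossip round.

package ipam

// Minimal stand-ins so the sketch compiles; the real weave types differ.
type paxosNode interface{ Handle(sender string, msg []byte) }

type Allocator struct {
	paxos       paxosNode
	ringCreated bool
}

func (alloc *Allocator) ringExists() bool { return alloc.ringCreated }

func (alloc *Allocator) sendRingStateTo(sender string) {
	// In the real code this would unicast the current ring state to sender.
}

// handlePaxosMessage sketches the relaxed rule: a peer that already has a
// ring replies with its ring state instead of ignoring the message, so a
// newly started peer (peer3 here) does not have to wait for the periodic
// gossip to learn about the ring.
func (alloc *Allocator) handlePaxosMessage(sender string, msg []byte) {
	if alloc.ringExists() {
		alloc.sendRingStateTo(sender)
		return
	}
	// Otherwise take part in the consensus round as before.
	alloc.paxos.Handle(sender, msg)
}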

@rade rade modified the milestone: current Jul 13, 2015
@bboreham bboreham self-assigned this Jul 15, 2015
rade added a commit that referenced this issue Jul 15, 2015
Respond to paxos messages when we already have a ring

Fixes #1118.
@rade rade modified the milestones: current, 1.1.0 Jul 21, 2015