
delayed ipam consensus in partially connected topology with non-ipam nodes #1118

Closed
rade opened this issue Jul 11, 2015 · 0 comments
rade commented Jul 11, 2015

I observe this when running the test I added in #1117.

The setup is host1 <-> host2 (no-ipam) <-> host3, established like this:

host1:~$ weave launch-router --no-discovery
host2:~$ weave launch-router --no-discovery --ip-alloc-range="" host1
host3:~$ weave launch-router --no-discovery host2

And then we start a container requiring ipam on host3:

host3:~$ weave run -ti gliderlabs/alpine /bin/sh

This takes ~30 seconds to complete.

Enabling debug logging on host3 shows:

DEBU: 2015/07/11 15:13:43.544408 [allocator 46:db:ad:5c:8c:89] Paxos proposing
DEBU: 2015/07/11 15:13:48.543790 [allocator 46:db:ad:5c:8c:89] Paxos proposing
DEBU: 2015/07/11 15:13:53.543764 [allocator 46:db:ad:5c:8c:89] Paxos proposing
DEBU: 2015/07/11 15:13:58.543680 [allocator 46:db:ad:5c:8c:89] Paxos proposing
DEBU: 2015/07/11 15:14:02.973963 [allocator 46:db:ad:5c:8c:89]: Allocator.OnGossip: 567 bytes
DEBU: 2015/07/11 15:14:02.977816 [allocator 46:db:ad:5c:8c:89]: Decided to ask peer f6:0a:27:c5:a9:98 for space in range [10.32.0.1-10.47.255.255)
DEBU: 2015/07/11 15:14:02.978697 [allocator 46:db:ad:5c:8c:89]: OnGossipUnicast from f6:0a:27:c5:a9:98 :  607 bytes
DEBU: 2015/07/11 15:14:02.979114 [allocator 46:db:ad:5c:8c:89]: Allocated 10.40.0.0 for d60e20ae5373d901af9a5995102c0a0ca3827cc68d5751b42e0f0bd8c62c0dac in [10.32.0.1-10.47.255.255)

So it looks like we only establish consensus when the periodic ipam gossip takes place.

Note that the commands as shown above don't actually reproduce the problem for me. Instead I have to run the full test from #1117, which first launches the three routers with normal discovery, starts two non-ipam containers (on host1 and host3), and then stops all routers. I reckon the difference is probably just down to timing and possibly PRNG seeding.

Thinking about it and looking at the ipam paxos code, I believe what is happening here is that, due to the partially connected topology with a non-ipam node in the middle, when peer3 starts it does not receive any IPAM gossip, since it connects to peer2, which doesn't run IPAM. And the paxos code on peer1 only broadcasts gossip in some very narrowly defined circumstances, which probably do not hold here. In particular, peer1 has a quorum of one, so it can just create the ring, at which point it will no longer broadcast the ring state when receiving a paxos message (i.e. from peer3). Hence peer3 only finds out about the ring when the periodic gossip on peer3 takes place.

Perhaps the conditions under which ipam paxos broadcasts the ring need to be relaxed a bit.
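For illustration, here is a minimal Go sketch of the kind of relaxation I have in mind. All type and method names here are made up for the sketch and are not the actual weave ipam API: the idea is simply that a peer which already has a ring responds to an incoming paxos message by sending its ring state back to the sender, rather than staying silent until the next periodic gossip round.

package ipam

// Minimal stand-ins so the sketch compiles; the real weave types differ.
type paxosNode interface{ Handle(sender string, msg []byte) }

type Allocator struct {
	paxos       paxosNode
	ringCreated bool
}

func (alloc *Allocator) ringExists() bool { return alloc.ringCreated }

func (alloc *Allocator) sendRingStateTo(sender string) {
	// In the real code this would unicast the current ring state to sender.
}

// handlePaxosMessage sketches the relaxed rule: a peer that already has a
// ring replies with its ring state instead of ignoring the message, so a
// newly started peer (peer3 here) does not have to wait for the periodic
// gossip to learn about the ring.
func (alloc *Allocator) handlePaxosMessage(sender string, msg []byte) {
	if alloc.ringExists() {
		alloc.sendRingStateTo(sender)
		return
	}
	// Otherwise take part in the consensus round as before.
	alloc.paxos.Handle(sender, msg)
}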

@rade rade modified the milestone: current Jul 13, 2015
@bboreham bboreham self-assigned this Jul 15, 2015
rade added a commit that referenced this issue Jul 15, 2015
Respond to paxos messages when we already have a ring

Fixes #1118.
@rade rade modified the milestones: current, 1.1.0 Jul 21, 2015