
XEP-0198: Stream Management #144

Closed
arcusfelis opened this issue Feb 5, 2014 · 8 comments

@arcusfelis
Contributor

This can be proposed for GSoC 2014: http://xmpp.org/extensions/xep-0198.html

@IgorKarymov
Contributor

Just to be sure that you saw this pull request to ejabberd:
processone/ejabberd#166
So maybe it's a good idea to just port it and write some good tests around it.

@erszcz
Member

erszcz commented May 14, 2014

Superseded by #195.

@erszcz
Member

erszcz commented May 14, 2014

@IgorKarymov: Thanks for pointing this out.

Having compared the implementations, I see pros and cons on both sides.
@weiss takes an easier and, after I thought about it a bit more, better approach to generating the Stream Management ID.

OTOH, I think a crucial requirement for Stream Management is that it be the mechanism guaranteeing no message is lost even on an unreliable network (think mobile devices losing connectivity, switching ESSIDs, or hopping between 3G and WiFi) -- there are a lot of requests for such functionality. Holger's approach of dropping messages from the outgoing message queue on queue overflow runs counter to the policy of guaranteeing no message loss.

Moreover, what does it really mean that the outgoing message queue overflows? It means that the client had no chance to ack. Either it's too slow to ack (doubtful, and even if so, the XEP's purpose is not to negotiate transmission rates) or the client's connection is broken, so it doesn't receive ack requests and hence can't ack. In such a case, I consider it much cleaner to just terminate the connection rather than assume it will magically fix itself, and let the client resume the session or let offline storage persist all the unacked messages from the outgoing message queue if the client decides not to resume.
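
To make that concrete, the buffering rule I have in mind looks roughly like this (an illustrative sketch only -- the module, the limit and the field names are invented, not the actual c2s code): a bounded buffer of unacked stanzas, where hitting the bound terminates the stream instead of silently dropping anything.

```erlang
%% Illustrative sketch only: a bounded buffer of unacknowledged stanzas.
%% Module name, limit and map fields are made up for this example.
-module(sm_buffer_sketch).
-export([buffer_outgoing/2]).

-define(MAX_UNACKED, 100).  %% hypothetical limit

%% State holds the stanzas sent to the client but not yet acked.
buffer_outgoing(_Stanza, #{unacked := Unacked} = State)
  when length(Unacked) >= ?MAX_UNACKED ->
    %% The client stopped acking: terminate instead of dropping stanzas.
    %% The unacked stanzas are then handed over on session resumption or
    %% pushed to offline storage, so nothing is lost silently.
    {terminate, State};
buffer_outgoing(Stanza, #{unacked := Unacked} = State) ->
    {ok, State#{unacked := [Stanza | Unacked]}}.
```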

We also have open source tests for all the functionality, see https://github.com/esl/ejabberd_tests.

@weiss
Contributor

weiss commented May 14, 2014

Holger's approach of dropping messages from the outgoing message queue on queue overflow runs counter to the policy of guaranteeing no message loss.

That's right. We're talking about a situation that shouldn't happen™ under normal circumstances; but if it does happen, I think you can only choose the least of multiple evils. In my view, the best option is actually to terminate the session and to bounce any queued stanzas (I had already submitted a pull request for ejabberd). The downside is that this might expose the client to DoS attempts: I could try to kill your session simply by firing stanzas at a rate your slow mobile connection cannot handle. This probably has to be taken care of by rate limiting on the server side.
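
Something along the lines of a per-session token bucket, for example (an illustrative sketch with invented names; this is not ejabberd's shaper code). Every stanza addressed to the session would go through check/2 first, and a deny would translate into queueing at the sender or an error back to it.

```erlang
%% Illustrative token-bucket check; names are invented for the example
%% and this is not ejabberd's shaper implementation.
-module(shaper_sketch).
-export([new/2, check/2]).

%% Rate is stanzas per second, Burst is the bucket capacity.
new(Rate, Burst) ->
    #{tokens => Burst, last => erlang:monotonic_time(millisecond),
      rate => Rate, burst => Burst}.

%% Returns {allow, Bucket1} or {deny, Bucket1} for one incoming stanza.
check(Now, #{tokens := T, last := Last, rate := R, burst := B} = Bucket) ->
    Refilled = min(B, T + (Now - Last) * R / 1000),
    case Refilled >= 1 of
        true  -> {allow, Bucket#{tokens := Refilled - 1, last := Now}};
        false -> {deny,  Bucket#{tokens := Refilled, last := Now}}
    end.
```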

You might also want to read the recent discussion on the XMPP Standards list on this issue.

Moreover, what does it really mean that the outgoing message queue overflows? It means that the client had no chance to ack.

Or that it refuses to do so due to a bug (I've seen that in practice) or due to malicious intent.

I consider it much cleaner to just terminate the connection rather than assume it will magically fix itself, and let the client resume the session or let offline storage persist all the unacked messages from the outgoing message queue if the client decides not to resume.

I haven't checked your code yet, sorry: Do I understand it right that you close the connection without terminating the session? How do you handle new stanzas received until the resume takes place (or the session times out)?

And did I get it right that, if the session times out, you write the queued stanzas to offline storage instead of bouncing them? (Even if another resource is online, BTW?) Won't you have to take special actions to give the client a chance to handle those stanzas on reconnect without immediately filling up the queue again?

@erszcz
Member

erszcz commented May 14, 2014

We're talking about a situation that shouldn't happen™

Well, life's brutal ;) I mean we have multiple clients looking for a reliable solution to the aforementioned cases of mobile connections, and it seems the problem is more frequent than desired.

Or that it refuses to do so due to a bug (I've seen that in practice) or due to malicious intent.

In case of malicious intent I think it's even better if we terminate early, though I haven't tried to come up with any real scenario. I'll take a look at the list you linked to.

Due to a bug? Point taken, this might be a problem case. There should still be a fallback option of turning stream management off, though, without recompiling the client.

I haven't checked your code yet, sorry: Do I understand it right that you close the connection without terminating the session? How do you handle new stanzas received until the resume takes place (or the session times out)?

No, I was simply painting the picture in broad strokes. For the resumption period the c2s is still able to receive stanzas (i.e. the session is not yet removed from the ETS session table) and on resumption timeout the process message queue is flushed and all the stanzas from the outgoing buffer are bounced. Either they are routed to another resource or go to offline storage.
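
In sketch form (module and helper names are invented, not the actual ejabberd_c2s code), the timeout handling amounts to something like:

```erlang
%% Rough sketch of the resumption-timeout handling described above;
%% module and helper names are invented for illustration.
-module(resume_timeout_sketch).
-export([handle_resume_timeout/1]).

handle_resume_timeout(#{buffer := Unacked} = State) ->
    %% Drain stanzas routed to the dangling c2s process while we were
    %% waiting for the client to resume, then bounce everything.
    Pending = lists:reverse(Unacked) ++ flush_mailbox([]),
    lists:foreach(fun bounce/1, Pending),
    {stop, normal, State}.

flush_mailbox(Acc) ->
    receive
        {route, Stanza} -> flush_mailbox([Stanza | Acc])
    after 0 ->
        lists:reverse(Acc)
    end.

%% Placeholder: re-route to another connected resource if one exists,
%% otherwise hand the stanza to offline storage.
bounce(_Stanza) ->
    ok.
```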

@weiss
Contributor

weiss commented May 14, 2014

I mean we have multiple clients looking for a reliable solution to the aforementioned cases of mobile connections, and it seems the problem is more frequent than desired.

Well, that's what XEP-0198 is all about. I thought we were only talking about the pathological case where the queue that's used for fixing this issue grows too large.

In case of malicious intent I think it's even better if we terminate early, though I haven't tried to come up with any real scenario.

My scenario is a client that refuses to acknowledge stanzas with the intent of having the server run out of memory. This is the issue we're trying to solve here, no?

For the resumption period the c2s is still able to receive stanzas

Again: Are we still talking about the case where the server considers the stanza queue to be full?

Maybe I just misunderstood the topic in the first place.

@erszcz
Member

erszcz commented May 15, 2014

I thought we were only talking about the pathological case where the queue that's used for fixing this issue grows too large.

I think we can't assume this won't happen. I'd rather say that the limit on the buffer size is one of the ways of discovering that something is wrong with the connection. I.e. the client doesn't ack? Probably his/her phone lost connectivity.

client that refuses to acknowledge stanzas with the intent of having the server run out of memory.

Then I uphold what I said: I think it's better to terminate early.

Are we still talking about the case where the server considers the stanza queue to be full?

Sorry, this was a mistake on my side. When we detect that the outgoing stanza buffer overflows, we terminate (i.e. send an error, close the connection, flush the queue and bounce the messages) without allowing resumption.

@weiss
Contributor

weiss commented May 15, 2014

I thought we were only talking about the pathological case where the queue that's used for fixing this issue grows too large.

I think we can't assume this won't happen.

Of course not.

I'd rather say that the limit on the buffer size is one of the ways of discovering that something is wrong with the connection. I.e. the client doesn't ack? Probably his/her phone lost connectivity.

Well, usually you'll want to make the queue large enough so that this alone cannot lead to overflow under normal™ conditions. So if the buffer fills up, this will usually be an indication of a more severe issue.
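
For example, a configurable limit along these lines (a purely illustrative fragment; the module and option names are invented and differ between the real implementations):

```erlang
%% Illustrative configuration fragment; module and option names are made
%% up for the example and do not match any real implementation exactly.
{modules, [
  {mod_stream_management, [
    %% Large enough that a merely slow client never hits it under
    %% normal conditions; overflow then signals a more severe problem.
    {buffer_max, 500}
  ]}
]}.
```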

In case of malicious intent I think it's even better if we terminate early, though I haven't tried to come up with any real scenario.

Then I uphold what I said: I think it's better to terminate early.

In this case, yes. The issue I mentioned is that this makes another bad scenario possible (i.e., it might allow clients to kick other clients by flooding them with stanzas). All I'm saying is that there's a trade-off, not a strictly better solution.

Sorry, this was a mistake on my side. When we detect that the outgoing stanza buffer overflows, we terminate (i.e. send an error, close the connection, flush the queue and bounce the messages) without allowing resumption.

Ah, okay. So yes, that's the solution I'd propose as the least bad option, too (if "bouncing" means that error messages are generated and no attempt is made to re-deliver the stanza flood).

Thanks for those insights! I'm looking forward to reading your code :-)
