Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4306: Avoid CloseSessionTxns larger than jute.maxbuffer #1716

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

ztzg
Copy link
Contributor

@ztzg ztzg commented Jun 17, 2021

Allowing a serialized CloseSessionTxn to grow larger than jute.maxbuffer is a really bad idea: not only does it quickly makes the target ensemble unavailable, it even prevents it from restarting.

Such availability issues have been reported or fixed in the past, notably in ZOOKEEPER-2101 and ZOOKEEPER-3496. This one is a bit different because of the "distance" between the cause and its effect.

The size of a CloseSessionTxn transaction is directly related to the total length of the paths of the ephemeral nodes belonging to the session to be closed. That "mass" of data can be built incrementally, across many requests; it is not that difficult for a small "leak" to push the transaction size over jute.maxbuffer if a session lasts long enough.

This series adds serialization size bookkeeping to the per-session ephemeral node management, and fails ephemeral node creation requests which would potentially result in an "overflowing" CloseSessionTxn.

@ztzg
Copy link
Contributor Author

ztzg commented Jun 17, 2021

Cc: @hanm, @symat, @anmolnar.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is the right way to address the problem.
The problem is in close session, here you are adding an hard (unpredictable and u configurable) bound on the number of ephemeral nodes.

If the problem is only about close session we could work only on that case.

If you want to go down this way of limiting the number of ephemeral we must make it configurable somehow.

You are also introducing a new error code, how do old clients will handle that error code?

@ztzg
Copy link
Contributor Author

ztzg commented Jun 23, 2021

Hi @eolivelli,

Thank you for the initial review!

Perhaps I should create a parallel thread on the dev mailing list, so that we can gather a maximum of ideas; what do you think? In the meantime, here are a few precisions and questions:

I am not sure this is the right way to address the problem.

Right. Me neither, but I thought I'd start the discussion with a concrete proposal, so here we are :)

The problem is in close session, here you are adding an hard (unpredictable and u configurable) bound on the number of ephemeral nodes.

Well, yes and no.

The "unpreditable" bound exists—it's just that right now it does not cause failures at creation time, but rather a hard crash of the whole ensemble at session close. Hitting the new error is bad news, but not worse news than the status quo :)

It is possible to "manually" delete the nodes, of course, in which case there is no issue at session close—but having that many ephemerals in a session probably means that the application has gone berserk.

If the problem is only about close session we could work only on that case.

Yes, but. As far as I can tell, other options include:

  1. Splitting closeSession into multiple transactions (or records) when necessary. Would this seem like a promising approach to you? I don't think we have a precedent in ZooKeeper, do we? This is something I considered, but I thought I'd start with this patch as this alternative might open a can of worms;

  2. Ignoring jute.maxbuffer when reading (certain kinds of) records. One could argue that a record should be "good enough to read" if a peer (as opposed to a client) decided it was "good enough to write."

One problem with these options, though, is that the amount of data still is under client control, and mostly not covered by the quota mechanism—they are, in a way, a loophole allowing clients to work around the (admittedly quite limited) DoS protection afforded by Jute.

In any case: would you have additional suggestions to append to that list?

If you want to go down this way of limiting the number of ephemeral we must make it configurable somehow.

First, we could of course make the "feature" optional—but again: the ensemble is in great danger if the limit is reached, which is why I did not.

Note that the limit is configurable, though, even if not independently—it simply tracks jute.maxbuffer. Also:

  1. A configurable limit only makes sense if set lower than jute.maxbuffer, as exceeding the latter would still expose the ensemble;

  2. Allowing the admin to specify lower values would in effect be some kind of global "session size quota" setting. Which might or might not be useful.

  3. I totally agree that specifying the limit as a size (as opposed to a count) is quite unintuitive—but this is the resource which needs protection.

You are also introducing a new error code, how do old clients will handle that error code?

Badly, particularly when it comes to synchronous operations. (This is a common problem with the current ZooKeeper protocol; most clients are not very good at dealing with unexpected codes.) I actually have some patches about that which I should polish and submit here.

But hey, at least they won't crash the ensemble :)

Cheers, -D

@eolivelli
Copy link
Contributor

I think that we can resume this patch. The problem has been reported again on the mailing list (and I don't have a better proposal at the moment)

@eolivelli eolivelli self-requested a review August 19, 2024 12:06
@anmolnar
Copy link
Contributor

@ztzg When you have a chance, would you please rebase this patch and resolve conflicts?

@eolivelli @kezhuw Shall we shoot for 3.9.3 here what do you think?

@ztzg
Copy link
Contributor Author

ztzg commented Oct 17, 2024

Hi @anmolnar,

@ztzg When you have a chance, would you please rebase this patch and resolve conflicts?

I just pushed a rebased version.

@eolivelli @kezhuw Shall we shoot for 3.9.3 here what do you think?

I was late for your previous attempts at a 3.9.3, but perhaps we get another go at it? I created a branch-3.9 PR at #2201.

@ztzg
Copy link
Contributor Author

ztzg commented Oct 17, 2024

Hi @eolivelli,

I think that we can resume this patch. The problem has been reported again on the mailing list (and I don't have a better proposal at the moment)

I don't have a better proposal either. Here is a rebased version, with a slightly expanded test.

@kezhuw
Copy link
Member

kezhuw commented Oct 18, 2024

What if we have consistent point-in-time snapshot ? ZOOKEEPER-4874 proposed not to do fuzzy snapshot.

We probably do too many tricks for fuzzy snapshot to solve corner cases, and I think it is worth for us to resort to a consistent approach.

Fuzzy snapshot is fuzzy.

@ztzg
Copy link
Contributor Author

ztzg commented Oct 24, 2024

(@kezhuw: wrote this a few days ago, but it seems GitHub drops email replies now?!)

Hi Kezhu,

Are you sure this is the issue you want to comment on? "Overflowing" CloseSessionTxn's create problems in transaction logs, and I don't see how this would interact with the fuzziness of snapshots. Am I missing something?

Cheers, -D

P.-S. — I will have a look at ZOOKEEPER-4874, but snapshots are not that fuzzy given the idempotency of operations. We might want to relitigate, but I would say that the fuzziness was a carefully considered design point in the architecture of ZooKeeper.

(Note that in my "own" ZOOKEEPER-4846, it's the "ACL storage optimization" which bit us—some new checks were too eager in looking up ACLs before the data structure was fully "reconciled.")

@kezhuw
Copy link
Member

kezhuw commented Oct 28, 2024

I thought ZOOKEEPER-4306 is caused by ZOOKEEPER-3145(#622) which is dealing with fuzzy snapshot.

@ztzg
Copy link
Contributor Author

ztzg commented Oct 31, 2024

I thought ZOOKEEPER-4306 is caused by ZOOKEEPER-3145(#622) which is dealing with fuzzy snapshot.

Interesting; thanks for the link! I will check it out.

But no, ZOOKEEPER-4306 would happen even without fuzziness. The fundamental issue is that transactions are inherently bounded by jute.maxbuffer, whereas CloseSessionTxns can hold an unbounded number of ephemerals.

@kezhuw
Copy link
Member

kezhuw commented Nov 7, 2024

The fundamental issue is that transactions are inherently bounded by jute.maxbuffer, whereas CloseSessionTxns can hold an unbounded number of ephemerals.

Before ZOOKEEPER-3145(#622), closeSession is a payload-less txn, that is there is no CloseSessionTxn. Given a consistent snapshot and idempotent txn processing, there is no need for CloseSessionTxn.

https://github.com/apache/zookeeper/pull/622/files#diff-7772ab9c1c821c4af645018c6b85b5ddfebbf8e40e2fcd766fef4696f3407591R326-R328

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants