
Curator framework improvements #2225

Merged · 6 commits merged into master on Aug 25, 2021
Conversation

ajammala (Contributor)

Currently we use a single instance of CuratorFramework for the entire service. This PR adds a way to create multiple instances of CuratorFramework and wrap them behind a load distributor (using round-robin scheduling).
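
A minimal sketch of what such a wrapper could look like (class and method names here are hypothetical, not the PR's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.curator.framework.CuratorFramework;

public class RoundRobinCuratorProvider {
  private final List<CuratorFramework> clients;
  private final AtomicLong counter = new AtomicLong();

  // clients are assumed to be already-started CuratorFramework instances
  public RoundRobinCuratorProvider(List<CuratorFramework> clients) {
    this.clients = new ArrayList<>(clients);
  }

  // Each call rotates to the next client, spreading calls across sessions
  public CuratorFramework next() {
    return clients.get((int) (counter.getAndIncrement() % clients.size()));
  }
}
```

Callers would then grab a client per operation, e.g. `provider.next().getData().forPath("/some/path")`.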

@ssalinas (Member)

🚢

ajammala merged commit 78de2ba into master on Aug 25, 2021
@jhaber (Member) commented Aug 25, 2021

Is there any concern about this causing issues for a sequence of events like:

  1. write data to path /XYZ
  2. issue a getData call to path /XYZ

With a single curator, I believe that you're guaranteed that the getData call returns the data you just wrote, because the read and write are going to the same ZooKeeper server. But with this change, the read and write could go to different ZooKeeper servers, and the read might not return the latest data. Normally you could work around this by adding a call to sync, but that sync call might not get routed to the same CuratorFramework as the subsequent getData call, in which case it would have no effect.
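
For illustration, a minimal sketch of that sequence against a single client (connect string and znode path are placeholders; the sync-then-read workaround only helps if it runs on the same client as the subsequent read):

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ReadAfterWriteExample {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // 1. write data to path /XYZ
    client.create().orSetData().forPath("/XYZ", "payload".getBytes(StandardCharsets.UTF_8));

    // 2. issue a getData call to path /XYZ; on the same client/session this
    // is guaranteed to see the write above
    byte[] data = client.getData().forPath("/XYZ");

    // If the read might hit a server that lags the leader, sync() is the
    // usual workaround -- but only if it runs on the same client as the
    // getData(). Curator performs sync() in the background, so wait for it.
    CountDownLatch synced = new CountDownLatch(1);
    client.sync().inBackground((c, event) -> synced.countDown()).forPath("/XYZ");
    synced.await();
    byte[] fresh = client.getData().forPath("/XYZ");

    client.close();
  }
}
```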

@ssalinas (Member)

@jhaber this is only enabled for read-only cases on instances that do not contend for the leader latch (we split it into two separate deploys to isolate the scheduler from heavy read traffic), i.e. they never write. Only the leading scheduler does writes, and all other instances proxy to the leader (for various other reasons, such as it keeping most state in memory and just persisting to ZK).

If we were to enable this on instances that write, we'd have to do a bit more work around handing out a specific curator instance and using it for the duration of the method calls. We didn't feel we needed that level of optimization yet, since the main pain point we were trying to solve was the read-only API instances.

ssalinas deleted the curator_framework_improvements branch on Aug 25, 2021 at 19:51
@jhaber (Member) commented Aug 25, 2021

Ah ok makes sense, thanks for clarifying. Do you have a sense of how much of the benefit comes from reducing contention on the client side vs. server side? If most of the benefit is client side, we could theoretically force all of the CuratorFramework instances to connect to the same ZooKeeper server, which I think would avoid most of the weirdness and be safer to use on all the instances (maybe with some additional tweaks).
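
A rough sketch of that single-server variant (host and client count are placeholders, not anything in this PR): several clients to spread client-side work, all pinned to one ZooKeeper host so every call still lands on the same server.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SingleServerClients {
  // singleServer is one host from the ensemble, e.g. "zk1.example.com:2181"
  public static List<CuratorFramework> create(String singleServer, int count) {
    List<CuratorFramework> clients = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      CuratorFramework client = CuratorFrameworkFactory.newClient(
          singleServer, new ExponentialBackoffRetry(1000, 3));
      client.start();
      clients.add(client);
    }
    return clients;
  }
}
```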

Also, a more robust option might be to defer the CuratorFramework selection until we know the path, and use a hash of the path to consistently pick the same CuratorFramework. But it would probably be a nightmare to wire that up because the path is usually specified last in the method call chain.
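
Something like the following (hypothetical names, not wired into Curator's builder chain):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.curator.framework.CuratorFramework;

public class PathHashedCuratorProvider {
  private final List<CuratorFramework> clients;

  public PathHashedCuratorProvider(List<CuratorFramework> clients) {
    this.clients = new ArrayList<>(clients);
  }

  // The same znode path always maps to the same client, so a write and a
  // later read of that path share one ZooKeeper session
  public CuratorFramework forPath(String path) {
    // floorMod keeps the index non-negative even for negative hash codes
    return clients.get(Math.floorMod(path.hashCode(), clients.size()));
  }
}
```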

@ssalinas (Member)

Yeah, the way it does builders makes that rough. In terms of benefit, though: at Singularity's scale, going from 1 to 3 curator instances we saw the metric for total curator call time go from maxing out around 5s+ to a few hundred millis.

ssalinas added this to the 1.5.0 milestone on May 4, 2022