Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Randomize requeue time #312

Closed
ctrox opened this issue Jan 28, 2022 · 6 comments · Fixed by #523
Closed

Randomize requeue time #312

ctrox opened this issue Jan 28, 2022 · 6 comments · Fixed by #523
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Milestone

Comments

@ctrox
Copy link

ctrox commented Jan 28, 2022

What problem are you facing?

With the current PollInterval system (at least with the defaults) every controller within a provider will get the same interval. This can be problematic as the number of controllers in a provider increases. At least after a controller restart, all resources will essentially be enqueued at the same time.

This is an example of what I'm talking about (interval is set to 5 minutes for most controllers):

rate(container_cpu_usage_seconds_total{container="provider-nine"}[1m])

Screenshot from 2022-01-28 11-19-34

This is our own internal provider but I'm pretty sure a similar picture could be replicated with other crossplane providers, if all controllers get the same PollInteral.

How could Crossplane help solve your problem?

We could solve this problem outside of crossplane-runtime by randomizing the PollInterval by a bit for every controller in our provider.

But I would prefer if this would already be handled by crossplane-runtime internally as others might also run into this. One idea would be to add a (configurable) random factor to RequeueAfter: r.pollInterval. That random factor could also be zero by default in order to not break the current expectations.

@ctrox ctrox added the enhancement New feature or request label Jan 28, 2022
@negz
Copy link
Member

negz commented Mar 2, 2022

Definitely makes sense to at least jitter the poll interval a bit. I wonder whether we should jitter it once for each controller at startup (such that they all poll at a predictable interval, but offset from each other), or just jitter each poll interval slightly.

@negz negz added help wanted Extra attention is needed good first issue Good for newcomers labels Jul 15, 2022
@eugercek
Copy link

May I pick this up?

@jbw976
Copy link
Member

jbw976 commented Oct 30, 2022

Sure @eugercek if you want to take a stab at this one, thank you for your willingness!

@jbw976 jbw976 moved this to Backlog in Crossplane Roadmap Oct 30, 2022
@jbw976 jbw976 moved this from Backlog to In Progress in Crossplane Roadmap Oct 30, 2022
@eugercek
Copy link

Sorry for late response 😞

What is the current approach we want to implement? Can you guys elaborate on that.

@eugercek
Copy link

I'm stepping down due to lack of information. For other brave ones I think you need to elaborate on the preferred solution and some pointer on codebase. 👋🏻

@eugercek eugercek removed their assignment Dec 19, 2022
@jbw976 jbw976 moved this from In Progress to Backlog in Crossplane Roadmap Jan 12, 2023
@aaronschweig
Copy link

Hello everybody,

just saw this issue labled as good first issue and after reading up on the issue and the proposed solution

Definitely makes sense to at least jitter the poll interval a bit. I wonder whether we should jitter it once for each controller at startup (such that they all poll at a predictable interval, but offset from each other), or just jitter each poll interval slightly.

I wonder if it already could be sufficient to introduce jitter here

pollInterval: defaultpollInterval,
and here
func WithPollInterval(after time.Duration) ReconcilerOption {
return func(r *Reconciler) {
r.pollInterval = after
}
}

Then it would be predictable for each individual controller.

The alternative could be to kind of mirror the behaviour of the SyncPeriod inside the controller-runtime and introduce a small 10% jitter for the RequeueAfter value for each individual resource.

I think both options could be done with the first one being slightly more convenient to implement. What would be a good value for the jitter?

And if you think my proposal is fine, then I am also happy to quickly open a PR for that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
Status: Done
5 participants