Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.x] Adding scaling to balance = false #1473

Merged
merged 4 commits into from
Jul 22, 2024

Conversation

dbpolito
Copy link
Contributor

This introduces a new strategy for balance: single

What this does is, it still has scaling similar to auto, but instead of having one worker per queue, it still keeps all queues into the same process, following the order specified, similar to what you would have using queue:work --queue=a,b,c

TBH this was the behavior i expected when i started using horizon a long time ago...

I'm using it in my project and thought it would be useful for others...

I can open a PR on the Docs in case this gets accepted / merged.

Maybe the name isn't that clear, totally open to change.

@fabriciojs
Copy link

👍🏻 this is nice, as this way we don't get bloated with too many workers, but we can still make our queues more granular for straight visibility into each of them.

@taylorotwell
Copy link
Member

What is the main difference between this and 'balance' => false?

@dbpolito
Copy link
Contributor Author

dbpolito commented Jul 12, 2024

Well, off won't auto scale, right? It will basically stick with the process or max process... This will scale based on the jobs, but each worker will have all queues, so it will scale up to x and each worker with queue=a,b,c

@taylorotwell
Copy link
Member

taylorotwell commented Jul 12, 2024

Gotcha. I do wonder if we really need a new strategy for this if off behaves similar to this but just using maxProcesses. You have to build your server to be able to handle your queue worker load at maxProcesses anyways so why not just let it run at max all the time using the off strategy?

@dbpolito
Copy link
Contributor Author

Well while this is true, there are several benefits of not spending the memory just to run the workers unnecessarily:

  1. Reserving all this memory just to keep seeking for jobs (~100mb per worker)
  2. Having more free memory for the jobs to process
  3. Scaling Horizon Instances on K8S based on resources usage

Also this brings the functionality of queue priority, so it needs to process on the order.

I think the complexity added vs functionality might worth to more people.

I'm already using this strategy on my project and only changing to it reduced 30% of memory usage in my case... Of course it will change depending on how your queues are designed.


Another thing that would help on this matter which was discussed on other issues is starting workers on min instead of max, to avoid unnecessary memory peak / waste.

@taylorotwell
Copy link
Member

Should this just be the default behavior of balance => false? I don't think the name single really conveys much.

@dbpolito
Copy link
Contributor Author

Well, that is what i expected to be TBH... i agree, but isn't it a "BREAKING CHANGE"? Won't break anything but will change the behavior for everyone who uses it.

I'm totally ok with that, or come up with a different name...

@taylorotwell
Copy link
Member

I'm not sure if it's a breaking change. We never actually document the scaling behavior of balance => false and it shouldn't "break" your application as it's mainly an internal optimization for Horizon.

@dbpolito
Copy link
Contributor Author

well makes sense... i will try to change the PR later today or tomorrow tops.

@taylorotwell
Copy link
Member

Thanks! Just mark as ready for review when you want me to take another look.

@taylorotwell taylorotwell marked this pull request as draft July 19, 2024 16:15
@dbpolito dbpolito marked this pull request as ready for review July 19, 2024 19:58
@dbpolito
Copy link
Contributor Author

@taylorotwell updated 👍

@taylorotwell taylorotwell merged commit ee804ec into laravel:5.x Jul 22, 2024
10 checks passed
@taylorotwell
Copy link
Member

Thanks

@dbpolito dbpolito deleted the balance_single branch July 22, 2024 14:55
@ChristopheBorcard
Copy link

It might have been wise to rename the PR.

Because now the change log on the Tag is talking about a new Single balance strategy that doesn't exist:

https://github.com/laravel/horizon/releases/tag/v5.26.0

@dbpolito dbpolito changed the title [5.x] Adding new balance strategy: single [5.x] Adding scaling to balance = false Jul 24, 2024
@dbpolito
Copy link
Contributor Author

I renamed the PR but it won't update the release i think

@driesvints
Copy link
Member

I updated all references. Thanks

@vilhelmjosander
Copy link

A bit late to the party here but this seems to have broken my use of Horizon with balance = false.

I have a fairly large application which sometimes handles upwards of 200k jobs per hour on a single VPS. After updating Horizon to >= 5.26.0 it suddenly stopped processing jobs (or processed them EXTREMELY slow). Downgrading to 5.25.0 and everything is back to normal again.

This is my setup:

 'environments' => [
        'production' => [
            'supervisor-live' => [
                'connection' => 'redis-live-battles',
                'queue' => ['live'],
                'balance' => 'false',
                'minProcesses' => 1,
                'maxProcesses' => 5,
                'balanceMaxShift' => 15,
                'balanceCooldown' => 1,
                'tries' => 3,
                'timeout' => 80,
            ],
            'supervisor-default' => [
                'connection' => 'redis',
                'queue' => ['urgent', 'high', 'default', 'low'],
                'balance' => 'false',
                'minProcesses' => 1,
                'maxProcesses' => 50,
                'balanceMaxShift' => 15,
                'balanceCooldown' => 1,
                'tries' => 3,
                'timeout' => 80,
            ],
        ],

Unfortunately I am not familiar enough with the internals of Horizon to be able to debug this myself but i've narrowed it down to this PR.

This is what my horizon dashboard looks like after deploying >= 5.26.0. Basically jobs just keep piling up without being processed across the different queues.

CleanShot 2024-08-23 at 22 41 30@2x

@dbpolito
Copy link
Contributor Author

@vilhelmjosander

Well, i also got a fairly big project and haven't noticed any difference on performance, processing ~15k jobs as usual...

Did you check the job throughput? Is that different on versions?

TBH i can't see a reason to impact performance, the only behavior difference is that it scales now... before it would always be at 50 process in your case... now it will scale from 1 to 50... but the way it process and how it process is exactly the same.

Looking at your screenshot, it seems it scaled properly and it's using 50 processes...

I would look at the job throughput / runtime... as maybe the jobs may be different.

@trevorgehman
Copy link
Contributor

Just to add another comment. This affected our Horizon workers because we had an old config file where only processes was specified, instead of minProcesses and maxProcesses. Here is our config:

'supervisor-1' => [
    'connection' => 'redis',
    'queue' => ['critical'],
    'balance' => false,
    'processes' => 6,
    'tries' => 10,
    'sleep' => 2,
    'timeout' => 3500,
    'memory' => 1500,
    'maxJobs' => 1000,
],

Horizon treats processes as maxProcesses. And since we didn't have a value minProcesses it defaulted to 1.

$key = $key === 'processes' ? 'max_processes' : $key;

So basically it enabled scaling for all these workers which previously didn't scale.

It didn't break anything for us, just took us a couple weeks to notice that our queues processing jobs a bit slower.

@sts-ryan-holton
Copy link

Just jumping in here, I tried out setting 'balance' => false for my supervisors in my Horizon config, not completely sure what the benefit here would be? Does this technically mean it processes each queue synchronously, so each queue in order first? What's the benefits?

@vilhelmjosander
Copy link

I'm at a loss at what to do here. As soon as I upgrade to >= 5.26.0 horizon basically stops processing jobs with balance = false. It works flawlessly with 5.25.0.

Currently this also stops me from updating laravel/framework since horizon 5.25.0 gives the following errors with Laravel >= 11.27.0:

Declaration of Laravel\Horizon\RedisQueue::pop must be compatible with Illuminate\Queue\RedisQueue::pop.

Any ideas? :)

@sts-ryan-holton
Copy link

sts-ryan-holton commented Oct 19, 2024

@vilhelmjosander A few things to consider:

  1. Not sure if it's causing you a problem, but it should be 'balance' => false instead of 'balance' => 'false'. From your screenshot, your "Balancing" column says "False", but when I use 'balance' => false (note the boolean not string) then under "Balancing" it'll say "Disabled", so check this
  2. After looking at your screenshot, your queues, urgent, high, default and low in that order, I'm pretty sure it'll try to process everything on the urgent queue first, then move on to the high queue, then default, then low. This actually means that, if for example you have a tonne of jobs coming in on the urgent queue, or high queue, then any queue later on simply will just build up and never really process anything unless you allocate more workers/processes.

I noticed this in my own platform, to try to illustrate it better, if you have 5 workers and each job takes 1 second and you have new jobs filling the urgent queue as quick as they're being processed, no queues below it will have enough workers.

Does this help?

@dbpolito
Copy link
Contributor Author

dbpolito commented Oct 19, 2024

@vilhelmjosander it's hard to help without a way to reproduce...

I tried to reproduce here... played with the tests, and everything seems to be working as it should...

Can you check the processes that are running? with ps aux or something like that, to check how all those 50 processes looks? You should have 50 processes with ... --queue="urgent,high,default,low" ...

And again, are you sure it's not processing? Because as stated above, it will process on this queue order... so all urgent first, then high, then default, then low...

None of the queues are processed? or only the urgent ones?

Of course we need to fix it, if something is wrong... but have you tried changing the balance? Maybe for your use case you do need to change to simple or auto

@vilhelmjosander
Copy link

vilhelmjosander commented Oct 19, 2024

@dbpolito

At once after deploying >= 5.26.0 it starts to process the queues extremely slow. I now tried to change the setting to false (boolean) instead of the string 'false' (which works fine on 5.25.0).
CleanShot 2024-10-19 at 20 34 39@2x

This is the only thing I see which is related to horizon with ps aux. Same for ps aux | grep queue or similar. I do not have 50 processes (which I do have with 5.25.0).

CleanShot 2024-10-19 at 20 42 16@2x

I did note that after deploying >= 5.26.0 ps aux did show 20-50 processes with ... --queue="urgent,high,default,low" ... for a while, then it stopped and no processes at all were visible which basically meant that jobs were not processed.

Here is my current config:

CleanShot 2024-10-19 at 20 43 49@2x

Switching to balance = 'auto' seems to work. The reason why I've had balance = false is that I do want the default laravel behaviour where the jobs are processed in the queue order. It is important that the urgent queue has higher priority than the low queue and always processes asap.

@dbpolito
Copy link
Contributor Author

@vilhelmjosander

the only thing i noticed which shouldn't impact anything, but who knows... is these supervisors uses different redis connections...

so when you run it you only have the supervisor processes running, and no actual worker running? that's weird... as horizon doesn't support scaling to 0 workers...

are you sure nothing pops into your logs? errors or anything?

Really hard to help, i tried changing tests, trying to reproduce but not a lot of luck.

Maybe you can setup a reproducible repo so we can jump in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants