
High memory usage in redis #715

Closed · mauri870 opened this issue Dec 3, 2019 · 18 comments

Comments
mauri870 commented Dec 3, 2019

  • Horizon Version: v3.4.3
  • Laravel Version: v6.6.0
  • PHP Version: 7.3.11
  • Redis Driver & Version: predis 1.1.1 or the phpredis extension 5.1.1 (same result with both)
  • Database Driver & Version:

Description:

After we upgraded from Laravel 5.8 to 6.6 and Horizon 3.2.2 to 3.4.3, the memory consumption of our Redis 4 server started to grow exponentially and now sits at around 5 GB.

Horizon Dashboard:
[screenshot: horizon-dashboard]

Redis instance dedicated to horizon:
[screenshot: redis-usage]

Previous version vs new (the release was pushed Nov 25):

[screenshot: Screenshot_1575402597]

Steps To Reproduce:

We have no clue what is causing this behavior. Our number of jobs is almost the same as before; the only difference is the framework and Horizon versions.

SDekkers (Contributor) commented Dec 4, 2019

I have the same issue: finished jobs appear not to be removed from Redis memory, resulting in high memory usage. We run about 500k jobs per day.

What are your trim values in horizon.php?

mauri870 (Author) commented Dec 4, 2019

    'trim' => [
        'recent' => 30,
        'recent_failed' => 30,
        'failed' => 60,
        'monitored' => 0
    ],

I can't find a trim option for completed jobs, though.

mauri870 changed the title from "High cpu and memory usage in redis" to "High memory usage in redis" on Dec 4, 2019
mauri870 (Author) commented Dec 4, 2019

The problem seems to be that trim.recent is used both to expire jobs that have been pushed (but not yet processed) and jobs that are already completed. Maybe a new trim.completed option should be added to expire completed jobs without losing jobs that are still in the queue.

mauri870 (Author) commented Dec 4, 2019

For now, our solution was to simply expire the completed job's payload to free up some memory:

// Expire the completed job's Horizon payload one minute after processing.
Queue::after(function (JobProcessed $event) {
    Redis::expireat(config('horizon.prefix') . $event->job->getJobId(), Carbon::now()->addMinute()->timestamp);
});
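
For anyone copying this workaround: the callback has to be registered during application boot. A minimal sketch of one possible placement, assuming the default Redis connection reaches the same keys Horizon writes (the AppServiceProvider location and the imports are our illustration, not part of the original snippet):

    <?php

    namespace App\Providers;

    use Carbon\Carbon;
    use Illuminate\Queue\Events\JobProcessed;
    use Illuminate\Support\Facades\Queue;
    use Illuminate\Support\Facades\Redis;
    use Illuminate\Support\ServiceProvider;

    class AppServiceProvider extends ServiceProvider
    {
        public function boot()
        {
            // After each job finishes, expire its Horizon payload key one
            // minute from now instead of waiting for trim.recent to elapse.
            Queue::after(function (JobProcessed $event) {
                Redis::expireat(
                    config('horizon.prefix') . $event->job->getJobId(),
                    Carbon::now()->addMinute()->timestamp
                );
            });
        }
    }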

mauri870 (Author) commented Dec 4, 2019

That's our fix in a Horizon fork, with trim.completed set to 1:

[screenshot: Screenshot_1575487248]

diff --git a/config/horizon.php b/config/horizon.php
index b9803a8..318a945 100644
--- a/config/horizon.php
+++ b/config/horizon.php
@@ -98,6 +98,7 @@ return [
         'recent_failed' => 10080,
         'failed' => 10080,
         'monitored' => 10080,
+        'completed' => 60,
     ],
 
     /*
diff --git a/src/Repositories/RedisJobRepository.php b/src/Repositories/RedisJobRepository.php
index 171b040..dfb2cac 100644
--- a/src/Repositories/RedisJobRepository.php
+++ b/src/Repositories/RedisJobRepository.php
@@ -66,6 +66,7 @@ class RedisJobRepository implements JobRepository
     {
         $this->redis = $redis;
         $this->recentJobExpires = config('horizon.trim.recent', 60);
+        $this->completedJobExpires = config('horizon.trim.completed', 60);
         $this->failedJobExpires = config('horizon.trim.failed', 10080);
         $this->recentFailedJobExpires = config('horizon.trim.recent_failed', $this->failedJobExpires);
         $this->monitoredJobExpires = config('horizon.trim.monitored', 10080);
@@ -405,7 +406,7 @@ class RedisJobRepository implements JobRepository
             ? $pipe->hmset($id, ['status' => 'failed'])
             : $pipe->hmset($id, ['status' => 'completed', 'completed_at' => str_replace(',', '.', microtime(true))]);
 
-        $pipe->expireat($id, Chronos::now()->addMinutes($this->recentJobExpires)->getTimestamp());
+        $pipe->expireat($id, Chronos::now()->addMinutes($this->completedJobExpires)->getTimestamp());
     }
 
     /**
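
With the fork applied, completed-job retention becomes tunable in config/horizon.php independently of pending jobs. For illustration, the trim block that the "trim.completed of 1" setup implies (values in minutes; everything except 'completed' is left at the defaults visible in the diff above):

    'trim' => [
        'recent'        => 60,
        'recent_failed' => 10080,
        'failed'        => 10080,
        'monitored'     => 10080,
        'completed'     => 1, // minutes to keep completed job payloads
    ],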

travisaustin commented Dec 4, 2019

I have similar issues. I was able to get jobs to expire correctly by setting horizon.trim.recent in my config. (Check the __construct() function in RedisJobRepository.php to see where it's accessed.)

That said, I still have an issue of memory slowly filling up, and I think it's because keys are left behind in the horizon:recent:TAGNAME Redis keys. Right now, for example, I only have about 100 recent jobs listed, but the horizon:recent:TAGNAME keys contain over 2,000,000 entries, all referencing IDs of long-expired jobs.
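
(For illustration, one way to confirm this from a Tinker session. A sketch that assumes the default "horizon:" prefix, no extra key prefix on the Redis connection, and "my-tag" as a placeholder tag name:)

    use Illuminate\Support\Facades\Redis;

    // horizon:recent:TAGNAME is a sorted set of job IDs; ZCARD counts its entries.
    $entries = Redis::zcard('horizon:recent:my-tag');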

driesvints (Member)

Please see #625

Are you all monitoring tags?

travisaustin commented Dec 5, 2019 via email

mauri870 (Author) commented Dec 5, 2019

Neither am I. The problem is indeed that Horizon does not clean up completed jobs until trim.recent expires. In our case, with 440k jobs every 30 minutes, the completed jobs cause Redis to fill up memory quickly and also increase CPU usage due to the sheer number of keys. Please refer to the diff above, which introduces a mechanism to control how long completed jobs are persisted.

#715 (comment)

travisaustin commented Dec 5, 2019

I think there are two issues here.

First, as reported by @mauri870, completed jobs are retained for one week by default. This is easily solved by using the undocumented configuration option horizon.trim.recent. @mauri870 - I don't think your diff is necessary if you set the configuration item horizon.trim.recent to a low value. Is that correct?

Second, all new job IDs are added to the key horizon:recent:TAGNAMEHERE (where TAGNAMEHERE is the name of a tag). Even if these tags are not monitored, these keys fill with the job ID of every job that is dispatched with that tag. Horizon never cleans out this list of job IDs, and these keys continue to grow until they are manually cleared.

Edit: there are two places that fill up: horizon:recent:TAGNAMEHERE and horizon:failed:TAGNAMEHERE.

Edit again: I just realized that the configuration option horizon.trim.recent sets the TTL on the Redis job payload when it's created. If the job isn't processed before horizon.trim.recent expires, the job payload will disappear from Redis before it can be processed. Am I understanding that right?

mauri870 (Author) commented Dec 6, 2019

@travisaustin I think you are, at least from reading the source code. That's why I added trim.completed in my fork; it's working as expected now.

#715 (comment)

themsaid (Member) commented Dec 9, 2019

A solution is proposed in #720

eKevinHoang commented Jan 13, 2020

I have the same issue. My config is:

    'waits' => [
        'redis:default' => 60,
    ],

    /*
    |--------------------------------------------------------------------------
    | Job Trimming Times
    |--------------------------------------------------------------------------
    |
    | Here you can configure for how long (in minutes) you desire Horizon to
    | persist the recent and failed jobs. Typically, recent jobs are kept
    | for one hour while all failed jobs are stored for an entire week.
    |
    */

    'trim' => [
        'recent'        => 60,
        'recent_failed' => 10080,
        'failed'        => 10080,
        'monitored'     => 10080,
    ],

I used gdb to dump the memory of a horizon:work process, and I can see that many job payloads are not released from memory even hours after the jobs have finished. It seems to be an issue with the JobMetrics feature.

My horizon version: v3.4.3
My Laravel version: v6.6

TheOneDaveYoung

I'm not understanding why #720 was closed. It seems to me that the current situation, with very high and possibly runaway resource utilization by Redis, is a larger issue than possibly wonky pagination. Am I missing something here?

@mauri870 it's been a couple of months since your forked solution. How is it holding up, and are you experiencing the pagination issues discussed in #720?

mauri870 (Author)

@TheOneDaveYoung I don't know why Taylor closed that PR; he mentioned that something was not right with pagination. But after we switched to my fork, the OOM problems ceased and everything seems to be working great. It's been more than two months now without problems.

driesvints (Member)

#720 was merged. This will unfortunately break pagination. We're currently considering separating the different types of jobs into separate screens to solve this problem.

xwiz commented Aug 31, 2020

Still kind of experiencing this issue. I noticed Horizon uses sorted sets and hashes. Maybe there are some sane tunings one could use to improve memory consumption and performance, since by default, if you're running millions of jobs, these generic data types waste a lot of memory.
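
(For illustration only: Redis's standard encoding thresholds are presumably the kind of tuning meant here. These redis.conf directives are real, and the values shown are the Redis defaults; actual values would need benchmarking against your payload sizes.)

    # Small hashes and sorted sets stay in the compact ziplist encoding while
    # under these thresholds; fields larger than *-value bytes (common for
    # Horizon job payloads) force the costlier hashtable/skiplist encodings.
    hash-max-ziplist-entries 128
    hash-max-ziplist-value 64
    zset-max-ziplist-entries 128
    zset-max-ziplist-value 64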

SumitChowjar

I am also struggling with high memory usage, and my system crashed.

Laravel: v8.35.1. Horizon: v5.7

I have around 40k records. I chunk them into groups of 300 records and process each chunk via a job.

Users::chunk(300, function ($users) {
    // Dispatch a job for this chunk of 300 users.
    MigrateUsers::dispatch($users->all());
});
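
(An aside, not from the original comment: dispatching $users->all() serializes 300 hydrated models into every queued payload, which inflates Redis memory by itself. A common alternative, sketched below on the assumption that MigrateUsers could be changed to accept an array of IDs and re-query them, is to dispatch only primary keys:)

    Users::chunkById(300, function ($users) {
        // Each payload now holds 300 integers instead of 300 serialized
        // models; MigrateUsers would re-fetch the users by ID when it runs.
        MigrateUsers::dispatch($users->pluck('id')->all());
    });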

My horizon config is:

'waits' => [
    'redis:default' => 60,
],

'trim' => [
    'recent'        => 60,
    'pending'       => 2880,
    'completed'     => 60,
    'recent_failed' => 1440,
    'failed'        => 2880,
    'monitored'     => 2880,
],

'memory_limit' => 128,

'defaults' => [
    'supervisor-1' => [
        'connection' => 'redis',
        'queue' => ['default'],
        'balance' => 'auto',
        'minProcesses' => 1,
        'maxProcesses' => 2,
        'memory' => 128,
        'tries' => 2,
        'nice' => 0,
    ],
],

'environments' => [
    'local' => [
        'supervisor-1' => [
            'maxProcesses' => 2,
            'balanceMaxShift' => 1,
            'balanceCooldown' => 3,
            'timeout' => 900, // Timeout after 15 minutes
        ],
    ],
],

And during this process, 7 to 8 GB of RAM is consumed and the system reboots partway through.

@mauri870
