Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis Connection does not reconnect when failed, causing Queue Workers to loop #41302

Closed
xrobau opened this issue Mar 3, 2022 · 14 comments
Closed

Comments

@xrobau
Copy link
Contributor

xrobau commented Mar 3, 2022

  • Laravel Version: v8.83.2 (but has been an issue for a long time)
  • PHP Version: 8.1
  • Database Driver & Version: Redis

Description:

Originally reported in #30081 and #29969

If the Redis connection fails, workers do not realise this, and error with:

Trying to access array offset on value of type null {"exception":"[object] (ErrorException(code: 0): Trying to access array offset on value of type null at /usr/local/app/vendor/laravel/framework/src/Illuminate/Queue/Jobs/Job.php:315)

and

 Undefined offset: 1 {"exception":"[object] (ErrorException(code: 0): Undefined offset: 1 at /usr/local/app/vendor/laravel/framework/src/Illuminate/Queue/RedisQueue.php:269)

The code flow basically ends up calling https://github.com/laravel/framework/blob/9.x/src/Illuminate/Redis/RedisManager.php#L82 which doesn't check if the connection is still valid. As the connection NAME is still valid, but the connection itself is closed, there is no reconnect attempt (as there is with other drivers).

This is triggered when there is a socket error:

socket error on read socket {"exception":"[object] (RedisException(code: 0): socket error on read socket at /usr/local/app/vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php:116)

which probably THERE should unset itself, or at least mark itself as failed.

Steps To Reproduce:

  • Create a a simple job (eg, sleep 10)
  • Dispatch a pile of those jobs using Redis queue
  • Shut down redis, and wait for the socket error to occur
  • Restart redis

At this point, all jobs will immediately fail with the null/undefined offset, and the only way to fix them is to kill and restart the worker.

@driesvints
Copy link
Member

Right, so, what I'm interested in is what the value for $this->payload() is at Illuminate/Queue/Jobs/Job.php:315 and what the value is at $nextJob at Illuminate/Queue/RedisQueue.php:269.

I suspect the failed jobs still stay in redis, waiting to be processed and that they get processed once you restart the worker?

@xrobau
Copy link
Contributor Author

xrobau commented Mar 4, 2022

https://github.com/laravel/framework/blob/9.x/src/Illuminate/Queue/Jobs/Job.php#L315

        return $this->payload()['job'];

The error is saying Trying to access array offset on value of type null so, $this->payload() is null at this point.

$nextJob is an array that only consists of ONE ITEM, not two (hence the Undefined offset: 1), and as far as I remember, it's something along the lines of ["Error: Unable to connect to Redis"] (not exact, this is from memory).

The problem to me appears to be that there's no way that I could find to catch the \Redis exception in Illuminate/Redis/Connections/Connection.php:116 and mark the connection as failed. It's not possible to reconnect there, as it needs to be destroyed and recreated in RedisManager::resolve()

https://github.com/laravel/framework/blob/9.x/src/Illuminate/Redis/RedisManager.php#L103

@tm1000
Copy link
Contributor

tm1000 commented Mar 5, 2022

I feel like you could catch this error in: https://github.com/laravel/framework/blob/9.x/src/Illuminate/Queue/Worker.php#L392

The reason being is that when the worker loses a database connection shouldQuit is set to true so the worker quits and then reconnects when it starts again. This same concept could apply to Redis

@xrobau
Copy link
Contributor Author

xrobau commented Mar 5, 2022

Good spotting @tm1000 (FYI, he is a co-worker of mine)

@driesvints I didn't want to submit a PR until someone decides on the 'correct' way to handle this, and as it's getting into the fiddly in-depth bits of queue workers, I thought it would be better to report it as a bug first and then get some guidance on how you guys would like it handled.

@driesvints
Copy link
Member

What version is your Redis?

$nextJob is an array that only consists of ONE ITEM, not two (hence the Undefined offset: 1), and as far as I remember, it's something along the lines of ["Error: Unable to connect to Redis"] (not exact, this is from memory).

This seems highly unlikely as the current code totally not expects that format.

I have no idea how to solve this unfortunately. Ideally, someone starts a PR which we can start off from.

@xrobau
Copy link
Contributor Author

xrobau commented Mar 8, 2022

@driesvints well, you're the guy who makes the call!

What would you prefer?

  • Illuminate/Redis/Connections/Connection.php - catch the error there, and add a new property to the redis connection of 'Is Broken' (or whatever), which is then checked by RedisManager::resolve() -- This is, obviously the easiest way
  • Try to reimplement a similar process as the MySQL workers do - haven't really thought about that or figured out the code flow.

I'm happy to do the PR, but I'd want to do it in a way you're happy with.

The OTHER problem is that to write a unit test for this requires at least three moving parts - a working redis, stopping that working redis, and then restarting it. Would you need a unit test for this, or would you be happy without it?

@xrobau
Copy link
Contributor Author

xrobau commented Mar 8, 2022

What version is your Redis?

Sorry if I didn't make this clear - that response comes from the \Redis object, when it is in its broken state. Not from redis-the-actual-process.

If it's actually important I can spend some time to build a dockerfile that demonstrates this, but it's super simple for to duplicate it manually, so I haven't worried about trying to figure out a way to prove this automatically (hence my 'Is this OK without a unit test?' question above 8))

@driesvints
Copy link
Member

well, you're the guy who makes the call!

Not really. I don't merge PR's 😅

Let's try the first option and see how it goes.

@driesvints
Copy link
Member

Feel free to send in that PR when you can, thanks.

@xrobau
Copy link
Contributor Author

xrobau commented Mar 15, 2022

Why is this ticket closed? The bug still exists, and I (or no-one else) has created a PR, so it still needs to be open.

@driesvints driesvints reopened this Mar 15, 2022
@driesvints
Copy link
Member

@xrobau just send in that PR if you can please.

@xrobau
Copy link
Contributor Author

xrobau commented Mar 15, 2022

Thank you, I'll try to get onto it today.

xrobau added a commit to xrobau/framework that referenced this issue Mar 16, 2022
…nections

When a \Redis connection has a socket/connection error, there's no
reliable way to re-establish the connection. This adds a simple
public 'shouldRestart' bool (based on the existing Queue\Worker
'stopWorkerIfLostConnection' code) to the Redis Connection, which
is checked in RedisManager and recreated if needed there.

Signed-Off-By: Rob Thomas <[email protected]>
@xrobau
Copy link
Contributor Author

xrobau commented Mar 16, 2022

#41502

My default PHP Linters decided that new xxxx should be new xxxx() - I am unsure if that is actually in any PSR, but I left it as it was. If this is an issue with your coding standards, feel free to remove them.

The code ended up being pretty trivial, as soon as I looked into it.

xrobau added a commit to xrobau/framework that referenced this issue Mar 17, 2022
This is an alternative option to PR laravel#41502, and adds a match
for 'socket' ("socket error when (reading|writing)") to trigger
a reconnection.

I am not sure if this is valid in other languages, too.

My personal preference would be to remove the entire if and
ALWAYS reconnect on any Exception

Signed-Off-By: Rob Thomas [email protected]
taylorotwell pushed a commit that referenced this issue Mar 18, 2022
* Bugfix #41302 - Alternative

This is an alternative option to PR #41502, and adds a match
for 'socket' ("socket error when (reading|writing)") to trigger
a reconnection.

I am not sure if this is valid in other languages, too.

My personal preference would be to remove the entire if and
ALWAYS reconnect on any Exception

Signed-Off-By: Rob Thomas [email protected]

* Fix typo (used vim instead of my usual IDE)
@yientau
Copy link

yientau commented Mar 5, 2024

I saw this being fixed. I am currently running on Laravel 7.30.6...which version of Laravel should I be upgrading to? Looks like I am doomed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants