Replies: 3 comments 1 reply
-
The most common cause of this would be some use of forking, where database connections are used in forked processes. Can you check whether there is any forking in the application (or at least, any forking while a Sequel::Database object has connections)?
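For example, this is the problematic pattern in a nutshell (just a sketch; the connection URL is a placeholder):

```ruby
require 'sequel'

DB = Sequel.connect('postgres://localhost/app_db')  # placeholder URL

# The parent process checks out a connection, so the pool now holds a
# live socket to the database.
DB.get(1)

fork do
  # The child inherits that same socket. If both processes use it, the
  # wire protocol state gets corrupted, which shows up as random query
  # errors and can even segfault the driver.
  DB.get(1)
end

Process.wait
```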
-
I was trying to see how they use message_bus themselves at Discourse, and they have a config where they tell ActiveRecord to close all connections on disconnect. That seems like it may be relevant: does Sequel have a function like that?
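Roughly, I'm imagining something along these lines (a hedged sketch; `DB` stands in for our Sequel::Database object):

```ruby
# Close every idle connection in this database's pool so nothing stale
# gets handed out again.
DB.disconnect

# Or across all Sequel::Database objects the process has created:
Sequel::DATABASES.each(&:disconnect)
```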
-
🤦🏼 Wow, I just found this: https://sequel.jeremyevans.net/rdoc/files/doc/fork_safety_rdoc.html, and adding that before_fork hook seems to have solved it. I can't believe I didn't know about this before, and I'm even more surprised we haven't seen issues until now, as we've been using Puma in forking mode for ages.
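For anyone who finds this later, the hook is essentially what the fork safety doc shows; a minimal sketch for a Puma config (config/puma.rb):

```ruby
before_fork do
  # Disconnect every Sequel::Database in the parent before Puma forks
  # its workers, so no child inherits a live database socket.
  Sequel::DATABASES.each(&:disconnect)
end
```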
-
Hey Jeremy, hope all is well.
I'm at my wits' end trying to diagnose a bizarre threading issue when using the above tools in combination. We're trying to add message_bus to our system, and all was well at first in development, where we use Puma in single mode. But as soon as we deployed to our real environments, we started seeing random 500s in the app that includes the message_bus middleware.
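For context, here's roughly what the setup looks like (a hedged sketch; the app class and connection details are placeholders, and the worker/thread counts are what we run in our real environments):

```ruby
# config.ru (sketch)
require 'sequel'
require 'message_bus'

DB = Sequel.connect(ENV.fetch('DATABASE_URL'))  # placeholder

use MessageBus::Rack::Middleware
run MyApp  # placeholder Rack application
```

```ruby
# config/puma.rb (sketch)
workers 2      # clustered mode
threads 5, 5   # five threads per worker
```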
To start off, I don't think this is a problem with Sequel at all, but since it affects the DB connection pool, I figured I'd start here. Here's what we're seeing:
When we have our Rack app running in Puma clustered mode (with 2 or more workers, 5 threads each), and we have the message_bus middleware mounted in the app, we see errors like this on random endpoints, including the message_bus ones themselves:
I've even intermittently seen hard seg faults:
The message_bus middleware does some threading behind the scenes that I don't really understand, and it makes use of Rack hijack. So I'm guessing there's some scenario where it hijacks a request onto one of its threads, closes connections, and those connections then end up being reused elsewhere.
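For anyone unfamiliar with the mechanism, here's a bare-bones illustration of Rack partial hijacking (the generic pattern only, not message_bus's actual implementation):

```ruby
# The server writes the status and headers, then calls the proc stored
# under the 'rack.hijack' response header with the raw socket. Keeping
# that socket open on a background thread is how long polling outlives
# the normal request/response cycle.
app = lambda do |env|
  if env['rack.hijack?']
    hijack = lambda do |io|
      Thread.new do
        io.write("data: hello\n\n")
        io.close
      end
    end
    [200, { 'Content-Type' => 'text/event-stream', 'rack.hijack' => hijack }, []]
  else
    [200, { 'Content-Type' => 'text/plain' }, ['hijacking not supported']]
  end
end
```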
To validate that, I enabled the connection_validation extension with validation on every checkout, and it does indeed "fix" the problem, since unusable connections are no longer handed out. But of course, I'd rather not keep that because of the performance hit.
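Concretely, the workaround looks like this (a sketch; -1 forces validation on every checkout instead of only for connections that have been idle a while):

```ruby
# Check connections before handing them out of the pool.
DB.extension(:connection_validation)

# Validate on every checkout (at the cost of an extra round trip per
# checkout), rather than only for long-idle connections.
DB.pool.connection_validation_timeout = -1
```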
Here's our config that I think is relevant:
Any leads or debugging tips you could provide would be greatly appreciated! Thanks