Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

active_record_connection_pool_waiting always returns 0 #242

Open
bogn83 opened this issue Aug 2, 2022 · 2 comments
Open

active_record_connection_pool_waiting always returns 0 #242

bogn83 opened this issue Aug 2, 2022 · 2 comments

Comments

@bogn83
Copy link

bogn83 commented Aug 2, 2022

The query for it always returns 0 even though we can see lots of

ActiveRecord::ConnectionTimeoutError
could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use

even fully unfiltered by labels and in a 30d time window:

ruby_active_record_connection_pool_waiting > 0

while

ruby_active_record_connection_pool_idle > 0

works.

@bogn83 bogn83 changed the title active_record_connection_pool_waiting wrongly returns 0 active_record_connection_pool_waiting always returns 0 Aug 2, 2022
@slipperywizard
Copy link

+1

@modulitos
Copy link

I believe this metric is working correctly.

I had the same issue as the OP, and suspect it wasn't working as well. But digging deeper, I realized that this metric isn't likely to get scraped by Prometheus unless the system is under sustained load. Since Prometheus only scrapes metrics once in a while (default is 10 seconds), it's possible to get a short spike in load, raise an ActiveRecord::ConnectionTimeoutError, and have ruby_active_record_connection_pool_waiting return to 0 when scraped. So it's a race condition. For this metric to reliably report a value above 0, the system must be under load long enough for the threads waiting in the queue to get scraped. After running a load test, I can confirm that it incremented correctly.

We can see how this works in the code. This metric is set by calling the .stat method on the connection pool:

and here's the .stat method:
https://github.com/rails/rails/blob/eed417637b594b33b3b58b08dd41daa10c57e083/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L703-L719
(Referenced AR version is 6.0.5.1)

we can see that the waiting metric is set to num_waiting_in_queue. Tracing through that code, we can see that it's a gauge for the number of threads waiting on a connection, which gets incremented/decremented in this loop:
https://github.com/rails/rails/blob/eed417637b594b33b3b58b08dd41daa10c57e083/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L203-L226

The above code snippet should convince us that the number of waiting threads is incremented and reported to the waiting metric, and that it's incremented before the ConnectionTimeoutError error is raised. This all happens within fractions of a second, so unless there's a sustained load, it might not get scraped by Prometheus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants