DFS: proctitle naming scheme for Puma and others #235
Conversation
Force-pushed from 08d1222 to e64a3c4
Force-pushed from e64a3c4 to f69454b
Just want to leave a quick comment that I saw your PR. I've been pretty sick lately and haven't been able to look at it in detail, but based on the description alone, I'm super interested in this.
Oh, get better! Take your time. Looking forward to your opinion on this. Cheers and stay safe.
@@ -205,7 +244,7 @@ def aggregate_values(values)
```ruby
if @values_aggregation_mode == SUM
  values.inject { |sum, element| sum + element }
elsif @values_aggregation_mode == MAX
  values.max
```
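A minimal, self-contained sketch of the aggregation logic the diff above touches. The `SUM`/`MAX` constants here are stand-ins for the library's aggregation-mode configuration, not its actual definitions:

```ruby
# Stand-in aggregation modes (the real library defines these itself).
SUM = :sum
MAX = :max

# Combine values read from multiple per-process metric files
# according to the configured aggregation mode.
def aggregate_values(values, mode)
  if mode == SUM
    values.inject { |sum, element| sum + element }
  elsif mode == MAX
    values.max
  end
end

aggregate_values([1.0, 2.5, 3.5], SUM) # => 7.0
aggregate_values([1.0, 2.5, 3.5], MAX) # => 3.5
```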
This was probably an accident; I'll resolve it once we agree this is a valid solution.
Sorry for the pretty late response! 😅
I really like this idea!
My thinking on the "too many files" problem has always been to not have "one file per metric", as that's really problematic when you have many metrics (and you'll probably have more metrics than refreshed processes), and instead have "one per process".
This probably won't solve the issues with too many metrics, but it should certainly help, and both approaches can be combined.
The main question I have is: why do we need the regex? Could we always just take `process_title`? (which already returns `pid` if empty)
My other question is whether you know if Unicorn also names its processes in this way, or if this would only work with Puma.
```ruby
def process_title
  cmdline = "/proc/#{process_id}/cmdline"
  if File.exist?(cmdline)
    @process_title ||= File.read(cmdline).unpack("Z*").first
```
Should this memoization be higher up, so it caches the whole thing?
Oh right, absolutely, to get rid of the file existence check.
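A hedged sketch of what hoisting the memoization could look like, per the discussion above. `Process.pid` stands in for the store's `process_id` helper (an assumption on my part), and the memoized `begin` block runs at most once, so the `File.exist?` check is also skipped on subsequent calls:

```ruby
# Hypothetical rework: the memoization covers the whole lookup, not just
# the file read. Reading /proc only works on Linux, hence the nil result
# when the file is absent, mirroring the original code.
def process_title
  @process_title ||= begin
    cmdline = "/proc/#{Process.pid}/cmdline"
    File.read(cmdline).unpack("Z*").first if File.exist?(cmdline)
  end
end
```

One caveat of plain `||=`: if the first lookup yields `nil` (no `/proc`), the block re-runs on every call; a `defined?(@process_title)` guard would cache the `nil` too.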
```ruby
end

def filename_suffix
  if Thread.current[:prometheus_suffix]
```
How often do we call this method? Do we need this Thread.current cache?
If you can confirm it is safe to only use memoization, then yeah, that would be cleaner. I am worried about this method being called on every measurement, causing the regexp to recompile all the time.
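A sketch of the memoization-only variant under discussion. The title pattern below is an assumed example (not the patch's actual regexp), and compiling it once as a class constant addresses the per-measurement recompile concern directly:

```ruby
class SuffixedStore
  # Assumed example pattern for titles like "puma: cluster worker 3: 1234".
  # Compiled once at class-load time, so repeated calls never recompile it.
  WORKER_TITLE = /cluster worker (\d+):/

  def initialize(title, pid)
    @title = title
    @pid = pid
  end

  # Memoized in an instance variable instead of cached in Thread.current;
  # the regexp match runs at most once per store instance.
  def filename_suffix
    @filename_suffix ||= (@title[WORKER_TITLE, 1] || @pid.to_s)
  end
end

SuffixedStore.new("puma: cluster worker 3: 1234", 99).filename_suffix # => "3"
SuffixedStore.new("irb", 99).filename_suffix                          # => "99"
```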
Oh, and one more... Have you run this code in production for a while? :D
Yeah, we have way too many, and we have limited them over the past years to just about two hundred. But due to the dynamic nature of our Prometheus implementation (metrics appear as more and more users hit various actions, controllers, or model classes), reloading of processes is not the culprit.
Nah, labs only :) Luckily this is opt-in; it will have no effect until you set the regexp attribute.
Well, I don't think this would be safe to have as the default. What if a webserver (or anything else that utilizes the library) simply names multiple processes with the same name? I think I have seen this in the past, something like "XYZ worker" without a unique numbering scheme. Then we would have a conflict and a nasty bug. For this reason, I think this needs to be opt-in, and users must think it through when implementing it.
I haven't found
Signed-off-by: Lukas Zapletal <[email protected]>
Force-pushed from f69454b to 4ea5b2a
I have fixed both memoizations.
Hi @lzap,

First off, sorry we left you hanging for so long on this one. @dmagliola and I had a couple of chats about this PR in the meantime, and we think there's real value in giving a consistent identity to multi-process workers that periodically get recycled/respawned. The reduction in how many metric files we generate could be pretty huge for some users.

What we want to make sure of is getting the API right. In particular, we're keen to generalise it beyond worker systems that set process titles. While Puma exposes process identity that way, other multi-process worker systems may not, and we don't want to tie our API here to what Puma has chosen to do.

We were thinking that it may be better to pass a lambda in rather than a regex. That way, different implementations for retrieving the current process's identity could be supplied. The Puma one could look at the process title, but others could source that information elsewhere. We would aim to ship lambdas for any popular framework as constants that users could pass in when configuring the store.

Before shipping this feature, we'd like to try and make it work with at least Puma and Unicorn (assuming the latter has some way to get a consistent identity for a worker process across respawns), as they're a couple of the most popular frameworks using multi-process workers. Outside of web servers, there's also multi-process support in Sidekiq Enterprise, which would be nice to target, but that would likely involve a conversation with Mike Perham, as it's only available in the most expensive paid edition of Sidekiq (we'd need to figure something out for testing).
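To make the lambda idea concrete, here's a hedged sketch. The constant name, the title pattern, and the commented-out `worker_id:` keyword are all illustrative assumptions, not the library's actual API:

```ruby
# Illustrative only: a per-framework callable returning a stable worker
# identity. Under Puma it would extract the worker number from the process
# title; anywhere else it falls back to the pid (matching the old behavior).
PUMA_WORKER_ID = lambda do
  cmdline = "/proc/#{Process.pid}/cmdline"
  title = File.exist?(cmdline) ? File.read(cmdline).unpack("Z*").first.to_s : ""
  title[/cluster worker (\d+):/, 1] || Process.pid.to_s
end

# Hypothetical configuration; the store's real constructor may differ:
# DirectFileStore.new(dir: "/tmp/prometheus", worker_id: PUMA_WORKER_ID)
PUMA_WORKER_ID.call # => the worker number, or the pid when not under Puma
```

The appeal of this shape is that a Unicorn or Sidekiq Enterprise variant only needs to supply a different lambda; the store itself never has to know how any particular framework names its workers.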
Hey, I want to say that we haven't deployed this in production, and I am not very confident that it actually solves our problem, which is that we have too many metrics (multiple metrics per Rails controller/action, and we have many). One big drawback is that Puma does not recycle processes; there is the https://github.com/zombocom/puma_worker_killer 3rd-party gem, but without it I think this would not be useful. Therefore I am not proposing this for the mainline branch. Also, I don't have time to work on this extensively. Feel free to pick it up if you find it useful. Thanks!
I solved a long-term problem that has been bugging us for years: when the app server (Puma in our case) performs any kind of rolling restart or recycles a dead worker, a new data file is created, causing the temporary directory for DirectFileStore to grow and the `/metrics` endpoint to become slower and slower.

I figured out how to solve this for Puma, and for other web servers which set process titles in a consistent way. Initially, I had a similar implementation based on `Puma.stats` (https://github.com/puma/puma/blob/master/docs/stats.md); however, this did not work because that call returns relevant data only when called from the master process. Therefore, the implementation was done via process titles, which is also more generic and pretty fast.

The patch includes a README explanation; start with that to understand what is going on.

Edit: Fixed tests.
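To make the failure mode concrete, here's a small sketch (with hypothetical filenames) of why pid-suffixed files accumulate across respawns while a title-based suffix reuses the same file:

```ruby
require "tmpdir"

# Hypothetical metric-file naming; the real store's names differ.
def metric_file(dir, suffix)
  File.join(dir, "counter_#{suffix}.bin")
end

file_count = Dir.mktmpdir do |dir|
  # pid-based suffix: three respawns of the same worker leave three files
  [101, 102, 103].each { |pid| File.write(metric_file(dir, pid), "") }
  # title-based suffix: every respawn of "worker 0" maps to the same file
  3.times { File.write(metric_file(dir, "worker-0"), "") }
  Dir.children(dir).size
end

file_count # => 4 (three pid files, one title file)
```

With pid suffixes the directory grows by one file per respawn forever; with title suffixes it is bounded by the number of distinct worker titles.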