Gunicorn/multiprocess with restarting workers, .db files are not cleaned up #275
This is the expected behaviour; these files can only be deleted safely when the whole application restarts. I'd suggest increasing max_requests.
We are using max_requests to deal with some memory leaks introduced by Gunicorn's gevent workers. Not ideal at all. But even if we didn't have that problem, workers can restart for other reasons. Is it also expected that memory usage would increase like this? If so, it seems that over the lifetime of the service, every deployment using this pattern would eventually run out of memory.
Memory usage for workers should be constant. |
We have the same issue in our uWSGI deployment. We also restart workers every N requests, and they can restart by themselves for other reasons as well. Is it at all possible that the prometheus client could support this use case? Or should we just look for another solution?
There's not much we can do here if you have high churn I'm afraid. |
We disabled worker restarts, but found that with Gunicorn and Flask we still get occasional unplanned worker restarts. So we were able to reduce the large memory leak to a very slow one that grows over the course of months.
Couldn't …
That'd require locking across processes, which we want to avoid, and cause issues if that pid was reused. |
I think locking across processes in … I guess reusing pids is another issue: if it mixes samples into the same pid file, when there should really be a truly unique file per process, that could cause loss of data, such as the min value for a Gauge.
Hey folks. We currently have a workaround for this which uses gunicorn's child_exit hook to merge the dead worker's counter/histogram metrics into archive files and then delete the pid files. This has kept our file count, and thus our scrape response times, stable. The attached image shows our scrape times (yellow line) when we deployed our workaround.

Our thinking was that using child_exit avoids the need to lock, as it runs only in the master. We still call the current mark_process_dead() function too, so gauge files are removed (although we are not heavy users of gauges).

Regarding pid recycling, I think it is only an issue for gauges, which are still deleted anyway in our experiment. Also, given that filenames with the pid in them should be deleted very soon after the worker's death, the chance of a recycled-pid clash seems very remote. I may have missed something, though.

It seems to be working for us, but I would appreciate any feedback on anything we may have missed. Our experiment involves a slightly altered version of the MultiProcessCollector.collect() function, which you can see here: https://github.com/canonical-ols/talisker/blob/master/talisker/metrics.py#L193 The changes are documented there, but for convenience are copied here:
We'd like to create a PR that upstreams some of this work, if you think the approach is sensible? Specifically, we'd like to slightly refactor the collect() method to support preserving label order, allow for not accumulating histograms, and not hardcode '*.db'. This would allow us to reuse it rather than reimplement it. Also, if this approach is deemed worthwhile, we'd be happy to work on a PR that expands it to include this aggregation as appropriate. Thanks for reading! EDIT: typo
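For readers who want the shape of this workaround without digging through talisker, here is a minimal sketch of a gunicorn config using child_exit. merge_worker_files() is a hypothetical placeholder for the aggregation described above, not part of prometheus_client, and the merge itself is elided:

```python
# gunicorn.conf.py -- minimal sketch only; merge_worker_files() is a
# hypothetical placeholder for the talisker-style aggregation linked above,
# not part of prometheus_client.
import glob
import os

from prometheus_client import multiprocess


def merge_worker_files(pid):
    """Placeholder: fold the dead worker's counter/histogram samples into
    shared archive files, then remove its per-pid files so the number of
    files the collector must read stays bounded."""
    prom_dir = os.environ['prometheus_multiproc_dir']
    for path in glob.glob(os.path.join(prom_dir, '*_{0}.db'.format(pid))):
        # ... read the samples and add them into archive files here
        # (see talisker/metrics.py above); only then remove the file:
        os.remove(path)


def child_exit(server, worker):
    # Runs only in the gunicorn master, which is why no cross-worker
    # locking is needed for this particular hook.
    multiprocess.mark_process_dead(worker.pid)  # removes the live/all gauge files
    merge_worker_files(worker.pid)
```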
That's not a good assumption. For example if another worker is started up and gets the old pid, that'll break things. We also need this to work on things that aren't Gunicorn. For this to work, you'd need a lock of some form.
Does it not already? That's usually something we have.
That violates the Prometheus text format, histograms must be cumulative. |
Isn't the current approach already broken if the new worker gets the same PID? How does it currently keep the values for the old and new processes if the PID is the same? Won't the new process start overwriting the old process's values, since it uses the same file? Just trying to understand this better to figure out potential workarounds.
No, it's designed to handle it. It reads in the previous values at startup. |
Thanks for the prompt response! :)
Right, I think I see the issue. Because the aggregation is not atomic, there is a small window where a new worker using an old pid can lose data it has written to the pre-existing pid file after that file was read by collect(). Strawman naive solution: add the process creation timestamp to the filename? I think that would ensure files are never reused, regardless of pid recycling. While clever, I suspect the current pid-recycling approach of reusing the file is likely to complicate this problem.
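As an illustration of that strawman (not something prometheus_client does), the per-process filename would simply gain a start-time component:

```python
import os
import time

# Illustration only -- prometheus_client names its files by pid alone.
# Including the process start time would keep filenames unique even when
# the OS recycles a pid.
filename = 'counter_{0}_{1}.db'.format(os.getpid(), int(time.time() * 1000))
```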
OK. But general-purpose multi-process serialization is hard, as you have stated. Using SIGCHLD in the master process is one way, and that is what child_exit boils down to. Would it perhaps be beneficial to offer an opt-in, documented solution for gunicorn, rather than no solution at all?
So, if you're talking about a lock in the master, we could add that. But for gunicorn/child_exit (and possibly Python in general) I think it's not strictly necessary, as child_exit is fired from a SIGCHLD signal handler, and Python's execution of signal handlers is effectively serialized by the interpreter, AIUI.
Sadly not - wrapping them in dict() is what loses the order: they are ordered all the way up to that final step. My implementation simply used OrderedDict here, but I suspect a cleaner approach would be to avoid the dict() altogether and just return a list of tuples? If that alone could be changed, our solution would be improved, as we could use collect() directly, just copying files into a different dir beforehand to get around the '*.db' hardcoding, and un-accumulating histograms manually afterwards (this is the approach we took at first, until we hit the ordering issue).
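To make the ordering point concrete, a small illustration (on the Python versions in use at the time, plain dict() did not guarantee insertion order):

```python
from collections import OrderedDict

label_pairs = [('method', 'GET'), ('status', '200')]  # order as stored in the mmap key

# dict() on Python < 3.7 makes no ordering promise, so this step can
# scramble the label order:
maybe_unordered = dict(label_pairs)

# Either of these keeps the order, which writing samples back to an mmap
# file requires:
ordered = OrderedDict(label_pairs)
as_tuples = list(label_pairs)
```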
Right, sorry, I was a bit vague there in hindsight. The collect() function would retain its current output, with accumulated histograms. But we'd refactor to allow it not to be accumulated: probably propose a new function that does not accumulate, which collect() would call and then accumulate on top of. We could then just use the new function rather than collect(). The changes are not big, and I think it would help discussion, so I might take some time today to knock up a PoC PR.
That could make the number of files worse. Not everyone can mark things dead.
We need to work on Windows, and we shouldn't interfere with the signal handling of the process we're instrumenting.
Ah yes. That can't really be changed, as that's part of our API. Why do you want to change this? |
It should only increase the number of files if a recycled pid happens to belong to another worker. And our aggregation would delete them anyway...
Re Windows, I suspect there is no single solution that would work for both anyway? Regarding interference, it would be up to the application to ensure the cleanup is called at the right time, just as it is currently?
The JSON key used for the mmap file format includes the labels in order. To write the collect()ed data back to the aggregated archive files, I need the same order, which sadly the dict() loses.
I don't know offhand.
Yes. But there's no concurrency restrictions around that, so for example two different processes could be performing it at once.
The file format can be changed. |
I'm also facing rising scrape times with a gunicorn application that has restarting workers. I have a PoC that I worked up before finding this discussion. I can share that if it would be helpful, but I don't want to step on any toes. I've spent some time thinking about this, and I'd like to see it solved, so I hope you don't mind if I add my thoughts to the discussion.

Regarding timestamped filenames making the problem worse: this would only be a problem when pid reuse actually occurs. I think the users likely to be affected would already have thousands of db files, and scrape times of ten minutes or longer, before that happened. I'm not saying it's impossible, but it must be vanishingly rare.

If the multiprocess db file can be locked by the writer (when it is opened), then mark_process_dead can test whether the file is still active by trying to obtain the lock. This could partially work around the issue, but it leaves open the possibility of the livesum/liveall gauge files inheriting values from a dead process.
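A minimal, POSIX-only sketch of that lock-probing idea, assuming advisory fcntl locks; prometheus_client does not do this today, and both function names are hypothetical:

```python
import fcntl
import os

# POSIX-only illustration of the lock-probing idea above; prometheus_client
# does not do this today, and both function names are hypothetical.

def hold_lock_for_lifetime(path):
    """Called by the writer when it opens its .db file; the advisory lock is
    released automatically when the process exits."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd  # keep this fd open for the lifetime of the worker


def writer_is_dead(path):
    """Cleanup code (e.g. mark_process_dead) could call this to check whether
    the owning process still holds the lock."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return False  # still locked, so the writer is alive
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```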
Looking around, multiprocessing.Lock() might provide a cross-platform solution. I've added it to fix an issue in our implementation; it seems to work, but it's not yet in production, so it's unproven.
Right, I wonder if documenting what the synchronization requirements are might be a good start? Then applications can use the right approach given their concurrency model?
Right, fair enough. But I don't think it needs to, we just need to avoid losing the order of labels, which seems fairly simple? |
Hmm, how does that work behind the scenes?
I think this is something that should Just Work in any environment. I'd rather not have a system that requires extensive documentation and reasoning to work, as users will get it wrong.
From the source, it looks like it's using the WaitForSingleObjectEx syscall on Windows, and a POSIX semaphore on Unix.
Fair enough, it's just a lot more work to get there :). If multiprocessing.Lock is suitable, then I think mark_process_dead could maybe be expanded to aggregate into a single file on worker death by default, using that lock. Rough stab at a general solution (sketched below): register an atexit function that calls mark_process_dead. That should work for any multiprocess scenario, with the caveat that processes that are forcibly killed will not run the atexit handler, and will thus leak mmapped files. But that's a common enough issue with killing processes that way, and the leak would be limited to rare error scenarios rather than every process recycle.
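A rough sketch of that atexit idea, assuming the lock is created in a Python master process before workers fork; install_exit_cleanup() and cleanup_lock are hypothetical names, not prometheus_client API:

```python
import atexit
import multiprocessing
import os

from prometheus_client import multiprocess

# This lock must be created in the parent before workers fork (e.g. in the
# gunicorn master) so that every child inherits the same semaphore -- see the
# caveats discussed below. Both names here are hypothetical.
cleanup_lock = multiprocessing.Lock()


def install_exit_cleanup():
    """Call once in each worker, after fork. Workers that are killed with
    SIGKILL never run atexit handlers, so their files can still leak."""
    pid = os.getpid()

    def _cleanup():
        with cleanup_lock:
            # Today this only removes the live/all gauge files; any merging of
            # counter/histogram files would have to be added here as well.
            multiprocess.mark_process_dead(pid)

    atexit.register(_cleanup)
```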
How is the handle/semaphore id shared between the processes?
That only works if you know that no new processes will be created with that pid while you're doing that. |
The lock has to be inherited, so it has to be created in the parent process, and thus only works for a fork()ing model.
Right, that could probably be addressed by locking the mmap file opening with the same lock, or as discussed earlier, adding a timestamp or similar to the mmap filename (if we have full worker death cleanup, then extra files are no longer a problem, anyway). |
More particularly it only works if the parent process loads the Prometheus client library, which certain configurations of Gunicorn will not. |
Right. You would probably need to document adding a gunicorn on_starting hook or something that did that. |
Gunicorn isn't the only thing in use though, what if it's say mod_python in Apache (which I admit to having never used, and haven't even read the docs) in which a C process is what spawns the children. |
Made a PR with the small changes to MultiProcessCollector.collect() I mentioned above. This doesn't solve the problem, but it makes it easier for others to roll their own solution, as we have done (we had to duplicate the whole of collect() in our code; this would allow us to just call the new merge() function directly).
Factors out a merge() method from the previous collect() method, which is parameterized, and thus can be used for arbitrary merging of samples. For motivation, see discussion in issue prometheus#275 around merging dead workers data into a single mmaped file. This basically allows us to parameterize the files to be merged, and also whether to accumulate histograms or not. Accumulation is on by default, as that is what the prometheus format expects. But it can now be disabled, which allows merged values to be correctly written back to an mmaped file. For the same reason, the order of labels is preserved via OrderedDict. Signed-off-by: Simon Davy <[email protected]>
Refactor MultiprocessCollector.collect to allow for arbitrary merging. (#302) Factors out a merge() method from the previous collect() method, which is parameterized, and thus can be used for arbitrary merging of samples. For motivation, see discussion in issue #275 around merging dead workers data into a single mmaped file. This basically allows us to parameterize the files to be merged, and also whether to accumulate histograms or not. Accumulation is on by default, as that is what the prometheus format expects. But it can now be disabled, which allows merged values to be correctly written back to an mmaped file. Signed-off-by: Simon Davy <[email protected]>
Did anybody manage to solve this? I'm using uwsgi and it's not obvious how I can call mark_process_dead |
We managed to solve this for Gunicorn; the bulk of our (rather complex) solution can be seen here: https://github.com/canonical-ols/talisker/blob/master/talisker/prometheus.py It's not pretty :(

The core of the solution is this: the master process runs a cleanup function for each worker that dies. This function aggregates the dead worker's files into some central files, and then deletes the old files. This write-and-delete is a critical section that needs protecting with a multiprocess lock of some kind, or else workers reading the files could double-read. The Gunicorn specifics involve a SIGCHLD handler on the master process, plus other messy details, but it's been working well for us after a few initial bumps.

However, that solution is very specific to Gunicorn's multiprocess model. uWSGI is a whole other kettle of fish. AFAICS, it doesn't support hooking into SIGCHLD handling on the master, which is in C and so not easily extensible. One option might be to define a custom worker as documented here: https://uwsgi-docs.readthedocs.io/en/latest/WorkerOverride.html You could maybe do the work in the destroy() function, which I assume runs in the worker process, but you'd really need multiprocess locking there. We used the stdlib's multiprocessing.Lock(), but as discussed in this thread, it requires a Python master process to initialise it. Under the hood, though, it's just a semaphore based on a shared memory file path; some form of that might be usable if you can figure out how to ensure each worker uses the same path.

Are you aware of https://github.com/Lispython/pyprometheus ? That is an alternative Prometheus client library with specific support for using uWSGI's shared memory, as opposed to mmap files in a shared directory. I have not used it, and it's not the official Prometheus client, but it might help here - you may be able to use its approach with prometheus_client. HTH

Edit: did some more research, and sharedarea in uWSGI is just a locking wrapper around a shared mmap, but it does the multiprocess locking for you, which is nice.
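For illustration, a stripped-down sketch of that locking discipline, using only public prometheus_client calls; the actual merge logic is elided, and on_worker_death()/render_metrics() are hypothetical names:

```python
import multiprocessing

from prometheus_client import CollectorRegistry, generate_latest, multiprocess

# The merge itself is elided (see talisker/prometheus.py above); the point is
# that aggregation/deletion and scrape-time reads must not interleave, or a
# scrape can double-count or miss samples. The lock must be created in the
# master before workers fork so that every process shares the same semaphore.
metrics_lock = multiprocessing.Lock()


def on_worker_death(pid):
    """Run in the master, e.g. from a SIGCHLD/child_exit hook."""
    with metrics_lock:
        # ... merge the worker's counter/histogram files into archive files
        # ... and delete the per-pid files, then:
        multiprocess.mark_process_dead(pid)


def render_metrics():
    """Run in whichever worker serves /metrics."""
    with metrics_lock:
        registry = CollectorRegistry()
        multiprocess.MultiProcessCollector(registry)
        return generate_latest(registry)
```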
It should be noted that you only need the code in client_python/prometheus_client/multiprocess.py, lines 120 to 127 (at 4aa1936).
Have you found any solution for this issue?
Hello @bloodearnest, I have attempted to use your solution to tackle the db file number problem. I used the solution on three metric types: Counter (…
Thank you for your time. Edit: Below is the image describing the 1st question. My gunicorn is configured to run with 3 workers: |
I'm using … which makes it use the worker id rather than the pid; the worker id is reused when a worker restarts. This looks like it works in dev. @brian-brazil, would there be problems with this setup? The only thing that looks suspicious to me is not having a …
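Since the actual configuration snippet is elided above, the following is only a guess at its general shape. values.MultiProcessValue(process_identifier=...) is the hook exposed by recent prometheus_client releases (older releases spelled it differently, so check your version), and get_worker_id() is a stand-in for whatever stable, reused worker id the server provides (for example uwsgi.worker_id()):

```python
import os

from prometheus_client import values


def get_worker_id():
    # Stand-in: should return a small, stable id that the server reuses when
    # it replaces a worker (e.g. uwsgi.worker_id()).
    return os.environ.get('WORKER_ID', str(os.getpid()))


# Must run before any metrics are created, and only when the multiprocess
# directory environment variable is set, so files are named by worker id
# instead of pid.
values.ValueClass = values.MultiProcessValue(process_identifier=get_worker_id)
```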
This issue has gotten a bit long covering many related issues. The initial report was not a bug, so let's open new issues for any new discussion around improvements here. |
Hi, sorry, I know this issue is now closed, but I wanted to reply to @xuansontrinh, as I'd missed the question addressed to me.
No - the reason is that we don't use summaries! I expect you'll need to extend the code to do so. If the summaries are a different format from counters/histograms, then you may need to write code to merge them.
I don't think this should be an issue. The file size should depend on the number of metric/label combinations your app has, not the length of time it has been around, as the merge should collate everything into one entry per metric/label combination. Also, we avoid this by using a new tmpdir for prometheus_multiproc_dir when the gunicorn service restarts, to meet the documented requirement of clearing it between service starts. And as each deploy is a new service, it will only grow for the lifetime of a deployment, which is typically only a few weeks at most.
The gunicorn master does do more work. But it spends most of its time idle anyway, as it handles no traffic. I guess if you have many workers there might be an issue, especially if you SIGHUP gunicorn, as all the current workers will die at once. FTR, we use 64 workers and regularly SIGHUP, without any issue that we've detected.
Hi everyone! Sorry for updating this after two years, but I still don't understand what I should do about the prometheus multiproc dir being flooded with .db files. It slows down /metrics collection. Right now we just clean up the dir on each deploy, but if there are no deploys for a few days it leads to timeouts on the /metrics request. What is the best practice here? A cron cleaning job?
Hi all. I am also facing the same issue. Memory reaches the limit and the application restarts each time. I tried all the worker types but still had no success. I tried running a cron job to delete the files, but after each deletion I get a pod error from Prometheus. Also, if a file was being used by some worker, the metrics endpoint would raise an exception and again an error would be registered via Prometheus. The suggestion above to restart the server is not suitable for me, as I need it to run without any restarts and would not want the application to depend on metrics. Any help is appreciated. Thanks!
We are using Gunicorn with multiple workers, with the Gunicorn max_requests option. We found that memory usage in our containers increases with every worker restart. By reducing the max_requests value so that workers restart very frequently, the problem became apparent. We are using the method described in the readme (sketched below).

While running a performance test, I see the following in the container: … Watching that directory, new files are created with every new worker and old ones are not cleaned up. Deleting the files in that directory reduces the memory usage on that container, which then starts building again.

What is the proper way of dealing with these database files? Is this mark_process_dead's responsibility?

And a general question about Prometheus: do we need to keep metrics around after they are collected? Could we wipe our metrics after the collector hits our metrics endpoint? If so, can this be done via the prometheus_client?
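For context, the multiprocess setup the readme described at the time looks roughly like this (a sketch, not the exact readme text); prometheus_multiproc_dir must point at a directory that is emptied between application restarts:

```python
# app.py -- expose metrics aggregated across all worker processes
from prometheus_client import (CONTENT_TYPE_LATEST, CollectorRegistry,
                               generate_latest, multiprocess)


def metrics_app(environ, start_response):
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    start_response('200 OK', [('Content-Type', CONTENT_TYPE_LATEST),
                              ('Content-Length', str(len(data)))])
    return [data]


# gunicorn.conf.py -- clean up the live/all gauge files when a worker dies
from prometheus_client import multiprocess


def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
```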