Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Freeze GC before forking Gunicorn workers #21474

Merged
merged 7 commits into from
Jun 24, 2023
Merged

Conversation

ankush
Copy link
Member

@ankush ankush commented Jun 24, 2023

Both Gunicorn and RQ use forking to spawn workers. In an ideal world, the fork should be sharing most of the memory if there are no writes made to data because of Copy on Write, however, python's GC is not CoW friendly and writes to data even if user-code doesn't. Specifically, the generational GC which stores and mutates every python object: PyGC_Head

This can be avoided by doing gc.freeze() pre-fork to freeze all current objects and move them permanently to the last generation of generational gc.

Reference:

This is behind an environment variable for now, enable by setting FRAPPE_TUNE_GC to any truthy value.

Preliminary numbers

Metric before after reduction
24 Worker total PSS 1791.518 1236.89 31%
1 RQ Worker + 1 Horse PSS 76.2 68.4 10%

PS: measuring benefit of this is kinda hard as memory usage depends on usage, without freeze most shared memory would become unshared over time.

TODO:

  • Put it behind some environmental variable for now (?)
  • Apply the same logic to background jobs. Not possible, we right now create duplicate processes instead of forking from one master process. RQ is inefficient for this, though something like workerpool or https://github.com/ankush/rq_orchestrator/ can solve this.
  • Tweak gc params. We reload entire process after some requests, no need to aggressively GC considering that.
  • Check behaviour with reloaders (or just disable in dev mode)
  • test with autorestart
  • Run in prod and gather some data on memory usage over time.
    • review what should be imported pre-fork, since sharing is now beneficial.

Should close #18927

Continues #21473, #21467

@github-actions github-actions bot added the add-test-cases Add test case to validate fix or enhancement label Jun 24, 2023
@ankush ankush removed the add-test-cases Add test case to validate fix or enhancement label Jun 24, 2023
@codecov
Copy link

codecov bot commented Jun 24, 2023

Codecov Report

Merging #21474 (29d28a4) into develop (278b6fc) will increase coverage by 0.00%.
The diff coverage is 68.42%.

❗ Current head 29d28a4 differs from pull request most recent head 150c36c. Consider uploading reports for the commit 150c36c to get more accurate results

Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #21474   +/-   ##
========================================
  Coverage    63.85%   63.85%           
========================================
  Files          766      766           
  Lines        69436    69440    +4     
  Branches      6276     6276           
========================================
+ Hits         44337    44343    +6     
+ Misses       21514    21512    -2     
  Partials      3585     3585           
Flag Coverage Δ
server 68.82% <68.42%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

ankush added 6 commits June 24, 2023 15:23
This is disabled on 99%+ sites, still incurs 3-4MB of memory hit on import of
ldap3.
This reverts commit 71b44ef.

This gets frequently imported from one place or another. Since with
gc.freeze we can mostly reuse the import from parent, let's just leave
it here.
This has overall 1-2% CPU usage reduction for little to no costs.
Benefits increase when doing bulk processing with lots of objects.
BG worker forks are not CoW friendly. Freezing right before we start
worker should lessen overall memory usage. Though this isn't useful much
because at max you're sharing with 2 processes - master and horse.

WorkerPool can improve this benefit a lot by forking each worker from
master process and horse from forked processes. TBD when WorkerPool is
out of beta.
@ankush ankush marked this pull request as ready for review June 24, 2023 12:10
@ankush ankush requested a review from surajshetty3416 as a code owner June 24, 2023 12:10
@ankush ankush requested review from a team and phot0n and removed request for a team June 24, 2023 12:10
@ankush ankush merged commit dc7620f into frappe:develop Jun 24, 2023
@ankush ankush deleted the gc_freeze branch June 24, 2023 12:12
@ankush ankush restored the gc_freeze branch July 2, 2023 11:09
@ankush ankush deleted the gc_freeze branch July 9, 2023 15:30
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce memory usage of workers
1 participant