perf: Freeze GC before forking Gunicorn workers #21474

ankush · 2023-06-24T09:00:29Z

Both Gunicorn and RQ use forking to spawn workers. In an ideal world, the fork should be sharing most of the memory if there are no writes made to data because of Copy on Write, however, python's GC is not CoW friendly and writes to data even if user-code doesn't. Specifically, the generational GC which stores and mutates every python object: PyGC_Head

This can be avoided by doing gc.freeze() pre-fork to freeze all current objects and move them permanently to the last generation of generational gc.

Reference:

This is behind an environment variable for now, enable by setting FRAPPE_TUNE_GC to any truthy value.

Preliminary numbers

Metric	before	after	reduction
24 Worker total PSS	1791.518	1236.89	31%
1 RQ Worker + 1 Horse PSS	76.2	68.4	10%

PS: measuring benefit of this is kinda hard as memory usage depends on usage, without freeze most shared memory would become unshared over time.

TODO:

Put it behind some environmental variable for now (?)
~~Apply the same logic to background jobs.~~ Not possible, we right now create duplicate processes instead of forking from one master process. RQ is inefficient for this, though something like workerpool or https://github.com/ankush/rq_orchestrator/ can solve this.
Tweak gc params. We reload entire process after some requests, no need to aggressively GC considering that.
Check behaviour with reloaders (or just disable in dev mode)
test with autorestart
Run in prod and gather some data on memory usage over time.
- review what should be imported pre-fork, since sharing is now beneficial.

Should close #18927

Continues #21473, #21467

codecov · 2023-06-24T09:47:36Z

Codecov Report

Merging #21474 (29d28a4) into develop (278b6fc) will increase coverage by 0.00%.
The diff coverage is 68.42%.

❗ Current head 29d28a4 differs from pull request most recent head 150c36c. Consider uploading reports for the commit 150c36c to get more accurate results

Additional details and impacted files

@@           Coverage Diff            @@
##           develop   #21474   +/-   ##
========================================
  Coverage    63.85%   63.85%           
========================================
  Files          766      766           
  Lines        69436    69440    +4     
  Branches      6276     6276           
========================================
+ Hits         44337    44343    +6     
+ Misses       21514    21512    -2     
  Partials      3585     3585

Flag	Coverage Δ
server	`68.82% <68.42%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

This is disabled on 99%+ sites, still incurs 3-4MB of memory hit on import of ldap3.

This reverts commit 71b44ef. This gets frequently imported from one place or another. Since with gc.freeze we can mostly reuse the import from parent, let's just leave it here.

This has overall 1-2% CPU usage reduction for little to no costs. Benefits increase when doing bulk processing with lots of objects.

BG worker forks are not CoW friendly. Freezing right before we start worker should lessen overall memory usage. Though this isn't useful much because at max you're sharing with 2 processes - master and horse. WorkerPool can improve this benefit a lot by forking each worker from master process and horse from forked processes. TBD when WorkerPool is out of beta.

github-actions bot added the add-test-cases Add test case to validate fix or enhancement label Jun 24, 2023

ankush mentioned this pull request Jun 24, 2023

perf: lower web worker memory usage #21473

Merged

ankush force-pushed the gc_freeze branch from 8ff2f43 to b2e64cf Compare June 24, 2023 09:19

ankush removed the add-test-cases Add test case to validate fix or enhancement label Jun 24, 2023

ankush added 6 commits June 24, 2023 15:23

perf: Freeze GC before forking Gunicorn workers

423e781

fix: Only import LDAP if it's enabled

cc43af9

This is disabled on 99%+ sites, still incurs 3-4MB of memory hit on import of ldap3.

Revert "perf: defer many requests imports"

32bd5d3

This reverts commit 71b44ef. This gets frequently imported from one place or another. Since with gc.freeze we can mostly reuse the import from parent, let's just leave it here.

fix: move gc.freeze behind environ variable

4f0a2e6

perf: Bump alloc count to 7,000 for generation 0

fbe3174

This has overall 1-2% CPU usage reduction for little to no costs. Benefits increase when doing bulk processing with lots of objects.

ankush force-pushed the gc_freeze branch from b2e64cf to 29d28a4 Compare June 24, 2023 11:38

fix: collect before freezing

150c36c

ankush marked this pull request as ready for review June 24, 2023 12:10

ankush requested a review from surajshetty3416 as a code owner June 24, 2023 12:10

ankush requested review from a team and phot0n and removed request for a team June 24, 2023 12:10

ankush merged commit dc7620f into frappe:develop Jun 24, 2023

ankush deleted the gc_freeze branch June 24, 2023 12:12

This was referenced Jun 24, 2023

perf: pre-load common modules to reduce memory usage #21475

Merged

perf: significant memory usage reductions #21541

Merged

ankush restored the gc_freeze branch July 2, 2023 11:09

ankush deleted the gc_freeze branch July 9, 2023 15:30

github-actions bot locked as resolved and limited conversation to collaborators Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Freeze GC before forking Gunicorn workers #21474

perf: Freeze GC before forking Gunicorn workers #21474

ankush commented Jun 24, 2023 •

edited

Loading

codecov bot commented Jun 24, 2023 •

edited

Loading

perf: Freeze GC before forking Gunicorn workers #21474

perf: Freeze GC before forking Gunicorn workers #21474

Conversation

ankush commented Jun 24, 2023 • edited Loading

codecov bot commented Jun 24, 2023 • edited Loading

Codecov Report

ankush commented Jun 24, 2023 •

edited

Loading

codecov bot commented Jun 24, 2023 •

edited

Loading