fix(celery): Reset DB connection pools for forked worker processes #13350
Conversation
LGTM
Codecov Report
@@             Coverage Diff              @@
##           master   #13350      +/-   ##
==========================================
- Coverage   77.12%   77.10%   -0.02%
==========================================
  Files         881      881
  Lines       45502    45507       +5
  Branches     5447     5449       +2
==========================================
- Hits        35093    35090       -3
- Misses      10286    10293       +7
- Partials      123      124       +1
Nice! But does this happen even when using NullPool on the workers? Or do you think there are still uses of QueuePool created by Flask-SQLAlchemy at the worker level?
@dpgaspar the workers, when accessing the metadata DB, should use: https://github.com/apache/superset/blob/master/superset/utils/celery.py#L33 Let's talk about this?
Talked with @dpgaspar and we agree this is a good first step, and more DB connection pool investigation is required. Merging.
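For reference, a minimal sketch of the NullPool pattern discussed above, assuming a placeholder connection URI; the utility linked in the comment is the mechanism Superset actually provides for worker DB access:

```python
# Sketch only, not Superset's actual utility: illustrates NullPool-backed
# sessions, where each use opens and then fully closes a real DB connection
# instead of borrowing from a long-lived pool.
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool

# Placeholder URI; in Superset this would come from the app config.
engine = create_engine(
    "postgresql://user:pass@localhost/superset",
    poolclass=NullPool,  # every checkout opens a fresh connection
)
Session = sessionmaker(bind=engine)

def touch_metadata_db():
    session = Session()
    try:
        # Each execute checks out a brand-new connection via NullPool.
        return session.execute(text("SELECT 1")).scalar()
    finally:
        # close() releases the connection, which NullPool closes outright
        # rather than returning it to a pool.
        session.close()
```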
…pache#13350)
* Reset sqlalchemy connection pool on celery process fork
* Fix race condition with async chart loading state
* pylint: ignore
* prettier
SUMMARY
Adds a listener for the worker_process_init Celery signal that disposes of and resets the SQLAlchemy connection pool passed to each forked worker process. Resolves the intermittent sqlalchemy.exc.NoSuchColumnError reported in #10530 and #12766, and an error reported in #9860.
This fix primarily targets the default prefork Celery execution pool, but was also tested with additional pool invocations. The configuration was tested with async queries enabled to place load on the Celery workers, in both standalone and Docker-based workflows. A minimal sketch of the approach is shown below.
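The sketch below illustrates the worker_process_init pattern, assuming a module-level SQLAlchemy engine with a placeholder connection URI; Superset's actual wiring goes through its Flask app and Flask-SQLAlchemy, so this is not the exact code merged here:

```python
# Sketch only: reset the SQLAlchemy connection pool when Celery forks a worker.
from celery.signals import worker_process_init
from sqlalchemy import create_engine

# Assumption for illustration: an engine created in the parent process before
# Celery forks its prefork worker children.
engine = create_engine("postgresql://user:pass@localhost/superset")

@worker_process_init.connect
def reset_db_connection_pool(**kwargs):
    # Dispose of pooled connections inherited across the fork so the child
    # process opens its own sockets instead of sharing the parent's.
    engine.dispose()
```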
This PR also includes a fix for a client-side race condition in loading charts asynchronously (fixes #12913).
References:
https://docs.sqlalchemy.org/en/13/core/connections.html#engine-disposal
https://www.yangster.ca/post/not-the-same-pre-fork-worker-model/
TEST PLAN
Asynchronous tasks should run without sqlalchemy.exc.NoSuchColumnError when Celery is run in prefork mode. See #10530 and #12766 for reproducibility.
ADDITIONAL INFORMATION