Support running superset via pex #1713
@@ -46,6 +46,9 @@ def runserver(debug, address, port, timeout, workers):
            threaded=True,
            debug=True)
    else:
        print("DEPRECATED: Please run gunicorn via the 'runserver_gunicorn' command")
Do we really need the two commands? What changes if we switch to the new way of loading gunicorn?
Nothing should change from the user perspective when switching to the new command. I'm keeping the old one in there for now for backwards compatibility. At least inside Airbnb, it's hard for us to simultaneously switch both the app code and the command used to run the app (set via chef).
Once this change is in, I can come back in a few days and delete the old gunicorn command.
Let me know if this doesn't sound ok.
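The backwards-compatibility plan above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the function bodies are hypothetical stand-ins, and it uses `warnings.warn` rather than the `print` call in the diff. The old command emits the deprecation notice and then delegates to the new one, so callers see the same behaviour either way.

```python
import warnings


def run_gunicorn(address="127.0.0.1", port=8088):
    """Hypothetical stand-in for the new 'runserver_gunicorn' entry point."""
    return "gunicorn bound to {}:{}".format(address, port)


def runserver(address="127.0.0.1", port=8088):
    """Old command: still works, but tells the caller to switch."""
    warnings.warn(
        "DEPRECATED: Please run gunicorn via the 'runserver_gunicorn' command",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate so both commands end up on the same code path.
    return run_gunicorn(address, port)
```

Once deployments have switched to the new command, the old wrapper can be deleted without touching `run_gunicorn`.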
subparsers = parser.add_subparsers()
gunicorn_parser = subparsers.add_parser('runserver_gunicorn')

gunicorn_parser.add_argument('-a', '--address', type=str, default='0.0.0.0')
What about making 127.0.0.1 the default?
Fixed.
codeclimate complains: print("DEPRECATED: Please run gunicorn via the 'runserver_gunicorn' command")
LGTM, @mistercrunch - how does it look to you? You have way more experience with gunicorn and python in general.
import gunicorn.app.base


class GunicornSupersetApplication(gunicorn.app.base.BaseApplication):
The cli logic should live in cli.py, as we're trying to keep this file to a bare minimum since it can't be imported easily (no .py extension). You should also use flask_script, the same way that the rest of the CLI is written, as opposed to good old argparse.
I tried that but got lots of DB errors due to forking problems. :-( The issue is that, with the way things are set up now, you can't import superset or any of its subpackages before gunicorn forks (i.e., in the load function). The underlying cause is that sqlalchemy (initialized in superset/__init__.py) is not fork-safe.
We ran into similar issues when pex-ifying knowledge repo. The way around it is to refactor the code so that cli.py can be loaded without initializing the flask app and its associated DB connections in the gunicorn case. Looking at the code, this is doable, but would require a significant amount of refactoring.
Given these limitations, can we keep the gunicorn wrapper in the superset script for now and leave the refactoring as a future TODO? Apologies that it isn't as clean as it would be ideally.
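To illustrate the constraint described above, here is a minimal stdlib-only sketch of the deferred-import idea. It is not the wrapper from this PR: `LazyApplication` and its `target` string are hypothetical, and the class only mirrors the shape of gunicorn's `load()` hook without depending on gunicorn. Nothing is imported until `load()` runs, so anything the target module creates at import time (such as DB connections) would be created in the process that calls `load()`.

```python
import importlib


class LazyApplication:
    """Hypothetical wrapper: holds only the *name* of the app until load()."""

    def __init__(self, target):
        # Expected form: "package.module:attribute". Nothing is imported here.
        self.target = target
        self.app = None

    def load(self):
        # Meant to be called in the worker, i.e. after the fork, so that
        # import-time side effects happen in that process only.
        module_name, _, attribute = self.target.partition(":")
        module = importlib.import_module(module_name)
        self.app = getattr(module, attribute)
        return self.app
```

For example, `LazyApplication("json:dumps").load()` resolves a stdlib callable as a stand-in; in the real wrapper the target would be the Superset flask app.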
Oh sorry I now realize I skipped reading the PR's message which explained all this. We'll need to update our chef recipe when merging this into production.
Sure! The old version of runserver will still work until we deprecate it, so nothing should break immediately.
        return app


def run_gunicorn():
move to cli.py, use flask_script
LGTM
@bkyryliuk @mistercrunch Refactored the
    '-a', '--address', type=str, default='127.0.0.1',
    help='Specify the address to which to bind the web server')
gunicorn_parser.add_argument(
    '-p', '--port', type=int, default=8088,
This pull request has changed the default port number from config.get("SUPERSET_WEBSERVER_PORT") to 8088. Effectively the SUPERSET_WEBSERVER_ADDRESS, SUPERSET_WEBSERVER_PORT, SUPERSET_WORKERS, and SUPERSET_WEBSERVER_TIMEOUT config variables have been removed. (They have no effect now.)
Is this intentional? These settings are still mentioned in the docs.
+cc @szmate1618
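The effect described above can be reproduced with a small argparse sketch. It reuses the subcommand name, flags, and defaults from the diff, but `build_parser`, the `prog` name, the `dest="command"` argument, and the port help text are assumptions added for illustration. The defaults are literals fixed when the parser is built, so no config module is consulted.

```python
import argparse


def build_parser():
    """Build a parser shaped like the one in the diff (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="superset")
    subparsers = parser.add_subparsers(dest="command")
    gunicorn_parser = subparsers.add_parser("runserver_gunicorn")
    # Hard-coded defaults: SUPERSET_WEBSERVER_* values are never read here.
    gunicorn_parser.add_argument(
        "-a", "--address", type=str, default="127.0.0.1",
        help="Specify the address to which to bind the web server")
    gunicorn_parser.add_argument(
        "-p", "--port", type=int, default=8088,
        help="Specify the port on which to run the web server")
    return parser


# With no flags given, the literals above are what the server would use.
args = build_parser().parse_args(["runserver_gunicorn"])
```

Only an explicit flag such as `-p 9000` changes the value; a setting in the Superset config file would not.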
I hope it's not :) Care to open a PR?
> I hope it's not :) Care to open a PR?
The solution is not easy, it seems. The idea here appears to be to avoid importing superset before the Gunicorn application is loaded. So at the point where the flag defaults are set we cannot access the Superset config file yet.
I think it may be fine to retire these settings. Gunicorn can be configured via a config file. Superset could have a separate Gunicorn config file instead that would allow setting these web-server-specific parameters in one logical place. Plus it would give the user a chance to set many other Gunicorn settings that cannot be controlled now, such as the SSL certificate.
What do you think?
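As a rough sketch of that proposal, a separate Gunicorn config file might look like the following. Gunicorn config files are plain Python, and `bind`, `workers`, `timeout`, `certfile`, and `keyfile` are standard Gunicorn settings; the file name, values, and paths here are placeholders rather than anything decided in this thread.

```python
# gunicorn.conf.py (hypothetical file name; all values are placeholders)

bind = "127.0.0.1:8088"  # address and port the web server binds to
workers = 2              # number of worker processes
timeout = 60             # seconds before an unresponsive worker is restarted

# TLS settings, which the command-line flags in this PR cannot express.
certfile = "/path/to/cert.pem"
keyfile = "/path/to/key.pem"
```

Such a file would be passed to Gunicorn with its `-c` option, which keeps the web-server-specific parameters in one place.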
i don't know, i kinda liked the one config file to rule them all approach. Let's ping the airbnb guys @yolken @mistercrunch
BTW could we please move the discussion to #1939 ?
We've found other ways of pex-ing for gunicorn that address this problem (gunicorn forking the process while sqlalchemy is not fork-safe). We'll revert most of this PR as soon as the alternative solution is up.
to: @mistercrunch @bkyryliuk
Context
Pex is a packaging system and format for Python apps that combines the code for an app and its dependencies in a single file. This change includes some minor updates to superset to support running it via a pex. Everything should be backwards-compatible.
Content
cli.manager and having some wrapper code in superset/bin/superset so that the app forks cleanly in this case