Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running superset via pex #1713

Merged
merged 10 commits into from
Dec 1, 2016
Merged

Support running superset via pex #1713

merged 10 commits into from
Dec 1, 2016

Conversation

yolken
Copy link
Contributor

@yolken yolken commented Nov 30, 2016

to: @mistercrunch @bkyryliuk

Context

Pex is a packaging system and format for Python apps that combines the code for an app and its dependencies in a single file. This change includes some minor updates to superset to support running it via a pex. Everything should be backwards-compatible.

Content

  • Support running gunicorn without going through gunicorn command-line; this requires bypassing the cli.manager and having some wrapper code in superset/bin/superset so that the app forks cleanly in this case
  • Support loading a local config that isn't in the python path. In the pex case, the pythonpath is isolated and immutable, so this is the easiest way to bring in an external config.

@@ -46,6 +46,9 @@ def runserver(debug, address, port, timeout, workers):
threaded=True,
debug=True)
else:
print("DEPRECATED: Please run gunicorn via the 'runserver_gunicorn' command")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the two commands? What does change if we switch to the new way of loading gunicorn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing should change from the user perspective when switching to the new command. I'm keeping the old one in there for now for backwards compatibility. At least inside Airbnb, it's hard for us to simultaneously switch both the app code and the command used to run the app (set via chef).

Once this change is in, I can come back in a few days and delete the old gunicorn command.

Let me know if this doesn't sound ok.

subparsers = parser.add_subparsers()
gunicorn_parser = subparsers.add_parser('runserver_gunicorn')

gunicorn_parser.add_argument('-a', '--address', type=str, default='0.0.0.0')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about making 127.0.0.1 as a default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -46,6 +46,9 @@ def runserver(debug, address, port, timeout, workers):
threaded=True,
debug=True)
else:
print("DEPRECATED: Please run gunicorn via the 'runserver_gunicorn' command")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codeclimate complains: print("DEPRECATED: Please run gunicorn via the 'runserver_gunicorn' command")

@bkyryliuk
Copy link
Member

bkyryliuk commented Nov 30, 2016

LGTM, @mistercrunch - how does it look for you? You have way more experience with gunicorn and python in general.

import gunicorn.app.base


class GunicornSupersetApplication(gunicorn.app.base.BaseApplication):
Copy link
Member

@mistercrunch mistercrunch Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cli logic should live in cli.py as we're trying to keep this file to a bare minimum since it can't be imported easily (no .py extension). You should also use flask_script the same way that the rest of the CLI is written as opposed to good old argparse

Copy link
Contributor Author

@yolken yolken Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that but got lots of DB errors due to forking problems. :-( The issue is that, with the way things are set up now, you can't import superset or any of its subpackages before gunicorn forks (i.e., in the load function). The underlying cause is that sqlalchemy (initialized in superset/__init__.py) is not fork-safe.

We ran into similar issues when pex-ifying knowledge repo. The way around it is to refactor the code so that cli.py can be loaded without initializing the flask app and its associated DB connections in the gunicorn case. Looking at the code, this is doable, but would require a significant amount of refactoring.

Given these limitations, can we keep the gunicorn wrapper in the superset script for now and leave the refactoring as a future TODO? Apologies that it isn't as clean as it would be ideally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I now realize I skipped reading the PR's message which explained all this. We'll need to update our chef recipe when merging this into production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! The old version of runserver will still work until we deprecate it, so nothing should break immediately.

return app


def run_gunicorn():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to cli.py, use flask_script

@mistercrunch
Copy link
Member

LGTM

@yolken
Copy link
Contributor Author

yolken commented Dec 1, 2016

@bkyryliuk @mistercrunch Refactored the runserver flow a bit as we discussed earlier today. In particular, both the debug and gunicorn cases are now handled consistently, so we won't have to deprecate anything in the future. Can you take another look? Thanks!

@mistercrunch
Copy link
Member

img

@mistercrunch mistercrunch merged commit 50da4f8 into master Dec 1, 2016
@mistercrunch mistercrunch deleted the byolken/support_pex branch December 1, 2016 23:18
@yolken yolken restored the byolken/support_pex branch December 2, 2016 00:44
@yolken yolken deleted the byolken/support_pex branch December 2, 2016 00:44
'-a', '--address', type=str, default='127.0.0.1',
help='Specify the address to which to bind the web server')
gunicorn_parser.add_argument(
'-p', '--port', type=int, default=8088,
Copy link
Contributor

@darabos darabos Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request has changed the default port number from config.get("SUPERSET_WEBSERVER_PORT") to 8088. Effectively the SUPERSET_WEBSERVER_ADDRESS, SUPERSET_WEBSERVER_PORT, SUPERSET_WORKERS, and SUPERSET_WEBSERVER_TIMEOUT config variables have been removed. (They have no effect now.)

Is this intentional? These settings are still mentioned in the docs.

+cc @szmate1618

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it's not :) Care to open a PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it's not :) Care to open a PR?

The solution is not easy it seems. The idea here appears to be to avoid importing superset before the Gunicorn application is loaded. So at the point where the flag defaults are set we cannot access the Superset config file yet.

I think it may be fine to retire these settings. Gunicorn can be configured via a config file. Superset could have a separate Gunicorn config file instead that would allow setting these web-server-specific parameters in one logical place. Plus it would give the user a chance to set many other Gunicorn settings that cannot be controlled now, such as the SSL certificate.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know, i kinda liked the one config file to rule them all approach. Let's ping the airbnb guys @yolken @mistercrunch

BTW could we please move the discussion to #1939 ?

@mistercrunch
Copy link
Member

We've found other ways of pex-ing for gunicorn to address this problem (gunicorn forking the process while sqlalchemy is not fork-safe), we'll revert most of this PR as soon as the alternative solution is up.

mistercrunch added a commit to mistercrunch/superset that referenced this pull request Jan 12, 2017
mistercrunch added a commit that referenced this pull request Jan 13, 2017
SalehHindi pushed a commit to SalehHindi/superset that referenced this pull request Jun 9, 2017
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.15.0 labels Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants