Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running superset via pex #1713

Merged
merged 10 commits into from
Dec 1, 2016
57 changes: 55 additions & 2 deletions superset/bin/superset
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,60 @@ from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

from superset.cli import manager
import argparse
import sys

import gunicorn.app.base


class GunicornSupersetApplication(gunicorn.app.base.BaseApplication):
Copy link
Member

@mistercrunch mistercrunch Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cli logic should live in cli.py as we're trying to keep this file to a bare minimum since it can't be imported easily (no .py extension). You should also use flask_script the same way that the rest of the CLI is written as opposed to good old argparse

Copy link
Contributor Author

@yolken yolken Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that but got lots of DB errors due to forking problems. :-( The issue is that, with the way things are set up now, you can't import superset or any of its subpackages before gunicorn forks (i.e., in the load function). The underlying cause is that sqlalchemy (initialized in superset/__init__.py) is not fork-safe.

We ran into similar issues when pex-ifying knowledge repo. The way around it is to refactor the code so that cli.py can be loaded without initializing the flask app and its associated DB connections in the gunicorn case. Looking at the code, this is doable, but would require a significant amount of refactoring.

Given these limitations, can we keep the gunicorn wrapper in the superset script for now and leave the refactoring as a future TODO? Apologies that it isn't as clean as it would be ideally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I now realize I skipped reading the PR's message which explained all this. We'll need to update our chef recipe when merging this into production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! The old version of runserver will still work until we deprecate it, so nothing should break immediately.

def __init__(self, address, port, workers, timeout):
self.options = {
'bind': '%s:%s' % (address, port),
'workers': workers,
'timeout': timeout,
'limit-request-line': 0,
'limit-request-field_size': 0
}

super(GunicornSupersetApplication, self).__init__()

def load_config(self):
config = dict([(key, value) for key, value in self.options.iteritems()
if key in self.cfg.settings and value is not None])
for key, value in config.iteritems():
self.cfg.set(key.lower(), value)

def load(self):
from superset import app

return app


def run_gunicorn():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to cli.py, use flask_script

parser = argparse.ArgumentParser(description='Run gunicorn for superset')
subparsers = parser.add_subparsers()
gunicorn_parser = subparsers.add_parser('runserver_gunicorn')

gunicorn_parser.add_argument('-a', '--address', type=str, default='0.0.0.0')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about making 127.0.0.1 as a default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

gunicorn_parser.add_argument('-p', '--port', type=int, default=8088)
gunicorn_parser.add_argument('-w', '--workers', type=int, default=4)
gunicorn_parser.add_argument('-t', '--timeout', type=int, default=30)

args = parser.parse_args()

gunicorn_app_obj = GunicornSupersetApplication(
args.address, args.port, args.workers, args.timeout)
gunicorn_app_obj.run()



if __name__ == "__main__":
manager.run()
if len(sys.argv) > 1 and sys.argv[1] == 'runserver_gunicorn':
# In the gunicorn case, don't go through manager so that superset import is deferred
# until GunicornSupersetApplication.load is called; this allows for the app to be cleanly
# forked without running the gunicorn command-line.
run_gunicorn()
else:
from superset.cli import manager
manager.run()
3 changes: 3 additions & 0 deletions superset/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,9 @@ def runserver(debug, address, port, timeout, workers):
threaded=True,
debug=True)
else:
print("DEPRECATED: Please run gunicorn via the 'runserver_gunicorn' command")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the two commands? What does change if we switch to the new way of loading gunicorn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing should change from the user perspective when switching to the new command. I'm keeping the old one in there for now for backwards compatibility. At least inside Airbnb, it's hard for us to simultaneously switch both the app code and the command used to run the app (set via chef).

Once this change is in, I can come back in a few days and delete the old gunicorn command.

Let me know if this doesn't sound ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codeclimate complains: print("DEPRECATED: Please run gunicorn via the 'runserver_gunicorn' command")


# TODO(byolken): Remove this and associated manager options.
cmd = (
"gunicorn "
"-w {workers} "
Expand Down
8 changes: 8 additions & 0 deletions superset/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
from __future__ import print_function
from __future__ import unicode_literals

import imp
import json
import os

Expand Down Expand Up @@ -248,7 +249,14 @@ class CeleryConfig(object):
# by humans.
ROBOT_PERMISSION_ROLES = ['Public', 'Gamma', 'Alpha', 'Admin', 'sql_lab']

CONFIG_PATH_ENV_VAR = 'SUPERSET_CONFIG_PATH'

try:
if CONFIG_PATH_ENV_VAR in os.environ:
# Explicitly import config module that is not in pythonpath; useful for case where
# app is being executed via pex.
imp.load_source('superset_config', os.environ[CONFIG_PATH_ENV_VAR])

from superset_config import * # noqa
print('Loaded your LOCAL configuration')
except ImportError:
Expand Down