Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running superset via pex #1713

Merged
merged 10 commits into from
Dec 1, 2016
79 changes: 77 additions & 2 deletions superset/bin/superset
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,82 @@ from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

from superset.cli import manager
import argparse
import sys

import gunicorn.app.base


class GunicornSupersetApplication(gunicorn.app.base.BaseApplication):
Copy link
Member

@mistercrunch mistercrunch Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cli logic should live in cli.py as we're trying to keep this file to a bare minimum since it can't be imported easily (no .py extension). You should also use flask_script the same way that the rest of the CLI is written as opposed to good old argparse

Copy link
Contributor Author

@yolken yolken Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that but got lots of DB errors due to forking problems. :-( The issue is that, with the way things are set up now, you can't import superset or any of its subpackages before gunicorn forks (i.e., in the load function). The underlying cause is that sqlalchemy (initialized in superset/__init__.py) is not fork-safe.

We ran into similar issues when pex-ifying knowledge repo. The way around it is to refactor the code so that cli.py can be loaded without initializing the flask app and its associated DB connections in the gunicorn case. Looking at the code, this is doable, but would require a significant amount of refactoring.

Given these limitations, can we keep the gunicorn wrapper in the superset script for now and leave the refactoring as a future TODO? Apologies that it isn't as clean as it would be ideally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I now realize I skipped reading the PR's message which explained all this. We'll need to update our chef recipe when merging this into production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! The old version of runserver will still work until we deprecate it, so nothing should break immediately.

def __init__(self, address, port, workers, timeout):
self.options = {
'bind': '%s:%s' % (address, port),
'workers': workers,
'timeout': timeout,
'limit-request-line': 0,
'limit-request-field_size': 0
}

super(GunicornSupersetApplication, self).__init__()

def load_config(self):
config = dict([(key, value) for key, value in self.options.iteritems()
if key in self.cfg.settings and value is not None])
for key, value in config.iteritems():
self.cfg.set(key.lower(), value)

def load(self):
from superset import app

return app


def run_server():
parser = argparse.ArgumentParser(description='Run gunicorn for superset')
subparsers = parser.add_subparsers()
gunicorn_parser = subparsers.add_parser('runserver')

gunicorn_parser.add_argument(
'-d', '--debug', action='store_true',
help='Start the web server in debug mode')
gunicorn_parser.add_argument(
'-a', '--address', type=str, default='127.0.0.1',
help='Specify the address to which to bind the web server')
gunicorn_parser.add_argument(
'-p', '--port', type=int, default=8088,
Copy link
Contributor

@darabos darabos Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request has changed the default port number from config.get("SUPERSET_WEBSERVER_PORT") to 8088. Effectively the SUPERSET_WEBSERVER_ADDRESS, SUPERSET_WEBSERVER_PORT, SUPERSET_WORKERS, and SUPERSET_WEBSERVER_TIMEOUT config variables have been removed. (They have no effect now.)

Is this intentional? These settings are still mentioned in the docs.

+cc @szmate1618

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it's not :) Care to open a PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it's not :) Care to open a PR?

The solution is not easy it seems. The idea here appears to be to avoid importing superset before the Gunicorn application is loaded. So at the point where the flag defaults are set we cannot access the Superset config file yet.

I think it may be fine to retire these settings. Gunicorn can be configured via a config file. Superset could have a separate Gunicorn config file instead that would allow setting these web-server-specific parameters in one logical place. Plus it would give the user a chance to set many other Gunicorn settings that cannot be controlled now, such as the SSL certificate.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know, i kinda liked the one config file to rule them all approach. Let's ping the airbnb guys @yolken @mistercrunch

BTW could we please move the discussion to #1939 ?

help='Specify the port on which to run the web server')
gunicorn_parser.add_argument(
'-w', '--workers', type=int, default=4,
help='Number of gunicorn web server workers to fire up')
gunicorn_parser.add_argument(
'-t', '--timeout', type=int, default=30,
help='Specify the timeout (seconds) for the gunicorn web server')

args = parser.parse_args()

if args.debug:
from superset import app

app.run(
host='0.0.0.0',
port=int(args.port),
threaded=True,
debug=True)
else:
gunicorn_app_obj = GunicornSupersetApplication(
args.address, args.port, args.workers, args.timeout)
gunicorn_app_obj.run()


if __name__ == "__main__":
manager.run()
if len(sys.argv) > 1 and sys.argv[1] == 'runserver':
# In the runserver case, don't go through the manager so that superset
# import is deferred until the app is loaded; this allows for the app to be run via pex
# and cleanly forked in the gunicorn case.
#
# TODO: Refactor cli so that gunicorn can be started without first importing superset;
# this will allow us to move the runserver logic back into cli module.
run_server()
else:
from superset.cli import manager
manager.run()
37 changes: 0 additions & 37 deletions superset/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,43 +21,6 @@
manager.add_command('db', MigrateCommand)


@manager.option(
'-d', '--debug', action='store_true',
help="Start the web server in debug mode")
@manager.option(
'-a', '--address', default=config.get("SUPERSET_WEBSERVER_ADDRESS"),
help="Specify the address to which to bind the web server")
@manager.option(
'-p', '--port', default=config.get("SUPERSET_WEBSERVER_PORT"),
help="Specify the port on which to run the web server")
@manager.option(
'-w', '--workers', default=config.get("SUPERSET_WORKERS", 2),
help="Number of gunicorn web server workers to fire up")
@manager.option(
'-t', '--timeout', default=config.get("SUPERSET_WEBSERVER_TIMEOUT"),
help="Specify the timeout (seconds) for the gunicorn web server")
def runserver(debug, address, port, timeout, workers):
"""Starts a Superset web server"""
debug = debug or config.get("DEBUG")
if debug:
app.run(
host='0.0.0.0',
port=int(port),
threaded=True,
debug=True)
else:
cmd = (
"gunicorn "
"-w {workers} "
"--timeout {timeout} "
"-b {address}:{port} "
"--limit-request-line 0 "
"--limit-request-field_size 0 "
"superset:app").format(**locals())
print("Starting server with command: " + cmd)
Popen(cmd, shell=True).wait()


@manager.command
def init():
"""Inits the Superset application"""
Expand Down
8 changes: 8 additions & 0 deletions superset/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
from __future__ import print_function
from __future__ import unicode_literals

import imp
import json
import os

Expand Down Expand Up @@ -248,7 +249,14 @@ class CeleryConfig(object):
# by humans.
ROBOT_PERMISSION_ROLES = ['Public', 'Gamma', 'Alpha', 'Admin', 'sql_lab']

CONFIG_PATH_ENV_VAR = 'SUPERSET_CONFIG_PATH'

try:
if CONFIG_PATH_ENV_VAR in os.environ:
# Explicitly import config module that is not in pythonpath; useful
# for case where app is being executed via pex.
imp.load_source('superset_config', os.environ[CONFIG_PATH_ENV_VAR])

from superset_config import * # noqa
print('Loaded your LOCAL configuration')
except ImportError:
Expand Down
6 changes: 6 additions & 0 deletions tests/utils_tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
from superset import utils
import unittest

from mock import Mock, patch

class UtilsTestCase(unittest.TestCase):
def test_json_int_dttm_ser(self):
Expand All @@ -16,3 +17,8 @@ def test_json_int_dttm_ser(self):

with self.assertRaises(TypeError):
utils.json_int_dttm_ser("this is not a date")

@patch('superset.utils.datetime')
def test_parse_human_timedelta(self, mock_now):
mock_now.return_value = datetime(2016, 12, 1)
self.assertEquals(utils.parse_human_timedelta('now'), timedelta(0))