Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enable ETag header for dashboard GET requests #10963

Merged
merged 2 commits into from
Sep 29, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions superset/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -297,6 +297,7 @@ def _try_json_readsha( # pylint: disable=unused-argument
# Experimental feature introducing a client (browser) cache
"CLIENT_CACHE": False,
"ENABLE_EXPLORE_JSON_CSRF_PROTECTION": False,
"ENABLE_DASHBOARD_ETAG_HEADER": False,
"KV_STORE": False,
"PRESTO_EXPAND_DATA": False,
# Exposes API endpoint to compute thumbnails
Expand Down
36 changes: 31 additions & 5 deletions superset/utils/decorators.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,15 +15,15 @@
# specific language governing permissions and limitations
# under the License.
import logging
from datetime import datetime, timedelta
from datetime import datetime, timedelta, timezone
from functools import wraps
from typing import Any, Callable, Iterator
from typing import Any, Callable, Iterator, Optional

from contextlib2 import contextmanager
from flask import request
from werkzeug.wrappers.etag import ETagResponseMixin

from superset import app, cache
from superset import app, cache, is_feature_enabled
from superset.stats_logger import BaseStatsLogger
from superset.utils.dates import now_as_float

Expand All @@ -34,6 +34,10 @@
logger = logging.getLogger(__name__)


def is_dashboard_request(kwargs: Any) -> bool:
return kwargs.get("dashboard_id_or_slug") is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't seem robust, it it possible to validate via uri path or just pass a param ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after introduce this skip, dashboard_id_or_slug is not needed in decorator function any more. this check is removed.



@contextmanager
def stats_timing(stats_key: str, stats_logger: BaseStatsLogger) -> Iterator[float]:
"""Provide a transactional scope around a series of operations."""
Expand All @@ -46,7 +50,11 @@ def stats_timing(stats_key: str, stats_logger: BaseStatsLogger) -> Iterator[floa
stats_logger.timing(stats_key, now_as_float() - start_ts)


def etag_cache(max_age: int, check_perms: Callable[..., Any]) -> Callable[..., Any]:
def etag_cache(
max_age: int,
check_perms: Callable[..., Any],
check_latest_changed_on: Optional[Callable[..., Any]] = None,
) -> Callable[..., Any]:
"""
A decorator for caching views and handling etag conditional requests.

Expand All @@ -72,6 +80,12 @@ def wrapper(*args: Any, **kwargs: Any) -> ETagResponseMixin:
if request.method == "POST":
return f(*args, **kwargs)

# if it is dashboard request but feature is not eabled,
# do not use cache
is_dashboard = is_dashboard_request(kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a helper function, maybe we can just expand dashboard_id_or_slug in wrapper and make this code more transparent?

def wrapper(*args: Any, dashboard_id_or_slug: str=None, **kwargs: Any) -> ETagResponseMixin:
  # ...
  is_dashboard = dashboard_id_or_slug is not None

Better yet, we should probably aim for keeping all dashboard specific logics out of etag_cache so it stays generic. Maybe add a skip= parameter that runs the feature flag check.

@etag_cache(skip=lambda: is_feature_enabled("ENABLE_DASHBOARD_ETAG_HEADER"))
def etag_cache(
    max_age: int,
    check_perms: Callable[..., Any],
    skip: Optional[Callable[..., Any]] = None,
) -> Callable[..., Any]:

  def decorator(f: Callable[..., Any]) -> Callable[..., Any]:

    def wrapper(*args: Any, **kwargs: Any) -> ETagResponseMixin:

      check_perms(*args, **kwargs)

      if request.method == "POST" or (skip and skip(*args, **kwargs)):
        return f(*args, **kwargs)

Copy link
Author

@graceguo-supercat graceguo-supercat Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! after introduce this skip, dashboard_id is not needed in decorator function any more.

if is_dashboard and not is_feature_enabled("ENABLE_DASHBOARD_ETAG_HEADER"):
return f(*args, **kwargs)

response = None
if cache:
try:
Expand All @@ -89,13 +103,25 @@ def wrapper(*args: Any, **kwargs: Any) -> ETagResponseMixin:
raise
logger.exception("Exception possibly due to cache backend.")

# if cache is stale?
if check_latest_changed_on:
latest_changed_on = check_latest_changed_on(*args, **kwargs)
if response and response.last_modified:
latest_record = response.last_modified.replace(
tzinfo=timezone.utc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes that superset server runs in utc zone, it may be safer to make it as a superset config variable

Copy link
Author

@graceguo-supercat graceguo-supercat Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this convert, .replace(tzinfo) is not necessary, removed.

).astimezone(tz=None)
if latest_changed_on.timestamp() > latest_record.timestamp():
response = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably rename check_latest_changed_on to get_latest_changed_on and do

if get_latest_changed_on:
  latest_changed_on = get_latest_changed_on(*args, **kwargs)
  if response and response.last_modified and response.last_modified < latest_changed_on:
    response = None
else:
  latest_changed_on = datetime.utcnow()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


# if no response was cached, compute it using the wrapped function
if response is None:
response = f(*args, **kwargs)

# add headers for caching: Last Modified, Expires and ETag
response.cache_control.public = True
response.last_modified = datetime.utcnow()
response.last_modified = (
latest_changed_on if is_dashboard else datetime.utcnow()
)
expiration = max_age if max_age != 0 else FAR_FUTURE
response.expires = response.last_modified + timedelta(
seconds=expiration
Expand Down
77 changes: 34 additions & 43 deletions superset/views/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@
# pylint: disable=comparison-with-callable
import logging
import re
from collections import defaultdict
from contextlib import closing
from datetime import datetime
from typing import Any, cast, Dict, List, Optional, Union
Expand All @@ -26,7 +25,17 @@
import backoff
import pandas as pd
import simplejson as json
from flask import abort, flash, g, Markup, redirect, render_template, request, Response
from flask import (
abort,
flash,
g,
make_response,
Markup,
redirect,
render_template,
request,
Response,
)
from flask_appbuilder import expose
from flask_appbuilder.models.sqla.interface import SQLAInterface
from flask_appbuilder.security.decorators import has_access, has_access_api
Expand Down Expand Up @@ -114,11 +123,15 @@
_deserialize_results_payload,
apply_display_max_row_limit,
bootstrap_user_data,
check_dashboard_perms,
check_datasource_perms,
check_slice_perms,
get_cta_schema_name,
get_dashboard,
get_dashboard_extra_filters,
get_dashboard_latest_changed_on,
get_datasource_info,
get_datasources_from_dashboard,
get_form_data,
get_viz,
is_owner,
Expand Down Expand Up @@ -1585,42 +1598,18 @@ def publish( # pylint: disable=no-self-use
return json_success(json.dumps({"published": dash.published}))

@has_access
@etag_cache(
0,
check_perms=check_dashboard_perms,
check_latest_changed_on=get_dashboard_latest_changed_on,
)
@expose("/dashboard/<dashboard_id_or_slug>/")
def dashboard( # pylint: disable=too-many-locals
self, dashboard_id_or_slug: str
) -> FlaskResponse:
"""Server side rendering for a dashboard"""
session = db.session()
qry = session.query(Dashboard)
if dashboard_id_or_slug.isdigit():
qry = qry.filter_by(id=int(dashboard_id_or_slug))
else:
qry = qry.filter_by(slug=dashboard_id_or_slug)

dash = qry.one_or_none()
if not dash:
abort(404)

datasources = defaultdict(list)
for slc in dash.slices:
datasource = slc.datasource
if datasource:
datasources[datasource].append(slc)

if config["ENABLE_ACCESS_REQUEST"]:
for datasource in datasources:
if datasource and not security_manager.can_access_datasource(
datasource
):
flash(
__(
security_manager.get_datasource_access_error_msg(datasource)
),
"danger",
)
return redirect(
"superset/request_access/?" f"dashboard_id={dash.id}&"
)
dash = get_dashboard(dashboard_id_or_slug)
datasources = get_datasources_from_dashboard(dash)

# Filter out unneeded fields from the datasource payload
datasources_payload = {
Expand Down Expand Up @@ -1661,7 +1650,7 @@ def dashboard(**_: Any) -> None:
if is_feature_enabled("REMOVE_SLICE_LEVEL_LABEL_COLORS"):
# dashboard metadata has dashboard-level label_colors,
# so remove slice-level label_colors from its form_data
for slc in dashboard_data.get("slices"):
for slc in dashboard_data.get("slices") or []:
form_data = slc.get("form_data")
form_data.pop("label_colors", None)

Expand Down Expand Up @@ -1695,15 +1684,17 @@ def dashboard(**_: Any) -> None:
json.dumps(bootstrap_data, default=utils.pessimistic_json_iso_dttm_ser)
)

return self.render_template(
"superset/dashboard.html",
entry="dashboard",
standalone_mode=standalone_mode,
title=dash.dashboard_title,
custom_css=dashboard_data.get("css"),
bootstrap_data=json.dumps(
bootstrap_data, default=utils.pessimistic_json_iso_dttm_ser
),
return make_response(
self.render_template(
"superset/dashboard.html",
entry="dashboard",
standalone_mode=standalone_mode,
title=dash.dashboard_title,
custom_css=dashboard_data.get("css"),
bootstrap_data=json.dumps(
bootstrap_data, default=utils.pessimistic_json_iso_dttm_ser
),
)
)

@api
Expand Down
74 changes: 71 additions & 3 deletions superset/views/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,19 +16,27 @@
# under the License.
import logging
from collections import defaultdict
from datetime import date
from datetime import date, datetime
from typing import Any, Callable, DefaultDict, Dict, List, Optional, Set, Tuple, Union
from urllib import parse

import msgpack
import pyarrow as pa
import simplejson as json
from flask import g, request
from flask import abort, flash, g, redirect, request
from flask_appbuilder.security.sqla import models as ab_models
from flask_appbuilder.security.sqla.models import User
from flask_babel import gettext as __

import superset.models.core as models
from superset import app, dataframe, db, is_feature_enabled, result_set
from superset import (
app,
dataframe,
db,
is_feature_enabled,
result_set,
security_manager,
)
from superset.connectors.connector_registry import ConnectorRegistry
from superset.errors import ErrorLevel, SupersetError, SupersetErrorType
from superset.exceptions import SupersetException, SupersetSecurityException
Expand Down Expand Up @@ -298,6 +306,46 @@ def get_time_range_endpoints(
CONTAINER_TYPES = ["COLUMN", "GRID", "TABS", "TAB", "ROW"]


def get_dashboard(dashboard_id_or_slug: str,) -> Dashboard:
session = db.session()
qry = session.query(Dashboard)
if dashboard_id_or_slug.isdigit():
qry = qry.filter_by(id=int(dashboard_id_or_slug))
else:
qry = qry.filter_by(slug=dashboard_id_or_slug)
dashboard = qry.one_or_none()

if not dashboard:
abort(404)

return dashboard


def get_datasources_from_dashboard(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a good candidate for the Dashboard class method

Copy link
Author

@graceguo-supercat graceguo-supercat Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a little confusing: Dashboard class has a get datasources function. So i use it in check_dashboard_perms function. But this function is to group slices by datasources, and the result will be used by another feature:
https://github.com/apache/incubator-superset/blob/ba009b7c09d49f2932fd10269882c901bc020c1d/superset/views/core.py#L1626
Instead of datasource.data, datasource.data_for_slices(slices) can reduce the initial dashboard data load size.

So right now i removed this helper function from utils, and build dict in the dashboard function. But i rename datasource to slices_by_datasources for clarification.

dashboard: Dashboard,
) -> DefaultDict[Any, List[Any]]:
datasources = defaultdict(list)
for slc in dashboard.slices:
datasource = slc.datasource
if datasource:
datasources[datasource].append(slc)
return datasources


def get_dashboard_latest_changed_on(_self: Any, dashboard_id_or_slug: str) -> datetime:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is _self here? ideally we should avoid Any types

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see other functions that used by decorator:
This function takes `self` since it must have the same signature as the the decorated method.

"""
Get latest changed datetime for a dashboard. The change could be dashboard
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Get latest changed datetime for a dashboard.
/Get latest changed datetime for a dashboard and it's charts

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I rename it to get_dashboard_latest_changedon_dt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get more specific with _self type ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what type to use here. do you have suggestion? (Sorry i am not an expert in Python)

metadata change, or any of its slice data change.

This function takes `self` since it must have the same signature as the
the decorated method.
"""
dash = get_dashboard(dashboard_id_or_slug)
dash_changed_on = dash.changed_on
slices_changed_on = max([s.changed_on for s in dash.slices])
return max(dash_changed_on, slices_changed_on)


def get_dashboard_extra_filters(
slice_id: int, dashboard_id: int
) -> List[Dict[str, Any]]:
Expand Down Expand Up @@ -490,6 +538,26 @@ def check_slice_perms(_self: Any, slice_id: int) -> None:
viz_obj.raise_for_access()


def check_dashboard_perms(_self: Any, dashboard_id_or_slug: str) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best practice is to have a unit test for every function, it would be great if you could add some

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i agree. but this function is refactored out from dashboard function. it is tested in
https://github.com/apache/incubator-superset/blob/448a41a4e7563cafadea1e03feb5980151e8b56d/tests/security_tests.py#L665
I assume the old unit tests didn't break will be good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a good practice is to incrementally improve the state of the code, however it will be your call here

"""
Check if user can access a cached response from explore_json.

This function takes `self` since it must have the same signature as the
the decorated method.
"""

dash = get_dashboard(dashboard_id_or_slug)
datasources = get_datasources_from_dashboard(dash)
if app.config["ENABLE_ACCESS_REQUEST"]:
for datasource in datasources:
if datasource and not security_manager.can_access_datasource(datasource):
flash(
__(security_manager.get_datasource_access_error_msg(datasource)),
"danger",
)
redirect("superset/request_access/?" f"dashboard_id={dash.id}&")


def _deserialize_results_payload(
payload: Union[bytes, str], query: Query, use_msgpack: Optional[bool] = False
) -> Dict[str, Any]:
Expand Down