Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dataset API): Add parameter to optionally render Jinja macros in API response #30721

Merged
merged 7 commits into from
Oct 30, 2024

Conversation

Vitor-Avila
Copy link
Contributor

@Vitor-Avila Vitor-Avila commented Oct 25, 2024

SUMMARY

This PR adds support for optionally rendering Jinja macros in the API response when getting a dataset. The motivation for this feature is that external tools might be fetching dataset metadata via the API to define lineage/etc, and in case Jinja is used it's not possible to fully parse the SQL query to identify dependencies/etc. An example query:

select * from {{dataset(1)}}
join cleaned_sales_data s on dataset_1.product_code = s.product_code

The query parameter include_rendered_sql (bool) can be used to add keys to the response:

  • rendered_sql for the rendered virtual dataset SQL query.
  • rendered_expression for the rendered expression from metrics and columns.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before
image

After
image

TESTING INSTRUCTIONS

Testing added.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the api Related to the REST API label Oct 25, 2024
@dosubot dosubot bot added the global:jinja Related to Jinja templating label Oct 25, 2024
Comment on lines 1120 to 1145
item: Optional[SqlaTable] = self.datamodel.get(
pk,
self._base_filters,
self.show_select_columns,
self.show_outer_default_load,
)
if not item:
return self.response_404()

response: dict[str, Any] = {}
args = kwargs.get("rison", {})
select_cols = args.get(API_SELECT_COLUMNS_RIS_KEY, [])
pruned_select_cols = [col for col in select_cols if col in self.show_columns]
self.set_response_key_mappings(
response,
self.get,
args,
**{API_SELECT_COLUMNS_RIS_KEY: pruned_select_cols},
)
if pruned_select_cols:
show_model_schema = self.model2schemaconverter.convert(pruned_select_cols)
else:
show_model_schema = self.show_model_schema

response["id"] = pk
response[API_RESULT_RES_KEY] = show_model_schema.dump(item, many=False)
Copy link
Contributor Author

@Vitor-Avila Vitor-Avila Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is FAB's default implementation for get_headless - ref.

I decided to not simply do self.get_headless() and then modify the response for two reasons:

  • get_headless returns a response object, so while we could just modify the response.json it feels a little hacky.
  • Superset's get_headless implementation includes a call to send_stats_metrics (ref), which wouldn't include the full duration of the API call as the response would be further modified by this method before concluding.

I also thought about just overwriting FAB's pre_get method which is available in case the response needs to be modified as part of get_headless, but we need the Database associated with this dataset, and if the user chooses to not include the database.id in the response then pre_get wouldn't have this info.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

superset/datasets/api.py Outdated Show resolved Hide resolved
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, however, it's worth remembering that the rendered query can change, considerably even, if it has lots of conditional Jinja logic. It's pretty common to see additional where clauses added if certain filter parameters are set in the chart data request. Just something to keep in mind.

@Vitor-Avila
Copy link
Contributor Author

LGTM, however, it's worth remembering that the rendered query can change, considerably even, if it has lots of conditional Jinja logic. It's pretty common to see additional where clauses added if certain filter parameters are set in the chart data request. Just something to keep in mind.

@villebro that's true. I actually decided to include both the regular sql and expression values and then add rendered_sql and rendered_expression if the param is present, so that users still get the original value. I think having both data is helpful for such cases.

except TemplateSyntaxError as ex:
raise SupersetTemplateException(
f"Unable to render expression from dataset {item_type}.",
) from ex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could possibly be simplified to:

items = [
    ("query", "sql", "rendered_sql", processor.process_template),
    ("metric", "metrics", "metrics", render_item_list),
    ("column", "columns", "columns", render_item_list),
]
for item_type, key, new_key, func in items:
    if key not in data:  # or if not data.get(key), potentially
        continue

    try:
        data[new_key] = func(data[key])
    except TemplateSyntaxError as ex:
        raise SupersetTemplateException(
            f"Unable to render expression from dataset {item_type}.",
        ) from ex

But it's perfectly fine as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's cool! Let me implement this 🙏

@Vitor-Avila Vitor-Avila merged commit e79778a into master Oct 30, 2024
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the REST API global:jinja Related to Jinja templating size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants