Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix(presto, spark): remove WITHIN GROUP when transpiling percentile_[cont|disc] #1565

Merged
merged 4 commits into from
May 8, 2023

Conversation

georgesittas
Copy link
Collaborator

@georgesittas georgesittas commented May 8, 2023

Example usage:

>>> sqlglot.transpile("WITH sales(amount) AS (SELECT * FROM (VALUES (100), (200), (300), (400), (500), (600), (700), (800), (900), (1000)) sales(amount)) SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median FROM sales", "postgres", "trino")[0]
'WITH sales(amount) AS (SELECT * FROM (VALUES (100), (200), (300), (400), (500), (600), (700), (800), (900), (1000)) AS sales(amount)) SELECT APPROX_PERCENTILE(amount, 0.5) AS median FROM sales'

In Postgres:

georgesittas=# WITH sales(amount) AS (SELECT * FROM (VALUES (100), (200), (300), (400), (500), (600), (700), (800), (900), (1000)) sales(amount)) SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median FROM sales;
 median
--------
    550
(1 row)

In Trino:

trino> WITH sales(amount) AS (SELECT * FROM (VALUES (100), (200), (300), (400), (500), (600), (700), (800), (900), (1000)) AS sales(amount)) SELECT APPROX_PERCENTILE(amount, 0.5) AS median FROM sales;
 median 
--------
    600 
(1 row)

From what I can see, there's no exact percentile function in Presto / Trino, hence the different column value. Not sure if we can do something better than what's proposed in this PR.

cc: @vegarsti

References:

sqlglot/dialects/presto.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments for clarity. The changes in test_postgres were mostly stylistic (the only new thing added was a test for the transformation introduced here).

@@ -177,31 +177,34 @@ class Generator(Hive.Generator):
TRANSFORMS = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added exp.WithinGroup transform, the rest were just sorted in alphabetical order.

and isinstance(expression.expression, exp.Order)
):
quantile = expression.this.this
input_value = t.cast(exp.Ordered, expression.find(exp.Ordered)).this
Copy link
Collaborator Author

@georgesittas georgesittas May 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast was added to suppress a MyPy error. Since expression.expression is an exp.Order, it'll always contain an exp.Ordered in its expressions arg, so this should be safe to use.

Comment on lines +267 to +283
_sql_handler = getattr(self, expression.key + "_sql", None)
if _sql_handler:
return _sql_handler(expression)

transforms_handler = self.TRANSFORMS.get(type(expression))
if transforms_handler:
# Ensures we don't enter an infinite loop. This can happen when the original expression
# has the same type as the final expression and there's no _sql method available for it,
# because then it'd re-enter _to_sql.
if expression_type is type(expression):
raise ValueError(
f"Expression type {expression.__class__.__name__} requires a _sql method in order to be transformed."
)

return transforms_handler(self, expression)

raise ValueError(f"Unsupported expression type {expression.__class__.__name__}.")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation behind these changes was that we don't have a _sql method for exp.ApproxQuantile, so the transformation I added would fail at getattr. I think it makes _to_sql a bit more robust.

@georgesittas georgesittas changed the title Fix(presto): remove WITHIN GROUP when transpiling percentile_[cont|disc] Fix(presto, spark): remove WITHIN GROUP when transpiling percentile_[cont|disc] May 8, 2023
@tobymao tobymao merged commit 4744742 into main May 8, 2023
@tobymao tobymao deleted the jo/percentile_fixes branch May 8, 2023 20:08
adrianisk pushed a commit to adrianisk/sqlglot that referenced this pull request Jun 21, 2023
…cont|disc] (tobymao#1565)

* Fix(presto): remove WITHIN GROUP when transpiling percentile_[cont|disc]

* Cleanup

* Remove redundant test

* Include databricks in the added test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants