-
Notifications
You must be signed in to change notification settings - Fork 769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix(presto, spark): remove WITHIN GROUP when transpiling percentile_[cont|disc] #1565
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments for clarity. The changes in test_postgres
were mostly stylistic (the only new thing added was a test for the transformation introduced here).
@@ -177,31 +177,34 @@ class Generator(Hive.Generator): | |||
TRANSFORMS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added exp.WithinGroup
transform, the rest were just sorted in alphabetical order.
and isinstance(expression.expression, exp.Order) | ||
): | ||
quantile = expression.this.this | ||
input_value = t.cast(exp.Ordered, expression.find(exp.Ordered)).this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cast was added to suppress a MyPy error. Since expression.expression
is an exp.Order
, it'll always contain an exp.Ordered
in its expressions
arg, so this should be safe to use.
_sql_handler = getattr(self, expression.key + "_sql", None) | ||
if _sql_handler: | ||
return _sql_handler(expression) | ||
|
||
transforms_handler = self.TRANSFORMS.get(type(expression)) | ||
if transforms_handler: | ||
# Ensures we don't enter an infinite loop. This can happen when the original expression | ||
# has the same type as the final expression and there's no _sql method available for it, | ||
# because then it'd re-enter _to_sql. | ||
if expression_type is type(expression): | ||
raise ValueError( | ||
f"Expression type {expression.__class__.__name__} requires a _sql method in order to be transformed." | ||
) | ||
|
||
return transforms_handler(self, expression) | ||
|
||
raise ValueError(f"Unsupported expression type {expression.__class__.__name__}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation behind these changes was that we don't have a _sql
method for exp.ApproxQuantile
, so the transformation I added would fail at getattr
. I think it makes _to_sql
a bit more robust.
…cont|disc] (tobymao#1565) * Fix(presto): remove WITHIN GROUP when transpiling percentile_[cont|disc] * Cleanup * Remove redundant test * Include databricks in the added test
Example usage:
In Postgres:
In Trino:
From what I can see, there's no exact percentile function in Presto / Trino, hence the different column value. Not sure if we can do something better than what's proposed in this PR.
cc: @vegarsti
References: