-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend & generalize constant folding / evaluation in logical optimizer #237
Comments
This would be a neat feature. |
@alamb just picking your brain here - do you think this should be part of the logical optimizations or physical optimizations? A way that could work within the current setup for
This makes it a bit less useful (still useful nonetheless), as some other optimizations might benefit from constant folding I am wondering here in general, whether we can/should unify |
fwiw, when a logical optimization is applied, the expressions are re-written and the "column name" is consequently re-written. Thus, what was named To apply it on the logical level, we may need to wrap the expression by an I agree that the sooner in the optimization these are applied, the higher the likelihood of synergies between optimizers. |
@jorgecarleitao that's a good one - I did also see something in the same order recently when looking at this #268 problem. |
It is a problem already in the current constant folding! I am opening an issue for this.
|
I would imagine this to be done on
I think the LogicalPlan / PhysicalPlan distinction makes sense (b/c logically a Join is just a Join -- but physically maybe we would be using a CROSS JOIN w/ filter, or an Hash Inner Join, or a Merge Join, etc) I am not as sure about the distinction between If we could directly evaluate |
FYI while I was reviewing the code in https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/parquet.rs in the context of #363 I noticed there is already a way to do "partial evaluation" for expressions -- maybe we could fake the same to evaluate |
Possibly related to #1070 |
I think we have implemented most of the suggestions in this issue -- I am not sure if it is tracking anything actionable anymore |
Yes I agree, this is done 🚀. Closing this issue |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The (logical) optimizer contains some support for folding (boolean) constants. This can help, especially with other optimization passes, to optimize queries. For example,
LIMIT (0 + 0)
could be optimized first toLIMIT 0
, to enable eliminating the whole plan.We should try to extend this support to most datatypes & expressions.
Describe the solution you'd like
Expr
s can already be evaluated against aRecordBatch
, and there is code to evaluate scalar values without going through Arrow. To make sure that the constant evaluation is implemented correctly & the same as the evaluation code, we should be able to reuse the code from there.Describe alternatives you've considered
Manually implement the constant folding support. Downside here is that we end up with two implementations, which has a higher maintenance burden.
Additional context
Not in scope: add it to physical optimizer too. Here it could help too, especially if we have support for partitions.
The text was updated successfully, but these errors were encountered: