Remove Arc<LogicalPlan> from LogicalPlan, stop copying LogicalPlans #4628
Comments
I believe @mingmwang and @jackwener have noted this in the past -- the idea sounds good to me.
Hoping for this change 😍, it would be very meaningful for the optimizer.
How would
Currently we have:
/// Try and rewrite `plan` to an optimized form, returning None if the plan cannot be
/// optimized by this rule.
fn try_optimize(
    &self,
    plan: &LogicalPlan,
    config: &dyn OptimizerConfig,
) -> Result<Option<LogicalPlan>>;
With the current
Maybe we could change the signature to something like
enum OptimizedPlan {
    /// Optimizer did not make any changes to the original input plan
    NoChange(LogicalPlan),
    /// Optimizer rewrote the original plan
    Rewritten(LogicalPlan),
}
/// Try and rewrite `plan` to an optimized form
fn try_optimize(
    &self,
    plan: &LogicalPlan,
    config: &dyn OptimizerConfig,
) -> Result<OptimizedPlan>;
I think you would want try_optimize to take ownership of the plan rather than a borrow, and then return it.
I think there are cases (like deciding when a fixed point is reached) where the caller wants to distinguish between no more optimization and a new plan. However, now that LogicalPlan supports
So the signature maybe could be
/// Try and rewrite `plan` to an optimized form
fn try_optimize(
    &self,
    plan: LogicalPlan,
    config: &dyn OptimizerConfig,
) -> Result<LogicalPlan>;
🤔
To compare a new logical plan with the old one, we need to have both plans in memory. Without
This is not a particularly cheap operation, involving a lot of string comparisons. I think having the return value indicate if changes have been made, as in your original example, makes sense to me. My point was that if we remove the Arc we need to be careful to move LogicalPlan and avoid cloning, as any clone is then a deep clone of the entire tree. We therefore need to pass in an owned value so that it can be moved into the return type.
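The two suggestions above (take the plan by value, and have the return value say whether anything changed) can be combined. Here is a minimal sketch, assuming a toy `Plan` enum in place of `LogicalPlan`; the names `Optimized`, `merge_limits` and `optimize_to_fixed_point` are made up for illustration and are not DataFusion APIs.

```rust
// Toy stand-in for a logical plan; NOT the DataFusion type.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan(String),
    Limit(usize, Box<Plan>),
}

// Hypothetical result type: the plan is always handed back to the caller,
// tagged with whether the rule changed it.
#[derive(Debug, PartialEq)]
enum Optimized {
    NoChange(Plan),
    Rewritten(Plan),
}

// Hypothetical rule: collapse Limit(a, Limit(b, x)) into Limit(min(a, b), x).
// The plan is taken by value, so subtrees are moved and never cloned.
fn merge_limits(plan: Plan) -> Optimized {
    match plan {
        Plan::Limit(outer, child) => match *child {
            Plan::Limit(inner, grandchild) => {
                Optimized::Rewritten(Plan::Limit(outer.min(inner), grandchild))
            }
            other => Optimized::NoChange(Plan::Limit(outer, Box::new(other))),
        },
        other => Optimized::NoChange(other),
    }
}

// Apply the rule until it reports no change. The driver never compares
// two plans, so it never needs both the old and the new tree in memory.
fn optimize_to_fixed_point(mut plan: Plan) -> (Plan, usize) {
    let mut rewrites = 0;
    loop {
        match merge_limits(plan) {
            Optimized::Rewritten(p) => {
                plan = p;
                rewrites += 1;
            }
            Optimized::NoChange(p) => return (p, rewrites),
        }
    }
}
```

With this shape, a rule that makes no change simply moves its input back out, so the "no change" path costs no deep copy even when children are stored in `Box`.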
At first sight the idea seems compelling... Nevertheless I'm quite pessimistic about it. 😵💫 The main reason is optimisations which UNDO other optimisations! (Sorry for the long post, but it's core architecture stuff...)
Examples of cancelling optimisations
Example 1. Commutative optimisations
Some operations are commutative, like projection or limit - it is used in both
So for some plans they will always give
Example 2. Undoing inside of an optimisation
Inside
We have many invocations of
The point is that for a fixed point (merge of
Theoretical pain
Let's look at the issue formally... 😎 Consider small changes
The total change is:
So
The premise of
Just give
Alternatives?
Let's look again at the implication - how can we use it?
I would consider using some stats about the tree as a kind of heuristic. For example: the number of nodes in the tree. Each optimization could return
I guess there could be other useful stats, like
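One possible reading of the node-count heuristic, as a rough sketch only: the driver measures the tree before and after each pass and stops once a pass no longer shrinks it. The `Plan` enum, the `merge_projects` rule and the stopping condition are all assumptions made for illustration; nothing here is taken from the DataFusion optimizer.

```rust
// Toy stand-in for a logical plan; NOT the DataFusion type.
enum Plan {
    Scan,
    Filter(Box<Plan>),
    Project(Box<Plan>),
}

// The cheap statistic: number of nodes in the tree.
fn node_count(plan: &Plan) -> usize {
    match plan {
        Plan::Scan => 1,
        Plan::Filter(c) | Plan::Project(c) => 1 + node_count(c),
    }
}

// Hypothetical rule: collapse a Project sitting directly on another Project.
fn merge_projects(plan: Plan) -> Plan {
    match plan {
        Plan::Project(child) => match merge_projects(*child) {
            Plan::Project(grandchild) => Plan::Project(grandchild),
            other => Plan::Project(Box::new(other)),
        },
        Plan::Filter(child) => Plan::Filter(Box::new(merge_projects(*child))),
        Plan::Scan => Plan::Scan,
    }
}

// Assumed stopping condition: quit when a pass does not reduce the node
// count, or when `max_passes` is reached. Returns the plan and pass count.
fn optimize(mut plan: Plan, max_passes: usize) -> (Plan, usize) {
    let mut passes = 0;
    let mut before = node_count(&plan);
    while passes < max_passes {
        plan = merge_projects(plan);
        passes += 1;
        let after = node_count(&plan);
        if after >= before {
            break;
        }
        before = after;
    }
    (plan, passes)
}
```

The obvious weakness is that a rewrite which keeps the node count unchanged (reordering two nodes, say) looks like "no progress" to this driver, which is presumably why other stats would be needed alongside it.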
To add a counter point to this change -- without sub-tree sharing, and in the presence of CTEs, LogicalPlan trees would be exponential in the size of the SQL query:
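A rough illustration of the blow-up, using toy types rather than DataFusion's: each level below stands for a CTE that references the previous CTE twice (for example in a self-join). `Shared`, `Owned` and the helper functions are hypothetical names for this sketch.

```rust
use std::collections::HashSet;
use std::sync::Arc;

// Toy plan with sub-tree sharing; NOT the DataFusion type.
enum Shared {
    Scan,
    Join(Arc<Shared>, Arc<Shared>),
}

// The same shape with uniquely owned children.
enum Owned {
    Scan,
    Join(Box<Owned>, Box<Owned>),
}

// Each level joins the previous level with itself, sharing it via Arc.
fn build_shared(depth: u32) -> Arc<Shared> {
    let mut plan = Arc::new(Shared::Scan);
    for _ in 0..depth {
        plan = Arc::new(Shared::Join(Arc::clone(&plan), Arc::clone(&plan)));
    }
    plan
}

// Without sharing, each reference needs its own copy of the sub-tree.
fn build_owned(depth: u32) -> Owned {
    if depth == 0 {
        Owned::Scan
    } else {
        Owned::Join(
            Box::new(build_owned(depth - 1)),
            Box::new(build_owned(depth - 1)),
        )
    }
}

// Count distinct allocations, visiting each shared node only once.
fn count_shared(plan: &Arc<Shared>, seen: &mut HashSet<*const Shared>) -> usize {
    if !seen.insert(Arc::as_ptr(plan)) {
        return 0;
    }
    match plan.as_ref() {
        Shared::Scan => 1,
        Shared::Join(l, r) => 1 + count_shared(l, seen) + count_shared(r, seen),
    }
}

fn count_owned(plan: &Owned) -> usize {
    match plan {
        Owned::Scan => 1,
        Owned::Join(l, r) => 1 + count_owned(l) + count_owned(r),
    }
}
```

For a chain of depth 10 the shared form holds 11 distinct nodes, while the owned form holds 2^11 - 1 = 2047. Note that a traversal of the shared form that does not track visited nodes would still walk all 2047 paths.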
Is this something that is actually practicable? I would have thought the optimizer would simply ruin any effort to do this?
I'm not familiar with the current optimizer implementation details, but this is a problem that manifests way before the optimizer comes into play -- if we take away sub-tree sharing in LogicalPlan, then the SQL compiler would be forced to generate exponential trees right from the start. Whereas in the current setup, (properly) generated LP trees would always be linear in the size of the input query, and if it blows up in some later optimizer stage, I assume it shouldn't be too hard to optimize the optimizer.
Is this a problem? Is the memory usage of the plan representation a concern? This feels like a relatively niche optimisation for plans with repeated CTEs, that may perhaps be a touch premature? I would be very surprised if the optimizer won't blow this away when it rewrites the plans anyway.
Yes this is a very real problem. We see this kind of pattern in production warehouse queries fairly often. They're usually the result of some automated query composition, and can get quite big by themselves. Tacking on an exponential factor on top means the system will be completely unusable (i.e. upwards of an hour just to compile one query, without even invoking the optimizer). It's not just about the memory footprint -- if your datastructure itself is exponential then that's basically your lower bound for performance, as a simple operation like
All that is to say, if you plan to remove
👍 I agree we should not regress any extant functionality in this space. That being said, Arc is probably a poor way to go about sub-tree sharing, if it is used for this at all, as shared mutation is not possible. Some sort of mutable interner would likely be a better approach, and would facilitate optimising the given plan only once, as opposed to once for every appearance.
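To make the interner idea a bit more concrete, here is a very rough sketch under my own assumptions: plan nodes live in one table and refer to their children by id, so a shared sub-plan exists once and can be rewritten once. `PlanId`, `Node` and `Interner` are hypothetical names, not an existing DataFusion design.

```rust
use std::collections::HashMap;

// Hypothetical handle to a plan node stored in the interner.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct PlanId(usize);

// Toy plan node whose children are ids rather than owned sub-trees.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Node {
    Scan(String),
    Filter(String, PlanId),
    Join(PlanId, PlanId),
}

#[derive(Default)]
struct Interner {
    nodes: Vec<Node>,
    ids: HashMap<Node, PlanId>,
}

impl Interner {
    // Return the id of a structurally identical node, or store a new one.
    fn intern(&mut self, node: Node) -> PlanId {
        if let Some(&id) = self.ids.get(&node) {
            return id;
        }
        let id = PlanId(self.nodes.len());
        self.nodes.push(node.clone());
        self.ids.insert(node, id);
        id
    }

    // Rewrite a node in place; every parent holding `id` sees the change.
    // Simplification: this does not handle the case where the new node
    // duplicates one that is already interned under another id.
    fn replace(&mut self, id: PlanId, node: Node) {
        let old = std::mem::replace(&mut self.nodes[id.0], node.clone());
        self.ids.remove(&old);
        self.ids.insert(node, id);
    }

    fn get(&self, id: PlanId) -> &Node {
        &self.nodes[id.0]
    }
}
```

A CTE referenced twice would then be a single `PlanId` appearing twice in its parents, and one call to `replace` would optimise it for both appearances.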
For what it is worth, @mustafasrepo and I are working on something similar #8582 (in the physical plans now, any CTEs used more than once will be expanded out and the results not shared) |
Note about ScalarSubqueryToJoin
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Related to #4627, the current representation of LogicalPlan contains Arc<LogicalPlan> at various points. Whilst this does reduce the cost of copying a LogicalPlan tree, it:
e.g. Vec<Arc<LogicalPlan>>
Describe the solution you'd like
I would like to remove the Arc, replacing with Box where necessary. Methods that currently take Arc<LogicalPlan> should be updated to take LogicalPlan.
Describe alternatives you've considered
Additional context
This likely wants to wait until we are cloning LogicalPlan less frequently