-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid LogicalPlan::clone()
in LogicalPlan::map_children
when possible
#9999
Conversation
1b98219
to
5fcbede
Compare
let new_children = self | ||
.inputs() | ||
.into_iter() | ||
.cloned() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this call to cloned()
(and the self.with_new_exprs
below) is the point of this PR.
As a subsequent PRs I plan to rewrite the Optimizer
and the various passes to use this method to rewrite the plans without copying them
where | ||
F: FnMut(Self) -> Result<Transformed<Self>>, | ||
{ | ||
Ok(match self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This follows the (really very cool) pattern @peter-toth came up with in #9913
5fcbede
to
36b28ad
Compare
36b28ad
to
56af5e1
Compare
} | ||
} | ||
|
||
/// Converts a `Arc<LogicalPlan>` without copying, if possible. Copies the plan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the code that avoids copying for Arc<LogicalPlan>
when possible (my performance results show it is possible most of the time)
Also, you can see it with a local change like this:
diff --git a/datafusion/expr/src/logical_plan/tree_node.rs b/datafusion/expr/src/logical_plan/tree_node.rs
index 97e2f7f56..a8570c0af 100644
--- a/datafusion/expr/src/logical_plan/tree_node.rs
+++ b/datafusion/expr/src/logical_plan/tree_node.rs
@@ -338,10 +338,16 @@ impl TreeNode for LogicalPlan {
/// Converts a `Arc<LogicalPlan>` without copying, if possible. Copies the plan
/// if there is a shared reference
fn unwrap_arc(plan: Arc<LogicalPlan>) -> LogicalPlan {
- Arc::try_unwrap(plan)
- // if None is returned, there is another reference to this
- // LogicalPlan, so we can not own it, and must clone instead
- .unwrap_or_else(|node| node.as_ref().clone())
+ match Arc::try_unwrap(plan) {
+ Ok(plan) => {
+ println!("unwrapped!");
+ plan
+ }
+ Err(plan) => {
+ println!("BOO copying");
+ plan.as_ref().clone()
+ }
+ }
}
/// Applies `f` to rewrite a `Arc<LogicalPlan>` without copying, if possible
And then running
cargo test --test sqllogictests
...
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
There still is plenty of copying going on (957 copies), but there are 22,913 less copies!
andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion$ cargo test --test sqllogictests | grep BOO | wc -l
Finished test [unoptimized + debuginfo] target(s) in 0.13s
Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-518eef2279430877)
957
andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion$ cargo test --test sqllogictests | grep unwrapped | wc -l
Finished test [unoptimized + debuginfo] target(s) in 0.13s
Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-518eef2279430877)
22913
LogicalPlan::clone()
in LogicalPlan::map_children
when possibleLogicalPlan::clone()
in LogicalPlan::map_children
when possible
{ | ||
Ok(input_plans | ||
.into_iter() | ||
.map(unwrap_arc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder of we can drop this .map(unwrap_arc)
and just call .map_until_stop_and_collect(|plan| rewrite_arc(plan, f))?
because in that case we don't need the .update_data
and we don't need to unwrap the remaining Arc
s in case of TreeNodeRecursion::Stop
is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent suggestion -- the code is both less verbose, and and it saves having to make a second Vec
I made this change in b29ebd2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the idea and the implementation @alamb and @jayzhan211!
Thanks @peter-toth -- I agree I am quite happy with this formulation compared to the initial versions. It is pretty slick and a great example of collaboration I think. I am now very excited to use it to stop the copying in the Optimizer -- I like refactoring as much as the next person, but it really helps to have an user visible goal (faster planning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we can avoid unwrap and wrap with Arc if the transform state is TreeNodeRecursion::Stop
Thanks @peter-toth and @jayzhan211 |
Which issue does this PR close?
This is the first step towards making DataFusion planning (much) faster -- #9637 (based on ideas from #9708 and #9768). 🙏 @jayzhan211
Rationale for this change
Profiling (see #9637) shows that slowest part of planning is copying
LogicalPlan
andExpr
in theOptimizer
Thus avoiding this copying as much as possible is key.
The TreeNode API is fast becoming the standard way to rewrite
LogicalPlan
in DataFusion so it is worth investing time making them faster (so everything implemented in terms of them gets faster). When I rewrote theOptimizer
to use this API (in #9948 -- it gets 10% faster)What changes are included in this PR?
LogicalPlan::map_children
to avoid cloning inputs when possible (by rewritingArc
as much as possible)Are these changes tested?
Functionally: By existing CI
Performance benchmarks: 1-2% faster in planning benchmarks (this only the beginning! Subsequent PRs are even better)
Details