Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible #9999

Merged: 4 commits, merged Apr 9, 2024


@alamb (Contributor) commented Apr 8, 2024

## Which issue does this PR close?

This is the first step towards making DataFusion planning (much) faster -- #9637 (based on ideas from #9708 and #9768). 🙏 @jayzhan211

## Rationale for this change

Profiling (see #9637) shows that the slowest part of planning is copying `LogicalPlan` and `Expr` in the Optimizer.

Thus, avoiding this copying as much as possible is key.

The `TreeNode` API is fast becoming the standard way to rewrite `LogicalPlan`s in DataFusion, so it is worth investing time in making it faster (so that everything implemented in terms of it gets faster). When I rewrote the Optimizer to use this API in #9948, it got 10% faster.

## What changes are included in this PR?

1. Update `LogicalPlan::map_children` to avoid cloning inputs when possible (by unwrapping and rewriting the `Arc<LogicalPlan>` inputs in place whenever they are not shared).
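A minimal sketch of that idea, assuming a toy `Plan` struct in place of `LogicalPlan` and a hypothetical `rewrite_arc` helper (neither is DataFusion's actual code): take ownership of the `Arc`, move the plan out when this is the only reference, apply the rewrite, and re-wrap the result. Only a shared `Arc` pays for a copy.

```rust
use std::sync::Arc;

// Hypothetical stand-in for `LogicalPlan`.
#[derive(Clone, Debug, PartialEq)]
struct Plan {
    name: String,
}

/// Hypothetical helper: rewrites the plan behind `plan` with `f`,
/// copying it only when the `Arc` is shared with another owner.
fn rewrite_arc<F>(plan: Arc<Plan>, f: F) -> Arc<Plan>
where
    F: FnOnce(Plan) -> Plan,
{
    // `Arc::try_unwrap` succeeds only if this is the sole strong reference;
    // otherwise it hands the `Arc` back and we fall back to a clone.
    let owned = Arc::try_unwrap(plan).unwrap_or_else(|shared| shared.as_ref().clone());
    Arc::new(f(owned))
}

fn append_suffix(mut plan: Plan) -> Plan {
    plan.name.push_str("_rewritten");
    plan
}

fn main() {
    // Sole owner: the plan is moved out of the `Arc`, not copied.
    let unique = Arc::new(Plan { name: "scan".to_string() });
    let rewritten = rewrite_arc(unique, append_suffix);
    println!("{}", rewritten.name); // prints "scan_rewritten"

    // Shared: the plan is copied, and the other owner still sees the original.
    let shared = Arc::new(Plan { name: "join".to_string() });
    let other = Arc::clone(&shared);
    let rewritten = rewrite_arc(shared, append_suffix);
    println!("{} / {}", other.name, rewritten.name); // prints "join / join_rewritten"
}
```

The shared case is what keeps this safe: another owner never observes the rewrite, so the optimization only changes how often a copy happens, not the result.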

## Are these changes tested?

Functionally: by existing CI tests.

Performance benchmarks: about 1-3% faster in the physical planning benchmarks, with the logical planning benchmarks unchanged (this is only the beginning! Subsequent PRs are even better).

<details><summary>Details</summary>
<p>

```
++ critcmp main map_in_place2
group                                         main                                   map_in_place2
-----                                         ----                                   -------------
logical_aggregate_with_join                   1.00  1182.7±63.77µs        ? ?/sec    1.00  1183.5±12.43µs        ? ?/sec
logical_plan_tpcds_all                        1.00    154.4±0.63ms        ? ?/sec    1.00    154.4±0.65ms        ? ?/sec
logical_plan_tpch_all                         1.00     16.6±0.15ms        ? ?/sec    1.00     16.6±0.14ms        ? ?/sec
logical_select_all_from_1000                  1.00     19.2±0.13ms        ? ?/sec    1.00     19.3±0.09ms        ? ?/sec
logical_select_one_from_700                   1.00    774.1±7.20µs        ? ?/sec    1.00    774.9±6.98µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    723.6±6.95µs        ? ?/sec    1.01   730.1±12.65µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    711.2±9.34µs        ? ?/sec    1.01   718.5±26.52µs        ? ?/sec
physical_plan_tpcds_all                       1.02   1831.7±3.37ms        ? ?/sec    1.00   1799.9±9.30ms        ? ?/sec
physical_plan_tpch_all                        1.01    119.1±0.61ms        ? ?/sec    1.00    117.5±0.56ms        ? ?/sec
physical_plan_tpch_q1                         1.02      7.3±0.10ms        ? ?/sec    1.00      7.1±0.05ms        ? ?/sec
physical_plan_tpch_q10                        1.02      5.5±0.04ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
physical_plan_tpch_q11                        1.02      4.8±0.03ms        ? ?/sec    1.00      4.7±0.02ms        ? ?/sec
physical_plan_tpch_q12                        1.02      3.9±0.04ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q13                        1.01      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
physical_plan_tpch_q14                        1.02      3.3±0.01ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q16                        1.01      4.8±0.02ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
physical_plan_tpch_q17                        1.01      4.6±0.02ms        ? ?/sec    1.00      4.6±0.02ms        ? ?/sec
physical_plan_tpch_q18                        1.02      5.0±0.04ms        ? ?/sec    1.00      4.9±0.02ms        ? ?/sec
physical_plan_tpch_q19                        1.00      9.4±0.09ms        ? ?/sec    1.00      9.3±0.06ms        ? ?/sec
physical_plan_tpch_q2                         1.01     10.5±0.08ms        ? ?/sec    1.00     10.4±0.07ms        ? ?/sec
physical_plan_tpch_q20                        1.02      6.1±0.04ms        ? ?/sec    1.00      6.0±0.03ms        ? ?/sec
physical_plan_tpch_q21                        1.02      8.3±0.04ms        ? ?/sec    1.00      8.1±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.03      4.4±0.01ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec
physical_plan_tpch_q3                         1.01      3.9±0.01ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q4                         1.03      2.9±0.03ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.01      5.6±0.02ms        ? ?/sec    1.00      5.5±0.04ms        ? ?/sec
physical_plan_tpch_q6                         1.02  1975.5±13.85µs        ? ?/sec    1.00  1944.6±11.61µs        ? ?/sec
physical_plan_tpch_q7                         1.01      7.5±0.03ms        ? ?/sec    1.00      7.4±0.03ms        ? ?/sec
physical_plan_tpch_q8                         1.01      9.5±0.06ms        ? ?/sec    1.00      9.4±0.04ms        ? ?/sec
physical_plan_tpch_q9                         1.01      7.2±0.04ms        ? ?/sec    1.00      7.1±0.03ms        ? ?/sec
physical_select_all_from_1000                 1.00    128.0±0.45ms        ? ?/sec    1.00    128.0±0.45ms        ? ?/sec
physical_select_one_from_700                  1.00      4.0±0.02ms        ? ?/sec    1.00      4.0±0.02ms        ? ?/sec
```

</p>
</details>

## Are there any user-facing changes?
(Slightly) faster Optimizer performance.



## Notes

Huge thanks to @peter-toth for sorting out and driving the `TreeNode` API in the first place, most recently in https://github.com/apache/arrow-datafusion/pull/9913.

Note this is different from previous designs (e.g. https://github.com/apache/arrow-datafusion/pull/9946), where a default `Arc` was left in place. Thanks to the comments from @peter-toth and @jayzhan211, I think this API is now much cleaner than in earlier versions.

github-actions bot added the logical-expr (Logical plan and expressions) label on Apr 8, 2024
@alamb force-pushed the alamb/map_in_place2 branch from 1b98219 to 5fcbede on April 8, 2024 14:20
```rust
let new_children = self
    .inputs()
    .into_iter()
    .cloned()
```
@alamb (Contributor Author) commented on Apr 8, 2024:

Removing this call to `cloned()` (and the `self.with_new_exprs` below) is the point of this PR.

In subsequent PRs, I plan to rewrite the Optimizer and its various passes to use this method so that they rewrite plans without copying them.

```rust
where
    F: FnMut(Self) -> Result<Transformed<Self>>,
{
    Ok(match self {
```
@alamb (Contributor Author) commented:

This follows the (really very cool) pattern @peter-toth came up with in #9913.
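As the excerpt above suggests, the pattern consumes `self` by value and matches on it, so each variant's fields (including its children) can be moved out, rewritten, and used to rebuild the same variant without cloning the node. A hedged sketch on a hypothetical two-variant `Node` enum; `Node`, the simplified `Transformed` below, and this `map_children` are illustrative stand-ins, not DataFusion's definitions (the real code also threads a `Result` and recursion state):

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for DataFusion's `Transformed`.
#[derive(Debug)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn new(data: T, transformed: bool) -> Self {
        Self { data, transformed }
    }

    /// Maps the wrapped data, keeping the `transformed` flag.
    fn update_data<U>(self, f: impl FnOnce(T) -> U) -> Transformed<U> {
        Transformed::new(f(self.data), self.transformed)
    }
}

// Hypothetical plan node with an `Arc`-wrapped child, like `LogicalPlan` inputs.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Scan(String),
    Filter { predicate: String, input: Arc<Node> },
}

impl Node {
    /// Consumes `self`, rewrites its direct children with `f`, and rebuilds
    /// the same variant from the moved-out fields instead of cloning `self`.
    fn map_children<F>(self, mut f: F) -> Transformed<Node>
    where
        F: FnMut(Node) -> Transformed<Node>,
    {
        match self {
            // Leaf: no children, so nothing changes.
            Node::Scan(name) => Transformed::new(Node::Scan(name), false),
            Node::Filter { predicate, input } => {
                // Move the child out of its `Arc`, copying only if it is shared.
                let child = Arc::try_unwrap(input).unwrap_or_else(|arc| arc.as_ref().clone());
                f(child).update_data(|new_input| Node::Filter {
                    predicate,
                    input: Arc::new(new_input),
                })
            }
        }
    }
}

fn main() {
    let plan = Node::Filter {
        predicate: "a > 1".to_string(),
        input: Arc::new(Node::Scan("t".to_string())),
    };
    // Rename the scan under the filter and report it as transformed.
    let result = plan.map_children(|child| match child {
        Node::Scan(name) => Transformed::new(Node::Scan(format!("{name}_v2")), true),
        other => Transformed::new(other, false),
    });
    println!("transformed={} data={:?}", result.transformed, result.data);
}
```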

@alamb force-pushed the alamb/map_in_place2 branch from 5fcbede to 36b28ad on April 8, 2024 14:25
@alamb marked this pull request as draft on April 8, 2024 14:25
@alamb force-pushed the alamb/map_in_place2 branch from 36b28ad to 56af5e1 on April 8, 2024 14:27
```rust
    }
}

/// Converts a `Arc<LogicalPlan>` without copying, if possible. Copies the plan
```
@alamb (Contributor Author) commented on Apr 8, 2024:

Here is the code that avoids copying an `Arc<LogicalPlan>` when possible (my performance results show it is possible most of the time).

You can also see this with a local change like this:

```diff
diff --git a/datafusion/expr/src/logical_plan/tree_node.rs b/datafusion/expr/src/logical_plan/tree_node.rs
index 97e2f7f56..a8570c0af 100644
--- a/datafusion/expr/src/logical_plan/tree_node.rs
+++ b/datafusion/expr/src/logical_plan/tree_node.rs
@@ -338,10 +338,16 @@ impl TreeNode for LogicalPlan {
 /// Converts a `Arc<LogicalPlan>` without copying, if possible. Copies the plan
 /// if there is a shared reference
 fn unwrap_arc(plan: Arc<LogicalPlan>) -> LogicalPlan {
-    Arc::try_unwrap(plan)
-        // if None is returned, there is another reference to this
-        // LogicalPlan, so we can not own it, and must clone instead
-        .unwrap_or_else(|node| node.as_ref().clone())
+    match Arc::try_unwrap(plan) {
+        Ok(plan) => {
+            println!("unwrapped!");
+            plan
+        }
+        Err(plan) => {
+            println!("BOO copying");
+            plan.as_ref().clone()
+        }
+    }
 }

 /// Applies `f` to rewrite a `Arc<LogicalPlan>` without copying, if possible
```

And then running:

```
cargo test --test sqllogictests
...
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
unwrapped!
```

There is still plenty of copying going on (957 copies), but there are 22,913 fewer copies!

```
andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion$ cargo test --test sqllogictests  | grep BOO | wc -l
    Finished test [unoptimized + debuginfo] target(s) in 0.13s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-518eef2279430877)
     957
andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion$ cargo test --test sqllogictests  | grep unwrapped | wc -l
    Finished test [unoptimized + debuginfo] target(s) in 0.13s
     Running bin/sqllogictests.rs (target/debug/deps/sqllogictests-518eef2279430877)
   22913
```
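For exact counts without piping output through `grep | wc -l`, the same instrumentation could use atomic counters instead of `println!`. A minimal sketch, assuming hypothetical `UNWRAPPED`/`COPIED` statics, a hypothetical `unwrap_arc_counted` helper, and a toy `Plan` type (none of these are part of DataFusion):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical counters replacing the `println!` markers in the diff.
static UNWRAPPED: AtomicUsize = AtomicUsize::new(0);
static COPIED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `LogicalPlan`.
#[derive(Clone)]
struct Plan(Vec<u64>);

/// Same logic as the instrumented `unwrap_arc`, but counting instead of printing.
fn unwrap_arc_counted(plan: Arc<Plan>) -> Plan {
    match Arc::try_unwrap(plan) {
        Ok(plan) => {
            UNWRAPPED.fetch_add(1, Ordering::Relaxed);
            plan
        }
        Err(plan) => {
            COPIED.fetch_add(1, Ordering::Relaxed);
            plan.as_ref().clone()
        }
    }
}

fn main() {
    let unique = Arc::new(Plan(vec![1, 2, 3]));
    let shared = Arc::new(Plan(vec![4, 5]));
    let _other_owner = Arc::clone(&shared); // a second reference forces a copy

    let _ = unwrap_arc_counted(unique);
    let _ = unwrap_arc_counted(shared);

    println!(
        "unwrapped: {}, copied: {}",
        UNWRAPPED.load(Ordering::Relaxed),
        COPIED.load(Ordering::Relaxed)
    ); // prints "unwrapped: 1, copied: 1"
}
```

`Ordering::Relaxed` suffices here because the counters are independent tallies; no other data is synchronized through them.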

@alamb changed the title from "Refactor: Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible" to "Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible" on Apr 8, 2024
@alamb marked this pull request as ready for review on April 8, 2024 15:41
```rust
{
    Ok(input_plans
        .into_iter()
        .map(unwrap_arc)
```
@peter-toth (Contributor) commented on Apr 8, 2024:

I wonder if we can drop this `.map(unwrap_arc)` and just call `.map_until_stop_and_collect(|plan| rewrite_arc(plan, f))`? In that case we don't need the `.update_data`, and we don't need to unwrap the remaining `Arc`s if `TreeNodeRecursion::Stop` is returned.
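To illustrate why this helps: with a "map until stop" helper, once the closure reports `Stop`, the remaining children are passed through untouched, so their `Arc`s are never unwrapped and re-wrapped. A simplified sketch using hypothetical `Recursion` and `map_until_stop` names; the real `map_until_stop_and_collect` and `TreeNodeRecursion` mentioned above have different, richer signatures:

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for `TreeNodeRecursion`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Recursion {
    Continue,
    Stop,
}

/// Hypothetical helper: applies `f` to each item until `f` returns
/// `Recursion::Stop`; the remaining items are collected as-is.
fn map_until_stop<T, F>(items: Vec<T>, mut f: F) -> (Vec<T>, Recursion)
where
    F: FnMut(T) -> (T, Recursion),
{
    let mut recursion = Recursion::Continue;
    let out: Vec<T> = items
        .into_iter()
        .map(|item| {
            if recursion == Recursion::Stop {
                // Skipped: the item (e.g. an `Arc<LogicalPlan>`) is passed through
                // untouched, so it is never unwrapped or re-wrapped.
                item
            } else {
                let (new_item, next) = f(item);
                recursion = next;
                new_item
            }
        })
        .collect();
    (out, recursion)
}

fn main() {
    let children: Vec<Arc<String>> = ["a", "b", "c"]
        .iter()
        .map(|s| Arc::new(s.to_string()))
        .collect();
    let mut calls = 0;
    let (out, recursion) = map_until_stop(children, |child| {
        calls += 1;
        // Rewrite each child, and ask to stop after "b".
        let next = if child.as_str() == "b" { Recursion::Stop } else { Recursion::Continue };
        (Arc::new(format!("{child}!")), next)
    });
    println!("{:?} {:?} calls={}", out, recursion, calls); // prints ["a!", "b!", "c"] Stop calls=2
}
```

Because the closure does the unwrap/rewrite/re-wrap per child, no intermediate `Vec` of unwrapped plans is needed either.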

@alamb (Contributor Author) replied:

This is an excellent suggestion -- the code is both less verbose and saves having to make a second `Vec`.

I made this change in b29ebd2.

@peter-toth (Contributor) left a comment:

LGTM, thanks for the idea and the implementation @alamb and @jayzhan211!

@alamb (Contributor Author) commented on Apr 8, 2024:

> LGTM, thanks for the idea and the implementation @alamb and @jayzhan211!

Thanks @peter-toth -- I agree, I am quite happy with this formulation compared to the initial versions. It is pretty slick and, I think, a great example of collaboration.

I am now very excited to use it to stop the copying in the Optimizer -- I like refactoring as much as the next person, but it really helps to have a user-visible goal (faster planning).

@jayzhan211 (Contributor) left a comment:

I agree that we can avoid unwrapping and re-wrapping with `Arc` if the transform state is `TreeNodeRecursion::Stop`.

@alamb merged commit cb21404 into apache:main on Apr 9, 2024 (24 checks passed)

@alamb (Contributor Author) commented on Apr 9, 2024:

Thanks @peter-toth and @jayzhan211

@alamb deleted the alamb/map_in_place2 branch on April 9, 2024 12:30