ISSUES-468 support query stage level shuffle data #469

Merged
Merged 90 commits into master from support/shuffle_stage on May 31, 2021

Conversation

zhang2014
Member

@zhang2014 zhang2014 commented May 4, 2021

Summary

Support query stage level shuffle data [Initial stage]

TODO

  • RFC document
  • Experimental implementation

Changelog

  • New Feature

Related Issues

fixes #468
fixes #614
related #440

@github-actions github-actions bot added the A-query Area: databend query label May 4, 2021
@zhang2014 zhang2014 force-pushed the support/shuffle_stage branch from cda5446 to 29d28c5 on May 4, 2021 09:26
@zhang2014 zhang2014 force-pushed the support/shuffle_stage branch from 8f78940 to d0ecc83 on May 6, 2021 13:59
use common_arrow::arrow::alloc::NativeType;

#[test]
fn test_scatter_primitive_data() -> Result<()> {
Member

Great work!

let mut block_columns = vec![];
for scattered_column in &scattered_columns[begin_index..end_index] {
match scattered_column {
None => panic!(""),
Member

Why not return an Err?
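As an illustration of the reviewer's suggestion, here is a minimal sketch of how the panic could be replaced with an error. ErrorCode::LogicalError and the message are assumptions for the example, not the code that was merged.

// Sketch only: surface a missing scattered column as an error instead of panicking.
let mut block_columns = vec![];
for scattered_column in &scattered_columns[begin_index..end_index] {
    match scattered_column {
        None => {
            return Err(ErrorCode::LogicalError(
                "Scattered column is unexpectedly missing",
            ));
        }
        Some(column) => block_columns.push(column.clone()),
    }
}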

@codecov-commenter

codecov-commenter commented May 15, 2021

Codecov Report

Merging #469 (a4eff0f) into master (198b6c2) will decrease coverage by 1%.
The diff coverage is 75%.


@@           Coverage Diff            @@
##           master    #469     +/-   ##
========================================
- Coverage      80%     78%     -2%     
========================================
  Files         289     302     +13     
  Lines       14255   16297   +2042     
========================================
+ Hits        11406   12806   +1400     
- Misses       2849    3491    +642     
Impacted Files Coverage Δ
common/datavalues/src/data_array_hash.rs 0% <0%> (ø)
common/functions/src/function_factory.rs 93% <ø> (ø)
common/functions/src/hashes/siphash.rs 0% <0%> (ø)
common/planners/src/plan_builder.rs 84% <ø> (+<1%) ⬆️
common/planners/src/plan_explain.rs 0% <0%> (ø)
common/planners/src/plan_insert_into.rs 0% <0%> (ø)
common/planners/src/plan_rewriter.rs 46% <0%> (-5%) ⬇️
common/planners/src/plan_rewriter_test.rs 84% <ø> (-1%) ⬇️
common/planners/src/plan_use_database.rs 0% <0%> (ø)
fusequery/query/src/api/rpc/flight_client_new.rs 0% <0%> (ø)
... and 130 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 198b6c2...a4eff0f. Read the comment docs.

@zhang2014 zhang2014 requested review from BohuTANG and sundy-li May 30, 2021 12:31
}
}

let mut scattered_blocks = vec![];
Member

Better to use Vec::with_capacity?
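For reference, a small self-contained example of the Vec::with_capacity pattern the reviewer is pointing at (a generic illustration, not the PR's code):

// When the final length is known up front, pre-allocating avoids repeated
// reallocation while elements are pushed.
fn collect_squares(n: usize) -> Vec<usize> {
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        out.push(i * i);
    }
    out
}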

stdout_file = os.path.join(suite_tmp_dir, name) + file_suffix + '.stdout'
stderr_file = os.path.join(suite_tmp_dir, name) + file_suffix + '.stderr'

if args.mode == 'cluster' and os.path.isfile(cluster_result_file):
Member

Good idea

curl http://127.0.0.1:8081/v1/cluster/add -X POST -H "Content-Type: application/json" -d '{"name":"cluster1","address":"0.0.0.0:9091", "priority":3, "cpus":8}'
curl http://127.0.0.1:8081/v1/cluster/add -X POST -H "Content-Type: application/json" -d '{"name":"cluster2","address":"0.0.0.0:9092", "priority":3, "cpus":8}'
curl http://127.0.0.1:8081/v1/cluster/add -X POST -H "Content-Type: application/json" -d '{"name":"cluster3","address":"0.0.0.0:9093", "priority":1, "cpus":8}'
curl http://127.0.0.1:8081/v1/cluster/add -X POST -H "Content-Type: application/json" -d '{"name":"cluster1","address":"127.0.0.1:9091", "priority":3, "cpus":8}'
Member

By the way, the cpus field is not used anymore.

scattered_columns.resize_with(scatter_size * columns_size, || None);

for column_index in 0..columns_size {
let column = block.column(column_index).to_array()?;
Member

For constant columns, we could have a faster path.

Member Author

Let's optimize it in another patch: #658
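For readers curious what such a fast path could look like, here is a hypothetical sketch (names and shapes are illustrative, not the project's API): a constant column never needs a per-row scatter kernel, because every output partition receives the same value and only the row count differs.

// Illustrative only: "scatter" a constant column by pairing the single value
// with each partition's row count, skipping any per-row work.
fn scatter_constant(value: i64, rows_per_partition: &[usize]) -> Vec<(i64, usize)> {
    rows_per_partition
        .iter()
        .map(|rows| (value, *rows))
        .collect()
}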

PlanNode::Explain(v) => v.set_input(inputs[0]),
PlanNode::Select(v) => v.set_input(inputs[0]),
PlanNode::Sort(v) => v.set_input(inputs[0]),
_ => {}
Contributor

Maybe it's better to return an error for the other types of PlanNode; otherwise the problem is silently ignored here and only surfaces somewhere else.
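A hedged sketch of the suggested change, assuming the surrounding function is (or becomes) fallible; ErrorCode::UnImplement is used here only as a plausible error constructor:

// Sketch: report the unhandled plan node instead of silently ignoring it.
_ => {
    return Err(ErrorCode::UnImplement(
        "set_input is not implemented for this type of plan node",
    ));
}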

A query plan (or query execution plan) is a sequence of steps used to access data in DataFuse. It is built by PlanBuilder from the AST. We also use a tree to describe it (similar to the AST), but it differs from the AST in a few ways:

- A plan is serializable and deserializable.
- A plan is grammatically safe; we don't need to worry about syntax errors.
Contributor

Is this because the plan has passed the parser?

Member Author

yep

- In distributed mode, the tables to be queried are always distributed across different nodes
- For some scenarios, distributed processing is always efficient, such as GROUP BY with keys and JOIN
- For some scenarios, there is no way to process in a distributed manner, such as LIMIT and GROUP BY without keys
- To ensure fast computation, we need to coordinate the placement of computation and data.
Contributor

Does position mean location?

+--------------------------------------------------------------------+
| explain                                                            |
+--------------------------------------------------------------------+
| Projection: argMin(user, salary):UInt64 <-- execute in local node
Contributor

Which SQL statement is this explanation for?

Member Author

EXPLAIN SELECT argMin(user, salary)  FROM (SELECT sum(number) AS salary, number%3 AS user FROM numbers_local(1000000000) GROUP BY user);

numbers_local is a local table. Since numbers_local(1000000000) is very large (more than 500 MB, or 100 million rows), ScatterOptimizer considers it more appropriate to distribute it.
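A hypothetical sketch of that decision, using the thresholds mentioned above; the constant names and the function are illustrative, not ScatterOptimizer's actual code:

// Illustrative only: distribute the read when the estimated input is large.
const DISTRIBUTE_BYTES_THRESHOLD: u64 = 500 * 1024 * 1024; // ~500 MB
const DISTRIBUTE_ROWS_THRESHOLD: u64 = 100_000_000; // 100 million rows

fn should_distribute(read_bytes: u64, read_rows: u64) -> bool {
    read_bytes > DISTRIBUTE_BYTES_THRESHOLD || read_rows > DISTRIBUTE_ROWS_THRESHOLD
}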

GroupByMerge,
AggregatorMerge,
Expansive,
Convergent,
Contributor

How about the plan that preserves the cardinality?

Member Author

What does "preserves the cardinality" mean?

Ok(())
}

// Execute do_action.
Contributor

do_get?


match plan.group_expr.len() {
0 => {
// For the final state, we need to aggregate the data
Member

This comment is a bit confusing. Does it mean:
if there is no GROUP BY, we converge the data in the local node?
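For context, a hedged sketch of the distinction being discussed: with no GROUP BY keys every node holds a single partial aggregate state that must converge on one node for the final merge, whereas with keys the rows can be reshuffled by the hash of the group key and finalized per node. The helper names below are illustrative, not the PR's code.

// Illustrative only: pick the stage shape from the number of GROUP BY keys.
match plan.group_expr.len() {
    // No keys: one final aggregation, so all partial states converge locally.
    0 => build_convergent_stage(&plan),
    // With keys: reshuffle by the hash of _group_by_key (e.g. sipHash) and
    // let each node finalize its own buckets.
    _ => build_shuffled_stage(&plan),
}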

kind: StageKind::Normal,
scatters_expr: Expression::ScalarFunction {
op: String::from("sipHash"),
args: vec![Expression::Column(String::from("_group_by_key"))]
Member
@BohuTANG BohuTANG May 31, 2021

Cool, the reshuffle key is here 💯

use crate::sessions::SessionManagerRef;

#[derive(Debug)]
pub struct PrepareStageInfo(
Member
@BohuTANG BohuTANG May 31, 2021

Prefer to use a normal struct instead of a tuple struct for better readability.
In particular, the prepare_stage method is somewhat inconvenient to read.
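A minimal sketch of the named-struct alternative being requested; the field names are guesses for illustration, not the definition that was merged:

// Sketch only: named fields make call sites like prepare_stage self-documenting,
// unlike positional access on a tuple struct.
#[derive(Debug)]
pub struct PrepareStageInfo {
    pub query_id: String,
    pub stage_id: String,
    pub plan: PlanNode,
    pub scattered_node: Vec<String>,
    pub scatters_expression: Expression,
}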

num: usize
) -> Result<FlightScatter> {
let indices_expression_action = Expression::ScalarFunction {
op: String::from("modulo"),
Member
@BohuTANG BohuTANG May 31, 2021

Does it only scatter by hash?
If there are other scatter strategies in the future, it would be better to rename FlightScatter to FlightScatterByHash.

Member Author

Done.
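For context, a small self-contained sketch of the scatter-by-hash idea behind the rename: each row's hash is mapped to one of num outputs with a modulo, mirroring the modulo expression built above. The function is purely illustrative.

// Illustrative only: assign every row to a scatter bucket by hash % num.
fn scatter_indices(hashes: &[u64], num: u64) -> Vec<u64> {
    hashes.iter().map(|hash| hash % num).collect()
}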

scan.and_then(|scan| match scan {
PlanNode::Scan(ref scan) => table
.read_plan(self.ctx.clone(), scan)
.read_plan(self.ctx.clone(), scan, self.ctx.get_max_threads()? as usize)
Member

We already have ctx, so why do we need to pass max_threads in?

Member Author

In standalone mode, the maximum concurrency is max_threads.
In cluster mode, the maximum concurrency should be max_threads * nodes_size.
We re-plan the read in the plan_scheduler.

let table = ctx.get_table(&plan.db, &plan.table)?;
if !table.is_local() {
    let new_partitions_size = ctx.get_max_threads()? as usize * cluster_nodes.len();
    let new_read_source_plan = table.read_plan(ctx.clone(), &*plan.scan_plan, new_partitions_size)?; 
    // We always put adjacent partitions in the same node
    let new_partitions = &new_read_source_plan.partitions;
    let mut nodes_partitions = HashMap::new();
    let partitions_pre_node = new_partitions.len() / cluster_nodes.len();

reference: https://github.com/datafuselabs/datafuse/pull/469/files#diff-d9bf8b037f42cfc24f1b4e8138e966f80aae1b68deb6fe82ceb0d2ad7a93a806R236

@databend-bot
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot databend-bot merged commit 1d5d6e9 into databendlabs:master May 31, 2021
@BohuTANG BohuTANG mentioned this pull request Jun 7, 2021
@BohuTANG BohuTANG added this to the v0.5 milestone Jun 22, 2021
Maricaya pushed a commit to Maricaya/databend that referenced this pull request Sep 29, 2024
databendlabs#469)

* add RouteHintGenerator

* add in_active_transaction

* reset route hint

* chore: rename to route hint

* handle the route hint from query response
Labels
A-query Area: databend query pr-feature this PR introduces a new feature to the codebase
Development

Successfully merging this pull request may close these issues.

Flight error broke tests
Support shuffle data for query stage
6 participants