ISSUES-468 support query stage level shuffle data #469
Conversation
# Conflicts:
#	common/exception/src/exception.rs
use common_arrow::arrow::alloc::NativeType;

#[test]
fn test_scatter_primitive_data() -> Result<()> {
Great work!
let mut block_columns = vec![];
for scattered_column in &scattered_columns[begin_index..end_index] {
    match scattered_column {
        None => panic!(""),
Why not return an Err?
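The reviewer's suggestion can be sketched as follows. This is a minimal stand-alone illustration, not the PR's code: the Error type and the collect_columns helper are stand-ins for the crate's own error handling.

```rust
use std::fmt;

// Stand-in error type; the real crate would use its own Result/ErrorCode.
#[derive(Debug)]
struct Error(String);

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

type Result<T> = std::result::Result<T, Error>;

// Propagate an error instead of panicking when a scattered column
// slot is unexpectedly empty, so the caller can decide what to do.
fn collect_columns(scattered: &[Option<u64>]) -> Result<Vec<u64>> {
    let mut columns = Vec::with_capacity(scattered.len());
    for (index, slot) in scattered.iter().enumerate() {
        match slot {
            None => return Err(Error(format!("missing scattered column at {}", index))),
            Some(value) => columns.push(*value),
        }
    }
    Ok(columns)
}

fn main() {
    assert_eq!(collect_columns(&[Some(1), Some(2)]).unwrap(), vec![1, 2]);
    assert!(collect_columns(&[Some(1), None]).is_err());
}
```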
Codecov Report
@@ Coverage Diff @@
## master #469 +/- ##
========================================
- Coverage 80% 78% -2%
========================================
Files 289 302 +13
Lines 14255 16297 +2042
========================================
+ Hits 11406 12806 +1400
- Misses 2849 3491 +642
Continue to review full report at Codecov.
    }
}

let mut scattered_blocks = vec![];
Better to use Vec::with_capacity?
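The with_capacity suggestion in a minimal form: when the number of scatter targets is known up front, pre-allocating avoids repeated re-allocations as the vector grows. The names and round-robin logic here are illustrative, not the PR's code.

```rust
// Split rows into `scatter_size` buckets; the outer vector's final
// length is known in advance, so reserve it once.
fn scatter(rows: &[u64], scatter_size: usize) -> Vec<Vec<u64>> {
    let mut scattered_blocks = Vec::with_capacity(scatter_size);
    for _ in 0..scatter_size {
        scattered_blocks.push(Vec::new());
    }
    for (i, row) in rows.iter().enumerate() {
        scattered_blocks[i % scatter_size].push(*row);
    }
    scattered_blocks
}

fn main() {
    let blocks = scatter(&[0, 1, 2, 3, 4], 2);
    assert_eq!(blocks.len(), 2);
    assert_eq!(blocks[0], vec![0, 2, 4]);
}
```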
stdout_file = os.path.join(suite_tmp_dir, name) + file_suffix + '.stdout'
stderr_file = os.path.join(suite_tmp_dir, name) + file_suffix + '.stderr'

if args.mode == 'cluster' and os.path.isfile(cluster_result_file):
Good idea
curl http://127.0.0.1:8081/v1/cluster/add -X POST -H "Content-Type: application/json" -d '{"name":"cluster1","address":"0.0.0.0:9091", "priority":3, "cpus":8}'
curl http://127.0.0.1:8081/v1/cluster/add -X POST -H "Content-Type: application/json" -d '{"name":"cluster2","address":"0.0.0.0:9092", "priority":3, "cpus":8}'
curl http://127.0.0.1:8081/v1/cluster/add -X POST -H "Content-Type: application/json" -d '{"name":"cluster3","address":"0.0.0.0:9093", "priority":1, "cpus":8}'
curl http://127.0.0.1:8081/v1/cluster/add -X POST -H "Content-Type: application/json" -d '{"name":"cluster1","address":"127.0.0.1:9091", "priority":3, "cpus":8}'
By the way, the cpus field is not used anymore.
scattered_columns.resize_with(scatter_size * columns_size, || None);

for column_index in 0..columns_size {
    let column = block.column(column_index).to_array()?;
For a constant column, we can have a faster path.
Let's optimize it in another patch: #658
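The fast path the reviewer has in mind could look roughly like this. This is a hypothetical sketch: a constant column is fully described by its value plus a row count, so scattering it only needs the per-target row counts, never a row-by-row copy.

```rust
// Simplified stand-in for a constant column: one value repeated `rows` times.
#[derive(Debug, PartialEq)]
struct ConstColumn {
    value: u64,
    rows: usize,
}

// Scattering a constant column is O(number of targets), not O(rows):
// each target gets the same value and its share of the row count.
fn scatter_const(column: &ConstColumn, rows_per_target: &[usize]) -> Vec<ConstColumn> {
    rows_per_target
        .iter()
        .map(|&rows| ConstColumn { value: column.value, rows })
        .collect()
}

fn main() {
    let column = ConstColumn { value: 7, rows: 10 };
    let parts = scatter_const(&column, &[4, 6]);
    assert_eq!(parts[0], ConstColumn { value: 7, rows: 4 });
    assert_eq!(parts[1], ConstColumn { value: 7, rows: 6 });
}
```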
common/planners/src/plan_node.rs
PlanNode::Explain(v) => v.set_input(inputs[0]),
PlanNode::Select(v) => v.set_input(inputs[0]),
PlanNode::Sort(v) => v.set_input(inputs[0]),
_ => {}
Maybe it would be better to return an error for the other types of PlanNode; otherwise the error is silently swallowed here and only surfaces somewhere far from its cause.
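The reviewer's point in a minimal form: a catch-all `_ => {}` silently ignores node types that don't support set_input. Failing at the match keeps the error local. The enum and error type below are simplified stand-ins for the PR's code.

```rust
// Simplified plan-node enum; the real one has many more variants.
#[derive(Debug)]
enum PlanNode {
    Select(Vec<PlanNode>),
    Sort(Vec<PlanNode>),
    Empty,
}

// Instead of `_ => {}`, report which node kind was unsupported.
fn set_input(node: &mut PlanNode, inputs: Vec<PlanNode>) -> Result<(), String> {
    match node {
        PlanNode::Select(v) | PlanNode::Sort(v) => {
            *v = inputs;
            Ok(())
        }
        other => Err(format!("set_input is not supported for {:?}", other)),
    }
}

fn main() {
    let mut select = PlanNode::Select(vec![]);
    assert!(set_input(&mut select, vec![PlanNode::Empty]).is_ok());
    assert!(set_input(&mut PlanNode::Empty, vec![]).is_err());
}
```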
A query plan (or query execution plan) is a sequence of steps used to access data in DataFuse. It is built by PlanBuilder from the AST. We also use a tree to describe it (similar to the AST), but it has some differences from the AST:

- Plan is serializable and deserializable.
- Plan is grammatically safe; we don't need to worry about syntax errors.
Is this because the plan has passed the parser?
yep
- In distributed mode, the tables to be queried are usually distributed across different nodes.
- For some scenarios, distributed processing is efficient, such as GROUP BY with keys and JOIN.
- For some scenarios, there is no way to process in a distributed fashion, such as LIMIT and GROUP BY without keys.
- To ensure fast computation, we need to coordinate the location of computation and data.
Does position mean location?
+---------------------------------------------------------------------+
| explain                                                             |
+---------------------------------------------------------------------+
| Projection: argMin(user, salary):UInt64    <-- execute in local node
Which SQL statement is this explanation for?
EXPLAIN SELECT argMin(user, salary) FROM (SELECT sum(number) AS salary, number%3 AS user FROM numbers_local(1000000000) GROUP BY user);

numbers_local is a local table, but numbers_local(1000000000) is very large (more than 500MB, or 100 million rows), so the ScatterOptimizer decides it is more appropriate to distribute it.
GroupByMerge,
AggregatorMerge
Expansive,
Convergent
How about the plan that preserves the cardinality?
What does "preserves the cardinality" mean?
    Ok(())
}

// Execute do_action.
do_get?
match plan.group_expr.len() {
    0 => {
        // For the final state, we need to aggregate the data
This comment is a bit confusing. Does it mean: if there is no GROUP BY, we converge the data in the local node?
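The dispatch being discussed can be sketched like this. It is an illustrative simplification under the assumption the reply confirms: an aggregation without GROUP BY keys cannot be sharded by key, so its final state is converged on one node, while a keyed aggregation can be reshuffled. StageKind mirrors names appearing elsewhere in this PR; the function is not the PR's code.

```rust
// Names taken from the PR's StageKind; logic simplified for illustration.
#[derive(Debug, PartialEq)]
enum StageKind {
    Convergent, // gather all partial states on one node
    Normal,     // reshuffle rows by a scatter expression
}

fn stage_for_group_by(group_keys: usize) -> StageKind {
    match group_keys {
        // No GROUP BY keys: the final aggregation must happen in one place.
        0 => StageKind::Convergent,
        // With keys: rows with equal keys can be hashed to the same node.
        _ => StageKind::Normal,
    }
}

fn main() {
    assert_eq!(stage_for_group_by(0), StageKind::Convergent);
    assert_eq!(stage_for_group_by(2), StageKind::Normal);
}
```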
kind: StageKind::Normal,
scatters_expr: Expression::ScalarFunction {
    op: String::from("sipHash"),
    args: vec![Expression::Column(String::from("_group_by_key"))]
Cool, the reshuffle key is here 💯
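The routing idea behind that reshuffle key, as a small sketch: hash the group-by key, then take the result modulo the node count, so equal keys always land on the same node. std's DefaultHasher (a SipHash variant) stands in for the PR's sipHash function here; this is not the PR's code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Route a row by its group-by key: hash, then modulo the node count,
// mirroring the sipHash scatter expression in the excerpt above.
fn scatter_index(group_by_key: &str, nodes: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    group_by_key.hash(&mut hasher);
    (hasher.finish() as usize) % nodes
}

fn main() {
    // The same key always maps to the same node, so per-key aggregation
    // state never needs to be merged across nodes.
    assert_eq!(scatter_index("user_1", 3), scatter_index("user_1", 3));
    assert!(scatter_index("user_2", 3) < 3);
}
```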
use crate::sessions::SessionManagerRef;

#[derive(Debug)]
pub struct PrepareStageInfo(
Prefer a normal struct to the tuple struct for better readability. In particular, the prepare_stage method is somewhat inconvenient to read.
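The suggestion side by side, as a sketch. The field names below are illustrative guesses, not the PR's actual fields; the point is that named fields make call sites self-describing where positional `.0`/`.1` access does not.

```rust
// Tuple-struct style: callers see only positions. Field meaning is a guess
// recorded in a comment, which is exactly the readability problem.
pub struct PrepareStageInfoTuple(pub String, pub String); // (query_id?, stage_id?)

// Normal-struct style: the same data, but every access names its field.
pub struct PrepareStageInfo {
    pub query_id: String,
    pub stage_id: String,
}

fn main() {
    let t = PrepareStageInfoTuple(String::from("q1"), String::from("s1"));
    let s = PrepareStageInfo {
        query_id: String::from("q1"),
        stage_id: String::from("s1"),
    };
    // `s.query_id` reads better than the positional `t.0`.
    assert_eq!(t.0, s.query_id);
    assert_eq!(t.1, s.stage_id);
}
```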
    num: usize
) -> Result<FlightScatter> {
    let indices_expression_action = Expression::ScalarFunction {
        op: String::from("modulo"),
Does it only scatter by hash? If other scatter strategies are added in the future, would it be better to rename FlightScatter to FlightScatterByHash?
Done.
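One way the rename leaves room for future strategies, sketched with simplified types: keep FlightScatter as a trait describing "split a block N ways" and make the hash strategy one implementation. This is an assumed design, not necessarily how the PR resolved it.

```rust
// Trait: the general contract of scattering rows into `n` outputs.
trait FlightScatter {
    fn execute(&self, rows: &[u64]) -> Vec<Vec<u64>>;
}

// One concrete strategy: route by value modulo the target count,
// echoing the `modulo` scatter expression in the excerpt above.
struct FlightScatterByHash {
    num: usize,
}

impl FlightScatter for FlightScatterByHash {
    fn execute(&self, rows: &[u64]) -> Vec<Vec<u64>> {
        let mut out = vec![Vec::new(); self.num];
        for row in rows {
            out[(*row as usize) % self.num].push(*row);
        }
        out
    }
}

fn main() {
    let scatter = FlightScatterByHash { num: 2 };
    let parts = scatter.execute(&[0, 1, 2, 3]);
    assert_eq!(parts[0], vec![0, 2]);
    assert_eq!(parts[1], vec![1, 3]);
}
```

A broadcast or round-robin scatter could later implement the same trait without touching callers.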
scan.and_then(|scan| match scan {
    PlanNode::Scan(ref scan) => table
-       .read_plan(self.ctx.clone(), scan)
+       .read_plan(self.ctx.clone(), scan, self.ctx.get_max_threads()? as usize)
We already have ctx; why do we need to pass max_threads in?
In standalone mode, the maximum concurrency is max_threads.
In cluster mode, the maximum concurrency should be max_threads * nodes_size.
We re-plan the query in the plan_scheduler:
let table = ctx.get_table(&plan.db, &plan.table)?;
if !table.is_local() {
let new_partitions_size = ctx.get_max_threads()? as usize * cluster_nodes.len();
let new_read_source_plan = table.read_plan(ctx.clone(), &*plan.scan_plan, new_partitions_size)?;
// We always put adjacent partitions in the same node
let new_partitions = &new_read_source_plan.partitions;
let mut nodes_partitions = HashMap::new();
let partitions_pre_node = new_partitions.len() / cluster_nodes.len();
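The arithmetic in that excerpt can be sketched end to end. This is a simplified stand-alone version, not the PR's code: the remainder handling (leftover partitions going to the last node) is an assumption, since the excerpt cuts off before it.

```rust
// Assign the half-open range [start, end) of partition indices to each
// node, keeping adjacent partitions on the same node as the excerpt says.
fn assign_partitions(partitions: usize, nodes: usize) -> Vec<(usize, usize)> {
    let per_node = partitions / nodes;
    (0..nodes)
        .map(|n| {
            let start = n * per_node;
            // Assumption: any leftover partitions go to the last node.
            let end = if n == nodes - 1 { partitions } else { start + per_node };
            (start, end)
        })
        .collect()
}

fn main() {
    // Cluster mode: partition count becomes max_threads * number of nodes.
    let max_threads = 4;
    let nodes = 3;
    let new_partitions_size = max_threads * nodes; // 12
    let assignment = assign_partitions(new_partitions_size, nodes);
    assert_eq!(assignment, vec![(0, 4), (4, 8), (8, 12)]);
}
```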
CI Passed
Summary
Support query stage level shuffle data [Initial stage]
TODO
Changelog
Related Issues
fixes #468
fixes #614
related #440