Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move SortExec partition check to constructor #179

Closed
alamb opened this issue Apr 26, 2021 · 1 comment
Closed

Move SortExec partition check to constructor #179

alamb opened this issue Apr 26, 2021 · 1 comment
Labels
datafusion Changes in the datafusion crate

Comments

@alamb
Copy link
Contributor

alamb commented Apr 26, 2021

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11625

SortExec has the following error check at execution time and this could be moved into the try_new constructor so the error check happens at planning time instead.

 
{code:java}
if 1 != self.input.output_partitioning().partition_count() {
return Err(DataFusionError::Internal(
"SortExec requires a single input partition".to_owned(),
));
} {code}

@alamb alamb added the datafusion Changes in the datafusion crate label Apr 26, 2021
@alamb
Copy link
Contributor Author

alamb commented Apr 26, 2021

Comment from Hendrik Makait(hendrik.makait) @ 2021-02-14T17:46:12.067+0000:

I'd love to check that out tomorrow.

Comment from Hendrik Makait(hendrik.makait) @ 2021-02-15T20:05:17.782+0000:

Moving this check into the constructor leads to failing tests. As far as I can see, this is because the planner does an optimization step that inserts a merge for children that contain multiple partitions.
{code:java}
match plan.required_child_distribution() {
    Distribution::UnspecifiedDistribution => plan.with_new_children(children),
    Distribution::SinglePartition => plan.with_new_children(
        children
            .iter()
            .map(|child| {
                if child.output_partitioning().partition_count() == 1 {
                    child.clone()
                } else {
                    Arc::new(MergeExec::new(child.clone()))
                }
            })
            .collect(),
    ),
}
{code}
 What's the reason for moving this check into planning time? How should I proceed?

Comment from Andy Grove(andygrove) @ 2021-02-16T14:44:11.773+0000:

I see. Ok, maybe we can't do this at planning time then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

No branches or pull requests

2 participants