-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recursive unnest
#11062
Recursive unnest
#11062
Conversation
datafusion/sql/src/utils.rs
Outdated
let transformed_exprs = transform(&expr, arg)?; | ||
Ok(Transformed::yes(transformed_exprs[0].clone())) | ||
// root_expr.push(transformed_exprs[0].clone()); | ||
Ok(Transformed::new(transformed_exprs[0].clone(),true,TreeNodeRecursion::Stop)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use swap_remove(0)
to avoid clone
input: &LogicalPlan, | ||
unnest_placeholder_columns: &mut Vec<String>, | ||
inner_projection_exprs: &mut Vec<Expr>, | ||
original_expr: Expr, | ||
original_expr: &Expr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could avoid clone in L297 too.
pub(crate) fn recursive_transform_unnest( | ||
/// If along the path from root to bottom, there are multiple unnest expressions, the transformation | ||
/// is done only for the bottom expression | ||
pub(crate) fn transform_bottom_unnest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would transform List / Struct separately help readability here?
something like
transform_bottom_unnest() {
// common logic
fn transform();
list:
transform_list // list specific logic
struct:
transform_struct // struct specific logic (only unnest with the top level)
}
I plan to review this tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @duongcongtoai -- I think this PR is quite close. In my opinion it only needs a few more tests and we should be able to merge it.
Thanks @jayzhan211 for the review
datafusion/sql/src/select.rs
Outdated
@@ -291,51 +290,68 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
} | |||
|
|||
/// Try converting Expr(Unnest(Expr)) to Projection/Unnest/Projection | |||
/// FIXME: if within select_exprs, one unnest expr is referenced multiple times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we file a ticket to track this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this FIXME to sqllogictest, ticket created: #11198
@@ -33,8 +33,8 @@ AS VALUES | |||
statement ok | |||
CREATE TABLE nested_unnest_table | |||
AS VALUES | |||
(struct('a', 'b', struct('c')), (struct('a', 'b', [10,20])), [struct('a', 'b')]), | |||
(struct('d', 'e', struct('f')), (struct('x', 'y', [30,40, 50])), null) | |||
(struct(['a'], 'b', struct('c')), (struct('a', 'b', [[10,20]])), [struct(['a'], 'b')]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to please leave the original test the same (so we can verify the existing behavior is unchanged) and add new tests for the newly added features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i revert the change on this table and created a new one
// unnest struct is only possible here | ||
// The transformation looks like | ||
// - unnest(struct_col) will be transformed into unnest(struct_col).field1, unnest(struct_col).field2 | ||
if let Expr::Unnest(Unnest { expr: ref arg }) = transformed_expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having to clone the expr to special case the root, could you instead check the root initially?
Something like
if let Expr::Unnest(Unnest { expr }) = original_expr {
let expr = expr.transform_up(...);
Ok(transform(transformex_expe)
}
And similarly for the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that approach, but it will add more logic to the code, something like:
if let Expr::Unnest(Unnest { expr: ref arg }) = original_expr {
let Transformed {
data: transformed_expr,
transformed,
tnr: _,
} = arg.clone().transform_up(|expr: Expr| {
...
})?;
// no transformation eligible in the children
// only then the root unnest can be transformed
if !transformed {
return transform(&transformed_expr, arg);
} else {
return Ok(vec![transformed_expr]);
}
}
// from this onward, unnest(struct) is invalid because
// we are sure that root_expr is not unnest expr
let Transformed {
data: transformed_expr,
transformed,
tnr: _,
} = original_expr.clone().transform_up(|expr: Expr| {
...
if !transformed {
if matches!(&transformed_expr, Expr::Column(_)) {
inner_projection_exprs.push(transformed_expr.clone());
Ok(vec![transformed_expr])
} else {
// We need to evaluate the expr in the inner projection,
// outer projection just select its name
let column_name = transformed_expr.display_name()?;
inner_projection_exprs.push(transformed_expr);
Ok(vec![Expr::Column(Column::from_name(column_name))])
}
} else {
Ok(vec![transformed_expr])
}
})?;
For the clone() part, i don't know how to avoid it because transform_up will take the ownership of the expr
Please let me know your opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you have is good -- thank you for trying this
|
||
## double unnest list | ||
query I? | ||
select unnest(unnest(column2['c2'])), column2 from nested_unnest_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also please add tests for:
- Multiple unnest:
select unnest(col1), unnest(col1)
,select unnest(col1), unnest(col12)
- Unnest + expression:
select unnest(col1)+1
- Unnest on a non structured column (e.g. unnest('foo'))
- Triple unnest
unnest(unnest(unnnest(..)))
(it is ok if they error, but I think it would be useful to have the additional test coverage that demonstrates how sophisticated the implementation is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added more tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @duongcongtoai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @duongcongtoai and @alamb
Which issue does this PR close?
Closes #10660.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?