-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: issue #9130 substitute redundant columns when doing cross join #9154
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Lordworms for this contribution and I apologize for the delay in review
@jackwener any final thoughts on this PR? From my perspective it is ready to go.
I double checked with sqlite
and indeed it does appear to append :1
to such column names 🤯
sqlite> create table t1 (a int, b int);
sqlite> create table t2 (a int, b int);
sqlite> create table t3 (a int, b int);
sqlite> insert into t1 values (1, 2);
sqlite> insert into t2 values (3, 4);
sqlite> insert into t3 values (5, 6);
sqlite> select * from (t1 cross join t2) as t cross join t3;
1|2|3|4|5|6
sqlite> .header on
sqlite> select * from (t1 cross join t2) as t cross join t3;
a|b|a:1|b:1|a|b
1|2|3|4|5|6
sqlite>
cc @lewiszlw as you filed the issue
# under the License. | ||
|
||
|
||
# prepare the tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran this test without the code in this PR and it fails like this, so I think the test covers the changes in the PR
External error: query failed: DataFusion error: Error during planning: Projections require unique expression names but the expression "t.a" at position 0 and "t.a" at position 2 have the same name. Consider aliasing ("AS") one of them.
[SQL] select * from (t1 cross join t2) as t cross join t3;
-------
at test_files/./join.slt:733
at test_files/join_disable_repartition_joins.slt:26
Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`
Caused by:
process didn't exit successfully: `/Users/andrewlamb/Software/arrow-datafusion/target/debug/deps/sqllogictests-6e501f6e871dacfc joins` (exit status: 1)
let counter = name_map.entry(field.name().to_string()).or_insert(0); | ||
*counter += 1; | ||
if *counter > 1 { | ||
let new_name = format!("{}:{}", field.name(), *counter - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This effectively renames some of the columns to something else (like a:1
) I think
How do we:
- know that name is unique? (couldn't there be another table in the query that has a column named
a:1
? - know we should resolve the namespace conflict with the first column with the name?
let t2_field_1 = DFField::new_unqualified("a", DataType::Int32, false); | ||
let t1_field_2 = DFField::new_unqualified("b", DataType::Int32, false); | ||
let t2_field_2 = DFField::new_unqualified("b", DataType::Int32, false); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add one more field named "a" here (to show a counter other than :1
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I can do that
let schema: Schema = DFSchema::new_with_metadata(fields, meta_data) | ||
.unwrap() | ||
.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_with_metadata(fields, meta_data)?
to avoid .unwrap()
9e1f47b
to
3276fda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @Lordworms
DFField::new_unqualified("a:1", DataType::Int32, false), | ||
DFField::new_unqualified("b", DataType::Int32, false), | ||
DFField::new_unqualified("b:1", DataType::Int32, false), | ||
DFField::new_unqualified("a:2", DataType::Int32, false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Which issue does this PR close?
Closes #9130
Rationale for this change
In cross-join, Since we have to validate the uniqueness of the name, we would throw an error when failed to do it. as what sqlite did, they just substituted the name of the redundant column(supposed to be a) into a:1, and we followed the strategy.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?