Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support columns having the same alias #6543

Open
berkaysynnada opened this issue Jun 5, 2023 · 15 comments · May be fixed by #13489
Open

Support columns having the same alias #6543

berkaysynnada opened this issue Jun 5, 2023 · 15 comments · May be fixed by #13489
Labels
bug Something isn't working

Comments

@berkaysynnada
Copy link
Contributor

berkaysynnada commented Jun 5, 2023

Describe the bug

When we give the same aliases for multiple columns (SELECT ts as c1, inc_col as c1 FROM annotated_data_infinite), builder gives such an error:
Plan("Projections require unique expression names but the expression \"annotated_data_infinite.ts AS c1\" at position 0 and \"annotated_data_infinite.inc_col AS c1\" at position 1 have the same name. Consider aliasing (\"AS\") one of them.")

To Reproduce

        ctx.sql(
            "CREATE EXTERNAL TABLE annotated_data_infinite (
              ts INTEGER,
              inc_col INTEGER,
              desc_col INTEGER,
            )
            STORED AS CSV
            WITH HEADER ROW
            WITH ORDER (ts ASC)
            LOCATION '/Users/berkaysahin/Desktop/arrow-datafusion/datafusion/core/tests/data/window_1.csv'",
        )
        .await?;
        let sql = "SELECT ts as c1, inc_col as c1 FROM annotated_data_infinite";
        let dataframe = ctx.sql(sql).await.expect(&msg);

Expected behavior

Postgre can handle it and gives result with two columns having the same name. I don't know this is an intentional behaviour in Datafusion or a bug, but I would like to open an issue.

Additional context

No response

@berkaysynnada berkaysynnada added the bug Something isn't working label Jun 5, 2023
@comphead
Copy link
Contributor

comphead commented Jun 6, 2023

Thanks @berkaysynnada for raising. This is quite old problem,DF has unique column name check in the planner. We planned to move this check one level upper, so the query will fail if outer query references the inner query containing duplicated aliases. Fir you scenario is it real world one?

@berkaysynnada
Copy link
Contributor Author

Thanks @berkaysynnada for raising. This is quite old problem,DF has unique column name check in the planner. We planned to move this check one level upper, so the query will fail if outer query references the inner query containing duplicated aliases. Fir you scenario is it real world one?

Not actually, it was a hypothetical trial. Your plan makes sense and thanks for letting me know about it.

@alamb
Copy link
Contributor

alamb commented Jun 24, 2023

I also filed #6758 to think about the problem with large column names.

@alamb
Copy link
Contributor

alamb commented Jun 24, 2023

I wonder if some potential solution for this issue would be to automatically add a string to make the columns unique in the arrow schema?

@comphead
Copy link
Contributor

that is good idea btw, currently we got bunch of issues

  • Arrow schema uniqueness violation
  • DFSchema uniqueness violation
    this solution can potentially address both issues

@jackwener
Copy link
Member

This is a legacy issue.

Generally, we won't raise an error for having columns with the same name unless an outer subquery references that column name.

In terms of this issue itself, we should fix it in the planner.

@alamb
Copy link
Contributor

alamb commented Jun 26, 2023

🤯 I didn't realize this worked

(arrow_dev) alamb@MacBook-Pro-8:~/Software/influxdb_iox2$ datafusion-cli
DataFusion CLI v26.0.0
❯ create table foo (x int) as values (1), (2), (3);
0 rows in set. Query took 0.003 seconds.
❯ select x as "my_col", x as "my col" from foo;
+--------+--------+
| my_col | my col |
+--------+--------+
| 1      | 1      |
| 2      | 2      |
| 3      | 3      |
+--------+--------+
3 rows in set. Query took 0.005 seconds.
❯ select x as "my_col", x+1 as "my col" from foo;
+--------+--------+
| my_col | my col |
+--------+--------+
| 1      | 2      |
| 2      | 3      |
| 3      | 4      |

However, using c1 as the alias for some reason fails:

select x as c1, x as c1 from foo;
Error during planning: Projections require unique expression names but the expression "foo.x AS c1" at position 0 and "foo.x AS c1" at position 1 have the same name. Consider aliasing ("AS") one of them.+--------+--------+

@comphead
Copy link
Contributor

🤯 I didn't realize this worked

(arrow_dev) alamb@MacBook-Pro-8:~/Software/influxdb_iox2$ datafusion-cli
DataFusion CLI v26.0.0
❯ create table foo (x int) as values (1), (2), (3);
0 rows in set. Query took 0.003 seconds.
❯ select x as "my_col", x as "my col" from foo;
+--------+--------+
| my_col | my col |
+--------+--------+
| 1      | 1      |
| 2      | 2      |
| 3      | 3      |
+--------+--------+
3 rows in set. Query took 0.005 seconds.
❯ select x as "my_col", x+1 as "my col" from foo;
+--------+--------+
| my_col | my col |
+--------+--------+
| 1      | 2      |
| 2      | 3      |
| 3      | 4      |

However, using c1 as the alias for some reason fails:

select x as c1, x as c1 from foo;
Error during planning: Projections require unique expression names but the expression "foo.x AS c1" at position 0 and "foo.x AS c1" at position 1 have the same name. Consider aliasing ("AS") one of them.+--------+--------+

its failed as DF has projection uniqueness column name check.
first test didn't fail because it has different aliases

@alamb
Copy link
Contributor

alamb commented Jul 13, 2023

first test didn't fail because it has different aliases

Oh man, 🤦 -- I missed the _ and

@tv42
Copy link
Contributor

tv42 commented Feb 12, 2024

An extra motivation to get this right is that sqlite's sqllogictest suite has a lot of these. You'll get better test coverage if you support this.

@alamb
Copy link
Contributor

alamb commented Feb 12, 2024

I wonder if someone has time to file a ticket with the idea to re(use) sqlite's sqllogictest suite?

@Jefffrey
Copy link
Contributor

Jefffrey commented Apr 5, 2024

Another case is when selecting same value literals

DataFusion CLI v37.0.0select 1, 1;
Error during planning: Projections require unique expression names but the expression "Int64(1)" at position 0 and "Int64(1)" at position 1 have the same name. Consider aliasing ("AS") one of them.
❯

@comphead
Copy link
Contributor

comphead commented Apr 5, 2024

Yes,I was thinking the other day to allow query like that and check uniqueness from outer queries only

@tv42
Copy link
Contributor

tv42 commented Apr 16, 2024

@alamb I don't think you can directly run SQLite's test suite against just datafusion, there's a lot of CREATE INDEX etc going on.

I have an early-stage OLTP database project in the works using datafusion and that survives a decent fraction of sqllogictest. This issue is one of the remaining big limitations, along with AVG(DISTINCT) #2408 -- I've been filing bugs on test cases that failed due to datafusion, and they've largely been fixed.

@findepi
Copy link
Member

findepi commented Nov 18, 2024

This would be great item in the broader scope of #12723, which intents to make DataFusion Logical Plans be state of the art.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants