Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove join side from duckdb semi/anti join translation; codegen for semi/anti join isn't roundtrippable #2244

Closed
cpcloud opened this issue Sep 17, 2023 · 3 comments · Fixed by #2247
Assignees

Comments

@cpcloud
Copy link
Contributor

cpcloud commented Sep 17, 2023

Fully reproducible code snippet

In [28]: import sqlglot as sg

In [29]: sg.parse_one("SELECT * FROM t1 ANTI JOIN t2 ON t1.x = t2.x", read="spark").sql("spark")
Out[29]: 'SELECT * FROM t1 AS ANTI JOIN t2 ON t1.x = t2.x'

Looks like something is going wrong during the parse, as the ANTI is interpreted as a table alias.

Additionally, when using LEFT ANTI JOIN, there's no transformation into a NOT EXISTS query the join side should be removed from duckdb (perhaps others?) as duckdb doens't support LEFT SEMI/LEFT ANTI, only SEMI and ANTI.

In [30]: sg.parse_one("SELECT * FROM t1 LEFT ANTI JOIN t2 ON t1.x = t2.x", read="spark").sql("duckdb")
Out[30]: 'SELECT * FROM t1 LEFT ANTI JOIN t2 ON t1.x = t2.x'

Official Documentation
N/A

@georgesittas
Copy link
Collaborator

@cpcloud duckdb supports ANTI JOIN, but I think we need to get rid of the join "side" (LEFT):

D with t1(x) as (select 1), t2(x) as (select 2) select * from t1 anti join t2 on t1.x = t2.x;
┌───────┐
│   x   │
│ int32 │
├───────┤
│     1 │
└───────┘
D with t1(x) as (select 1), t2(x) as (select 2) select * from t1 left anti join t2 on t1.x = t2.x;
Error: Parser Error: syntax error at or near "anti"
LINE 1: ...x) as (select 2) select * from t1 left anti join t2 on t1.x = t2.x;
                                                  ^

@georgesittas georgesittas self-assigned this Sep 17, 2023
@cpcloud
Copy link
Contributor Author

cpcloud commented Sep 17, 2023

Oh, interesting, looks like SEMI is supported too!

SGTM!

@cpcloud
Copy link
Contributor Author

cpcloud commented Sep 17, 2023

I can confirm that a dialect (postgres) that doesn't support SEMI/ANTI is doing the right thing:

In [26]: sg.parse_one("SELECT * FROM t1 LEFT ANTI JOIN t2 ON t1.x = t2.x", read="spark").sql(
    ...:     "postgres"
    ...: )
Out[26]: 'SELECT * FROM t1 WHERE NOT EXISTS(SELECT 1 FROM t2 WHERE t1.x = t2.x)'

I will update the issue title and description

@cpcloud cpcloud changed the title semi/anti join generates incorrect code remove join side from duckdb semi/anti join translation Sep 17, 2023
@cpcloud cpcloud changed the title remove join side from duckdb semi/anti join translation remove join side from duckdb semi/anti join translation; codegen for semi/anti isn't roundtrippable Sep 17, 2023
@cpcloud cpcloud changed the title remove join side from duckdb semi/anti join translation; codegen for semi/anti isn't roundtrippable remove join side from duckdb semi/anti join translation; codegen for semi/anti join isn't roundtrippable Sep 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants