Core: Fix incorrect searched CASE optimization #14349

Merged
merged 2 commits into from
Jan 30, 2025

@findepi (Member) commented Jan 29, 2025

There is an optimization for searched CASE expressions whose result values are of boolean type. It converted expressions like

CASE
    WHEN X THEN A
    WHEN Y THEN B
    ..
    [ ELSE D ]
END

into

(X AND A)
    OR (Y AND NOT X AND B)
    [ OR (NOT (X OR Y) AND D) ]
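
For reference, the rewrite is sound as long as every input is a non-NULL boolean and ELSE is present. A minimal sketch that checks this exhaustively, using plain Python booleans (the helper names `searched_case` and `boolean_rewrite` are hypothetical and only illustrate the shape above, they are not DataFusion APIs):

```python
from itertools import product

def searched_case(x, a, y, b, d):
    # CASE WHEN x THEN a WHEN y THEN b ELSE d END, for non-NULL inputs only
    if x:
        return a
    if y:
        return b
    return d

def boolean_rewrite(x, a, y, b, d):
    # (X AND A) OR (Y AND NOT X AND B) OR (NOT (X OR Y) AND D)
    return (x and a) or (y and not x and b) or (not (x or y) and d)

# Collect every combination of non-NULL booleans where the two forms disagree.
mismatches = [
    args for args in product([True, False], repeat=5)
    if searched_case(*args) != boolean_rewrite(*args)
]
```

Under these assumptions `mismatches` stays empty; the disagreements only appear once NULL or a missing ELSE enters the picture.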

This had the following problems:

  • it did not work for nullable conditions. If X is nullable, we cannot use NOT (X) to complement it; we need to use X IS DISTINCT FROM true
  • it did not work correctly when some conditions are nullable and other values are false. E.g. with X=NULL, A=true, Y=NULL, B=true, D=false, the CASE should return false, but the boolean expression simplifies to (NULL AND ..) OR (NULL AND ..) OR (false), which is NULL, not false
    • thus we cannot use X itself as the truthiness check of X; we need to test X IS NOT DISTINCT FROM true
  • it did not work correctly when the default D is missing and the conditions evaluate to false rather than NULL. CASE's result should be NULL but was false.

This commit fixes that optimization.
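
The three problems can be reproduced outside the engine with a small model of SQL three-valued logic. The sketch below uses Python's `None` for NULL. All function names are hypothetical, and `null_safe_rewrite` is only one possible NULL-safe form built from the `IS [NOT] DISTINCT FROM true` tests mentioned above, with a missing ELSE modeled as `d=None`. It is an assumption for illustration, not necessarily the exact expression this change emits.

```python
from itertools import product

TRI = (True, False, None)  # None models SQL NULL

def and3(a, b):
    # SQL AND: false dominates, then NULL
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    # SQL OR: true dominates, then NULL
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    # SQL NOT: NOT NULL is NULL
    return None if a is None else (not a)

def case_when(x, a, y, b, d=None):
    # Reference semantics: CASE WHEN x THEN a WHEN y THEN b [ELSE d] END.
    # A branch is taken only when its condition is true; a missing ELSE yields NULL.
    if x is True:
        return a
    if y is True:
        return b
    return d

def naive_rewrite(x, a, y, b, d=None, has_else=True):
    # (X AND A) OR (Y AND NOT X AND B) [ OR (NOT (X OR Y) AND D) ]
    result = or3(and3(x, a), and3(and3(y, not3(x)), b))
    if has_else:
        result = or3(result, and3(not3(or3(x, y)), d))
    return result

def null_safe_rewrite(x, a, y, b, d=None):
    # `v is True` models `v IS NOT DISTINCT FROM true`;
    # `v is not True` models `v IS DISTINCT FROM true`. Both are never NULL.
    x_true, y_true = x is True, y is True
    x_not_true, y_not_true = x is not True, y is not True
    result = or3(and3(x_true, a), and3(and3(y_true, x_not_true), b))
    return or3(result, and3(and3(x_not_true, y_not_true), d))

# Inputs over {true, false, NULL} where the naive form disagrees with CASE.
counterexamples = [
    args for args in product(TRI, repeat=5)
    if naive_rewrite(*args) is not case_when(*args)
]
```

In this model, `naive_rewrite(None, True, None, True, False)` yields NULL while `case_when` yields false, and `naive_rewrite(False, True, False, True, has_else=False)` yields false while `case_when(False, True, False, True)` yields NULL, matching the second and third problems listed above. `null_safe_rewrite` agrees with `case_when` on every combination of true, false and NULL.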

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jan 29, 2025
@alamb (Contributor) left a comment

Thanks @findepi -- this looks good to me

datafusion/sqllogictest/test_files/case.slt (2 resolved review threads)
@findepi findepi merged commit 11435de into apache:main Jan 30, 2025
49 checks passed
@findepi findepi deleted the findepi/case branch January 30, 2025 20:59
@findepi findepi changed the title Fix incorrect searched CASE optimization Core: Fix incorrect searched CASE optimization Jan 30, 2025
Successfully merging this pull request may close these issues.

Invalid query result when searched CASE has nullable condition and boolean result