Describe the bug
Evaluating a col = null expression against the statistics array throws a runtime error, which results in an incorrect true result when the stats have the null count set to 0.
The other problem is that the col = null expression is converted into the predicate expression col_min <= NULL AND NULL <= col_max. I believe we should handle null as a special case and return an expression that checks against the null count column instead.
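To illustrate why the rewritten range predicate cannot prune anything for a NULL literal, here is a minimal sketch of SQL three-valued logic in Rust. It is not DataFusion code: `lt_eq`, `and3`, and `range_predicate` are hypothetical helpers, `None` stands in for NULL, and the column is assumed to be an integer column.

```rust
/// SQL-style `<=`: any comparison involving NULL (`None`) is NULL.
fn lt_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x <= y),
        _ => None,
    }
}

/// SQL-style three-valued AND.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // FALSE dominates, even when the other side is NULL.
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        // Remaining cases involve at least one NULL and no FALSE.
        _ => None,
    }
}

/// Hypothetical stand-in for the rewritten predicate
/// `col_min <= literal AND literal <= col_max`.
fn range_predicate(
    col_min: Option<i64>,
    col_max: Option<i64>,
    literal: Option<i64>,
) -> Option<bool> {
    and3(lt_eq(col_min, literal), lt_eq(literal, col_max))
}
```

With a NULL literal, `range_predicate(Some(1), Some(10), None)` evaluates to `None` whatever the min and max are, so the min/max statistics carry no information about whether the row group can be skipped.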
The test case asserts that the results for both row groups should be true, whereas they should both be false because both row groups have the null count set to 0.
Expected behavior
A col = null predicate on a row group should be evaluated by taking the row group's null count stats into account.
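The expected behavior could be sketched roughly as follows. This is a simplified illustration, not the DataFusion implementation: `ColumnStats`, `may_contain_null`, and `prune_row_groups` are hypothetical names. It assumes, as the issue body does, that the predicate is meant to select rows where the column is null, and that a result of `true` means the row group must be read while `false` means it can be skipped.

```rust
/// Simplified per-row-group statistics for one column (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    /// Number of NULL values, or `None` if the statistic was not recorded.
    null_count: Option<u64>,
}

/// Returns `true` if the row group may contain NULLs and must be read,
/// `false` if the statistics prove it contains none and it can be skipped.
fn may_contain_null(stats: &ColumnStats) -> bool {
    match stats.null_count {
        // No NULLs recorded: the null check cannot match any row.
        Some(0) => false,
        // At least one NULL: the row group has to be scanned.
        Some(_) => true,
        // Missing statistic: stay conservative and keep the row group.
        None => true,
    }
}

/// Evaluates the null check for every row group, in order.
fn prune_row_groups(groups: &[ColumnStats]) -> Vec<bool> {
    groups.iter().map(may_contain_null).collect()
}
```

Under these assumptions, two row groups that both have a null count of 0 yield `false` and `false`, which matches the result the issue expects from the test case. Treating a missing null count as "keep" only costs a missed optimization and never drops matching rows.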
It should be noted that this is an 'optimization' bug rather than a correctness bug -- in the sense that returning false means "don't filter the row group" and returning true means "do filter (aka skip) the row group"
To Reproduce
See our test cases at: https://github.com/apache/arrow-datafusion/blob/f027e5f4d9a44ad9cc879c133abc913f78fa76f0/datafusion/src/physical_plan/file_format/parquet.rs#L722-L763