-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add ArrayAndElementAndOptionalIndex
for proper casting in array_position
#9233
Conversation
Ah, actually seeing that won't work in the case of the optional 3rd arg for array position. Not sure if there's something built-in, but will look around. Or maybe add a |
No built-in signature, maybe we need |
That'd definitely be a better name... I'll let this PR sit for a couple of days in case there's a better idea or compelling reason not to do this in the first place, then'll see what |
I think this is ready for a review. It adds |
array_and_element
for proper casting in array_positionArrayAndElementAndOptionalIndex
for proper casting in array_position
let first_two_types = ¤t_types[0..2]; | ||
let valid_types = array_append_or_prepend_valid_types(first_two_types, true)?; | ||
|
||
let valid_types_with_index = valid_types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can optionally append valid_types_with_index if the length is 3, otherwise, return here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Addressed in 5086f31.
@@ -130,6 +130,31 @@ fn get_valid_types( | |||
_ => Ok(vec![vec![]]), | |||
} | |||
} | |||
fn array_append_and_optional_index( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array_element_and_optional_index
seems a better name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in 5086f31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tshauck and @jayzhan211 -- this PR is looking very close
@@ -2603,6 +2603,16 @@ select array_position(arrow_cast(make_array([1, 2, 3], [4, 5, 6], [5, 5, 5], [4, | |||
---- | |||
2 2 | |||
|
|||
query I | |||
SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, this test fails on main like this:
Error during planning: array_position received incompatible types: '[Int32, Int64]'.
1 | ||
|
||
query I | ||
SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 5, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add a few more tests:
- For
LargeList
- Error cases (e.g. for types that are still not compatible like
array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback... LargeList
is added and works as expected. For array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo')
, this actually doesn't fail right now on this branch and returns NULL:
❯ SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo') IS NULL;
+----------------------------------------------------------------------------------------------+
| array_position(make_array(Int64(5),Int64(2),Int64(3),Int64(4),Int64(5)),Utf8("foo")) IS NULL |
+----------------------------------------------------------------------------------------------+
| true |
+----------------------------------------------------------------------------------------------+
1 row in set. Query took 0.004 seconds.
Because this PR is leveraging the append's casting functionality, this seems to be the current behavior on main for append:
arrow-datafusion/datafusion-cli main ➜ ./target/debug/datafusion-cli
DataFusion CLI v35.0.0
❯ SELECT array_append(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo');
+------------------------------------------------------------------------------------+
| array_append(make_array(Int64(5),Int64(2),Int64(3),Int64(4),Int64(5)),Utf8("foo")) |
+------------------------------------------------------------------------------------+
| [5, 2, 3, 4, 5, foo] |
+------------------------------------------------------------------------------------+
1 row in set. Query took 0.028 seconds.
I guess as I user I'd think if the array_append case would work, I'd also expect array_position to work like that. FWIW duckdb works for both array_append and array_position([1,2,3], 'foo')
it just returns 0 rather than NULL.
Anyways, happy to go either way, just thought I'd bring it up before delving further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the second case, we return NULL which is compatible with Postgres behavior, so I think it is acceptable. You can just add the test case to make sure we got NULL
query I
select array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo')
----
NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good... added in 4ec6289
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests and doing the background research
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tshauck and @jayzhan211 for the reviews 🙏
1 | ||
|
||
query I | ||
SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 5, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests and doing the background research
Great, thank you both! |
Which issue does this PR close?
Closes #9093
Rationale for this change
Currently throws an error when doing
SELECT array_position(arrow_cast([1, 2, 3, 4, 5], 'List(Int32)'), 5)
, and after this PR it won't.What changes are included in this PR?
Change recommended in #9093 (comment)
Are these changes tested?
Added an additional sql logic test that failed w/o the change, and doesn't fail w/ the change.
Are there any user-facing changes?
No