fix: improve unnest_generic_list handling of null list #9975

Merged 3 commits on Apr 8, 2024

@jonahgao (Member) commented Apr 6, 2024

Which issue does this PR close?

Closes #9932.

Rationale for this change

In a GenericListArray, the slots in values that correspond to a null list are not guaranteed to be empty; they may contain arbitrary data.

pub struct GenericListArray<OffsetSize: OffsetSizeTrait> {
    data_type: DataType,
    nulls: Option<NullBuffer>,
    values: ArrayRef,
    value_offsets: OffsetBuffer<OffsetSize>,
}

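For example, in the layout described in this document, the slot in values that belongs to the null list is shown as ?, meaning it holds an arbitrary value rather than nothing.

In this situation, relying on values alone is unsafe. The valid values must be located through value_offsets, and only for the rows that are not null.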
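To illustrate the layout problem, here is a minimal sketch that uses only the standard library. `ListModel`, `valid_range`, `valid_values`, and `example_list` are hypothetical names introduced for this example; they are a simplified stand-in for the three buffers of a list array, not the arrow-rs API and not the code changed in this PR.

```rust
use std::ops::Range;

/// Simplified stand-in for a list array (hypothetical, not the arrow-rs type).
struct ListModel {
    /// `validity[row]` is false when the list at `row` is null.
    validity: Vec<bool>,
    /// Flattened child values of all rows, including slots under null rows.
    values: Vec<&'static str>,
    /// Row `row` spans `values[offsets[row]..offsets[row + 1]]`.
    offsets: Vec<usize>,
}

impl ListModel {
    /// Range of `values` owned by `row`, or `None` when the row is null.
    fn valid_range(&self, row: usize) -> Option<Range<usize>> {
        if self.validity[row] {
            Some(self.offsets[row]..self.offsets[row + 1])
        } else {
            None
        }
    }
}

/// Collect only the values reachable through the offsets of non-null rows.
fn valid_values(list: &ListModel) -> Vec<&'static str> {
    (0..list.validity.len())
        .filter_map(|row| list.valid_range(row))
        .flat_map(|range| list.values[range].iter().copied())
        .collect()
}

/// Rows: ["A", "B"], NULL, ["C"]. The null row still spans one slot
/// in `values`, holding the arbitrary placeholder "?".
fn example_list() -> ListModel {
    ListModel {
        validity: vec![true, false, true],
        values: vec!["A", "B", "?", "C"],
        offsets: vec![0, 2, 3, 4],
    }
}
```

In this model `values` has four entries, while `valid_values(&example_list())` yields only "A", "B", and "C". Code that treats the whole `values` buffer as the unnested output would leak the "?" slot.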

What changes are included in this PR?

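  • Fix a bug in unnest_generic_list: valid values are now located through value_offsets instead of assuming that null lists have no data in values.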

Are these changes tested?

Yes

Are there any user-facing changes?

No

@Jefffrey (Contributor) left a comment

Looks correct to me 👍

Comment on lines +375 to +376
if list_array.is_null(row) {
    if options.preserve_nulls {
@Jefffrey (Contributor), Apr 6, 2024

Bit surprised that clippy didn't suggest collapsing these together? Either way, would it be slightly more efficient to put the options.preserve_nulls check first since this would never change per iteration?

Edit: ah nvm, it's because the outer one is an if else 😅

Comment on lines +609 to +610
// Create a ListArray with the following list values:
// [A, B, C], [], NULL, [D], NULL, [NULL, F]
Contributor

Nice tests 👍
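For readers following the thread, the list from the test comment can be unnested in a small standard-library model. This is a sketch under stated assumptions: `ListModel`, `is_null`, `unnest_indices`, `unnest`, and `review_example` are hypothetical names, the "?" placeholders under the NULL rows are assumed for illustration, and the code is a simplified analogue rather than the PR's implementation.

```rust
/// Simplified stand-in for a list array of nullable strings
/// (hypothetical, not the arrow-rs type).
struct ListModel {
    validity: Vec<bool>,
    values: Vec<Option<&'static str>>,
    offsets: Vec<usize>,
}

impl ListModel {
    fn is_null(&self, row: usize) -> bool {
        !self.validity[row]
    }
}

/// Build the indices to take from `values`. `None` marks an output null
/// produced by a null list; slots under null lists are never indexed.
fn unnest_indices(list: &ListModel, preserve_nulls: bool) -> Vec<Option<usize>> {
    let mut indices = Vec::new();
    for row in 0..list.validity.len() {
        if list.is_null(row) {
            if preserve_nulls {
                indices.push(None);
            }
        } else {
            // Only the offsets decide which slots belong to this row.
            indices.extend((list.offsets[row]..list.offsets[row + 1]).map(Some));
        }
    }
    indices
}

/// Gather the values at the chosen indices (a hand-rolled "take").
fn unnest(list: &ListModel, preserve_nulls: bool) -> Vec<Option<&'static str>> {
    unnest_indices(list, preserve_nulls)
        .into_iter()
        .map(|idx| idx.and_then(|i| list.values[i]))
        .collect()
}

/// [A, B, C], [], NULL, [D], NULL, [NULL, F], with assumed "?" junk
/// stored under the two NULL rows.
fn review_example() -> ListModel {
    ListModel {
        validity: vec![true, true, false, true, false, true],
        values: vec![
            Some("A"), Some("B"), Some("C"), // row 0
            Some("?"),                       // row 2 (NULL list)
            Some("D"),                       // row 3
            Some("?"), Some("?"),            // row 4 (NULL list)
            None, Some("F"),                 // row 5
        ],
        offsets: vec![0, 3, 3, 4, 5, 7, 9],
    }
}
```

With `preserve_nulls` set to true, each NULL list contributes one null to the output; with false it contributes nothing. In both cases the inner NULL of the last list is kept, and the "?" slots never appear.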

@jayzhan211 (Contributor) left a comment

👍

@jayzhan211 (Contributor)

Nice fix! Thanks @jonahgao and @Jefffrey

@jayzhan211 merged commit 215f30f into apache:main on Apr 8, 2024
24 checks passed
@jonahgao deleted the unnest_null branch on April 8, 2024 03:35
Development

Successfully merging this pull request may close these issues.

unnest doesn't take into account null values
3 participants