This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Writing Arrow IPC format panics in encode_dictionary() due to out-of-bounds access on the fields vector #975

Closed
artyyouth opened this issue May 3, 2022 · 3 comments
Labels
no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@artyyouth

artyyouth commented May 3, 2022

It panics at this line when trying to access index 0 while `&field.fields` is a zero-length vector.

I haven't dug deep enough to figure out what should be fixed, but it turns out to panic when I have a `List<Utf8>` as an array item (btw, the schema definition is `Field::new("labels", DataType::List(Box::new(Field::new("labels", DataType::Utf8, true))))`).

I suspect the `match array.data_type().to_physical_type()` above always takes the `List` branch, and when it recursively invokes `encode_dictionary()` with the nested `IpcField`, it eventually panics because the innermost `IpcField` holds an empty `fields` vector.
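To make the suspected failure mode concrete, here is a minimal sketch of the recursion described above. It uses simplified, hypothetical stand-ins for the data type and the IPC field description (these are not the arrow2 types, and `encode_dictionary_sketch` is not the real function): each `List` level descends into `field.fields[0]`, so an `IpcField` tree that is shallower than the data type triggers an index-out-of-bounds panic.

```rust
// Hypothetical, simplified stand-ins for illustration only; NOT the arrow2 types.
#[derive(Debug)]
enum DataType {
    Utf8,
    List(Box<DataType>),
}

/// Hypothetical mirror of an IPC field description:
/// one child entry is expected per nested child of the data type.
#[derive(Debug, Default)]
struct IpcField {
    fields: Vec<IpcField>,
}

/// Recurses through nested types the way the issue describes and
/// returns the nesting depth reached.
fn encode_dictionary_sketch(data_type: &DataType, field: &IpcField) -> usize {
    match data_type {
        DataType::Utf8 => 0,
        // Indexing with [0] panics when `field.fields` is empty, i.e. when
        // the IpcField tree is shallower than the data type it describes.
        DataType::List(inner) => 1 + encode_dictionary_sketch(inner, &field.fields[0]),
    }
}
```

With `List(Utf8)` and an `IpcField` that has one child, the call returns `1`; with an `IpcField` that has no children, it panics on the `[0]` access, matching the symptom in the report.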

Is this a bug? I'm not sure whether it is related to #830.

@artyyouth
Author

It turns out a typo in my code messed up the schema. Sorry for reporting this issue by mistake!

@jorgecarleitao
Owner

Hey, thanks, and no worries! I think we should still not panic and should offer a better error message instead, so I'd suggest we keep this open and, when this happens, return an `OutOfSpec` error with a clear description. What do you think?
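One way to follow this suggestion is to replace the unchecked `[0]` index with a checked lookup that turns a missing child into an error. The sketch below assumes simplified, hypothetical stand-ins (`DataType`, `IpcField`, a local `Error::OutOfSpec(String)`, and `encode_dictionary_checked`); it is not the arrow2 API, only an illustration of the panic-free pattern.

```rust
// Hypothetical, simplified stand-ins for illustration only; NOT the arrow2 API.
#[derive(Debug)]
enum DataType {
    Utf8,
    List(Box<DataType>),
}

#[derive(Debug, Default)]
struct IpcField {
    fields: Vec<IpcField>,
}

/// Local stand-in for an "out of spec" error carrying a description.
#[derive(Debug, PartialEq)]
enum Error {
    OutOfSpec(String),
}

/// Same recursion as before, but a missing child field becomes an error
/// instead of an index-out-of-bounds panic.
fn encode_dictionary_checked(data_type: &DataType, field: &IpcField) -> Result<usize, Error> {
    match data_type {
        DataType::Utf8 => Ok(0),
        DataType::List(inner) => {
            // `first()` returns None on an empty Vec rather than panicking.
            let child = field.fields.first().ok_or_else(|| {
                Error::OutOfSpec(format!(
                    "IpcField for {:?} has no child fields; expected one for the List's inner field",
                    data_type
                ))
            })?;
            Ok(1 + encode_dictionary_checked(inner, child)?)
        }
    }
}
```

A well-formed field tree yields `Ok(1)` for `List(Utf8)`, while a mismatched one yields `Err(Error::OutOfSpec(..))` whose message names the offending type, which is the kind of descriptive error suggested above.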

@artyyouth
Author

Actually, I can still reproduce this out-of-bounds error; earlier I had commented out all the `List` fields in my schema, which hid it.

Anyway, I'll try to dig deeper over the weekend to see if I can put together a minimal reproduction. Thanks for following up, and feel free to reopen this issue if you like. I'll keep posting my findings here. Thanks! :)

@jorgecarleitao jorgecarleitao added the no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog label May 5, 2022
Projects
None yet
Development

No branches or pull requests

2 participants