Let arrow schemas specify the ENUM type from parquet schemas #7050

Open
the80srobot opened this issue Jan 31, 2025 · 2 comments
Labels
enhancement: Any new improvement worthy of an entry in the changelog

Comments

@the80srobot

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I am exporting data from an application using an Arrow schema; the data is then written to a Parquet file. Some fields are logically enums, and I would like to export them as the Parquet ENUM type, but there is currently no way to specify that.

Describe the solution you'd like

In Parquet, ENUM is an annotation of a BYTE_ARRAY, similar to STRING. Unlike String, it's parametrized with the allowable values.

It sounds like it might be an extension type, as in #5822.
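As a rough illustration of the extension-type idea, here is a minimal standard-library sketch. It does not use the arrow-rs or parquet crates: `Field`, `enum_field`, `parquet_column`, and the extension name `example.parquet.enum` are all hypothetical stand-ins. Only the `ARROW:extension:name` metadata key comes from the Arrow format. The sketch assumes a writer could inspect that key and choose the ENUM annotation instead of STRING.

```rust
use std::collections::HashMap;

/// Field-level metadata key that Arrow uses for extension type names.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
/// Hypothetical extension name; not defined by arrow-rs or the Arrow spec.
const PARQUET_ENUM_EXTENSION: &str = "example.parquet.enum";

/// Minimal stand-in for a string-typed Arrow field: a name plus metadata.
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

/// Build a field tagged with the hypothetical ENUM extension name.
fn enum_field(name: &str) -> Field {
    let mut metadata = HashMap::new();
    metadata.insert(
        EXTENSION_NAME_KEY.to_string(),
        PARQUET_ENUM_EXTENSION.to_string(),
    );
    Field {
        name: name.to_string(),
        metadata,
    }
}

/// Render the Parquet schema line a writer could emit for the field:
/// ENUM when the extension is present, STRING otherwise.
fn parquet_column(field: &Field) -> String {
    let annotation = match field.metadata.get(EXTENSION_NAME_KEY).map(String::as_str) {
        Some(PARQUET_ENUM_EXTENSION) => "ENUM",
        _ => "STRING",
    };
    format!("required binary {} ({});", field.name, annotation)
}
```

Under these assumptions, `parquet_column(&enum_field("color"))` yields the schema line `required binary color (ENUM);`, while a field without the metadata falls back to the STRING annotation.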

the80srobot added the enhancement label Jan 31, 2025
@tustvold
Contributor

tustvold commented Jan 31, 2025

Does pyarrow or a similar implementation support this? That might provide inspiration for how to support it here.

Otherwise, #4702 is probably the closest thing we have to this, in terms of supporting Parquet-specific metadata.

@jhorstmann
Contributor

> Unlike String, it's parametrized with the allowable values.

I don't think that is the case inside the Parquet format itself, but it might be a useful part of the API to ensure the values get dictionary encoded.
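To make the dictionary-encoding point concrete, here is a small standard-library sketch of the general technique. It is not the parquet crate's encoder, and the function name `dictionary_encode` is made up for illustration. Each distinct value is stored once, and every row refers to it by index, which is why low-cardinality enum-like columns encode compactly.

```rust
use std::collections::HashMap;

/// Dictionary-encode `values`: returns the distinct values in first-seen
/// order and, for each input value, its index into that dictionary.
fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dictionary: Vec<String> = Vec::new();
    let mut positions: HashMap<&str, u32> = HashMap::new();
    let mut indices = Vec::with_capacity(values.len());

    for &value in values {
        // Reuse the existing index, or append a new dictionary entry.
        let index = *positions.entry(value).or_insert_with(|| {
            dictionary.push(value.to_string());
            (dictionary.len() - 1) as u32
        });
        indices.push(index);
    }

    (dictionary, indices)
}
```

For an input such as `["RED", "GREEN", "RED", "BLUE", "GREEN"]`, the dictionary holds three entries and the column itself reduces to five small integers.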
