[C++][Parquet] Read string columns directly into STRING_VIEW arrays and cast to LARGE_STRING if necessary #43068

Open
1 of 2 tasks
felipecrv opened this issue Jun 26, 2024 · 3 comments

Comments

@felipecrv
Contributor

felipecrv commented Jun 26, 2024

Describe the enhancement requested

This would fix two issues for the price of one:

  1. Reading from Parquet into schemas that use the new STRING_VIEW type
  2. Reading LARGE_STRING_ARRAY from Parquet ([C++] Parquet reader is unable to read LargeString columns #39682)
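For illustration, the "cast to LARGE_STRING if necessary" step can be sketched without any Arrow dependency. LARGE_STRING stores 64-bit offsets into one contiguous data buffer, so the cast amounts to concatenating the viewed bytes and recording running offsets. Everything named below (`LargeStringColumn`, `ViewsToLargeString`) is hypothetical, and `std::string_view` is only a stand-in for Arrow's 16-byte view struct. This is a rough sketch of the idea, not the Arrow cast kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a LARGE_STRING array: 64-bit offsets plus one
// contiguous data buffer. Not an Arrow type.
struct LargeStringColumn {
  std::vector<int64_t> offsets;  // offsets.size() == number of values + 1
  std::string data;              // concatenated value bytes
};

// Hypothetical helper: materialize a sequence of views into the
// offsets + data layout. This copies the bytes exactly once.
LargeStringColumn ViewsToLargeString(const std::vector<std::string_view>& views) {
  LargeStringColumn out;
  out.offsets.reserve(views.size() + 1);
  out.offsets.push_back(0);

  // Pre-size the data buffer so the appends below do not reallocate.
  std::size_t total = 0;
  for (std::string_view v : views) total += v.size();
  out.data.reserve(total);

  for (std::string_view v : views) {
    out.data.append(v);
    out.offsets.push_back(static_cast<int64_t>(out.data.size()));
  }
  return out;
}
```

Value `i` is then the byte range `[offsets[i], offsets[i + 1])` of `data`. Because the offsets are 64-bit, the total data size is not capped by a 32-bit offset.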

This issue also depends on:

Component(s)

C++, Parquet

@mapleFU
Member

mapleFU commented Jun 26, 2024

Related: apache/arrow-rs#5530

Zero-copy reads could also be applied here for non-delta string encodings.
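To make the zero-copy idea concrete: in Parquet's PLAIN encoding each BYTE_ARRAY value is a 4-byte little-endian length followed by the value bytes, so every value is already contiguous in the (uncompressed) page buffer and a view can point straight at it. Below is a minimal sketch under that assumption. `DecodePlainAsViews` and `AppendPlain` are hypothetical helpers, not parquet-cpp APIs, and `std::string_view` stands in for Arrow's view struct. The page buffer must outlive the returned views.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

// Hypothetical helper: append one value in PLAIN BYTE_ARRAY form
// (4-byte little-endian length, then the bytes). Used to build test pages.
void AppendPlain(std::vector<uint8_t>& page, std::string_view value) {
  const uint32_t len = static_cast<uint32_t>(value.size());
  for (int shift = 0; shift < 32; shift += 8) {
    page.push_back(static_cast<uint8_t>((len >> shift) & 0xFF));
  }
  page.insert(page.end(), value.begin(), value.end());
}

// Hypothetical helper: walk a PLAIN-encoded page and return views that
// point into `page` itself. No value bytes are copied.
std::vector<std::string_view> DecodePlainAsViews(const uint8_t* page,
                                                 std::size_t page_size,
                                                 std::size_t num_values) {
  std::vector<std::string_view> views;
  views.reserve(num_values);
  std::size_t pos = 0;
  for (std::size_t i = 0; i < num_values; ++i) {
    if (page_size - pos < 4) throw std::runtime_error("truncated length prefix");
    const uint32_t len = static_cast<uint32_t>(page[pos]) |
                         (static_cast<uint32_t>(page[pos + 1]) << 8) |
                         (static_cast<uint32_t>(page[pos + 2]) << 16) |
                         (static_cast<uint32_t>(page[pos + 3]) << 24);
    pos += 4;
    if (page_size - pos < len) throw std::runtime_error("truncated value");
    views.emplace_back(reinterpret_cast<const char*>(page + pos), len);
    pos += len;
  }
  return views;
}
```

A real implementation would also need to keep the decompressed page buffer alive as one of the view array's data buffers, and inline short values into the view itself as the Arrow view layout requires. Both are left out here.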

@felipecrv changed the title from "[C++][Parquet] Read string columns directly into STRING_VIEW arrays and cast to LARGE_STRING_VIEW if necessary" to "[C++][Parquet] Read string columns directly into STRING_VIEW arrays and cast to LARGE_STRING if necessary" on Jul 21, 2024
@zzl200012

Are there any updates on this feature? We have lots of Parquet files containing plain-encoded binary data, and I think read performance would be much better if we could read the binary columns directly into binary_view arrays, applying the zero-copy approach that arrow-rs already has.

@mapleFU
Member

mapleFU commented Jan 21, 2025

Currently not. I'm glad to review this part of the code, but I don't currently have enough bandwidth to work on it.

This might need a minor refactor of BinaryRecordReader (class BinaryRecordReader : virtual public RecordReader) and the Decoder API.
