
Parquet Modular decryption support #6637

Draft: wants to merge 53 commits into main


@rok (Member) commented Oct 28, 2024

Which issue does this PR close?

This PR is based on a branch and an internal patch, and aims to provide basic modular decryption support. It partially closes #3511. We decided to split the encryption work into a separate PR.

Rationale for this change

See #3511.

What changes are included in this PR?

This introduces AesGcmV1 cipher decryption to ArrowReaderMetadata and ParquetRecordBatchReader. The new classes and functions are tested on sample files from parquet-dataset.
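For readers new to the format: with the AesGcmV1 cipher, each encrypted module is stored as a length-prefixed buffer. The std-only sketch below assumes the layout described in the Parquet modular encryption spec (a 4-byte little-endian length, a 12-byte nonce, the ciphertext, then a 16-byte GCM tag) and only splits that framing; the actual AES-GCM decryption would be done by a crypto library and is not shown. `EncryptedModule` and `split_module` are hypothetical names, not part of this PR.

```rust
// Field sizes for the AES_GCM_V1 module layout (assumed from the Parquet
// modular encryption spec): length prefix, nonce and GCM authentication tag.
pub const LENGTH_LEN: usize = 4;
pub const NONCE_LEN: usize = 12;
pub const TAG_LEN: usize = 16;

/// Borrowed views into one encrypted module (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct EncryptedModule<'a> {
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
    pub tag: &'a [u8],
}

/// Splits `buf` into nonce, ciphertext and tag without decrypting anything.
/// Returns the module views and the total number of bytes the module used.
pub fn split_module(buf: &[u8]) -> Result<(EncryptedModule<'_>, usize), String> {
    if buf.len() < LENGTH_LEN {
        return Err("buffer too short for the length prefix".to_string());
    }
    // The little-endian length prefix counts nonce + ciphertext + tag.
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len < NONCE_LEN + TAG_LEN {
        return Err(format!("module length {len} is smaller than nonce + tag"));
    }
    let end = LENGTH_LEN
        .checked_add(len)
        .ok_or_else(|| "module length overflows usize".to_string())?;
    if buf.len() < end {
        return Err(format!(
            "module needs {end} bytes but only {} are available",
            buf.len()
        ));
    }
    let body = &buf[LENGTH_LEN..end];
    let (nonce, rest) = body.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Ok((EncryptedModule { nonce, ciphertext, tag }, end))
}
```

Returning the consumed byte count lets a caller advance past the module inside a larger buffer before handing the nonce, ciphertext and tag to the cipher.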

Are there any user-facing changes?

Several new classes and method parameters are introduced. If the project is compiled without the encryption feature flag, the changes are not breaking. With the encryption flag enabled, some methods and constructors (e.g. ParquetMetaData::new) require new parameters, which would be a breaking change.
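One way a constructor can be breaking only when the flag is on is to gate the extra parameter with a `cfg` attribute. The sketch below assumes a Cargo feature named `encryption` and uses hypothetical `MetaData` and `FileDecryptor` types; it illustrates the mechanism and is not the PR's actual ParquetMetaData::new.

```rust
/// Hypothetical decryptor type that only exists when the feature is on.
#[cfg(feature = "encryption")]
pub struct FileDecryptor;

/// Hypothetical stand-in for a metadata type whose constructor gains a
/// parameter only when the `encryption` feature is enabled.
pub struct MetaData {
    pub num_rows: i64,
    #[cfg(feature = "encryption")]
    pub file_decryptor: Option<FileDecryptor>,
}

impl MetaData {
    pub fn new(
        num_rows: i64,
        // Compiled in only with `--features encryption`, so callers see a
        // different signature depending on the flag.
        #[cfg(feature = "encryption")] file_decryptor: Option<FileDecryptor>,
    ) -> Self {
        Self {
            num_rows,
            #[cfg(feature = "encryption")]
            file_decryptor,
        }
    }
}
```

Without the feature, existing calls such as `MetaData::new(10)` keep compiling; with it, every call site has to pass the extra argument.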

@github-actions bot added the parquet label (Changes to the parquet crate) on Oct 28, 2024
@rok (Member Author) commented Oct 28, 2024

Currently this is a rough rebase of work done by @ggershinsky. As ParquetMetaDataReader is now available, some refactoring will be required.
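For context on what ParquetMetaDataReader has to distinguish: a Parquet file ends with a 4-byte little-endian metadata length followed by a magic string, `PAR1` for a plaintext footer (including encrypted files in plaintext-footer mode) or `PARE` when the footer itself is encrypted. A minimal std-only sketch of that check follows; `FooterTail` and `parse_footer_tail` are hypothetical names, not the reader's actual API.

```rust
/// Footer magic values as described in the Parquet format spec.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";
const PARQUET_MAGIC_ENCR_FOOTER: &[u8; 4] = b"PARE";
const FOOTER_SIZE: usize = 8;

/// Result of inspecting the last 8 bytes of a file (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct FooterTail {
    pub metadata_len: usize,
    pub encrypted_footer: bool,
}

/// Reads the metadata length and decides whether the footer is encrypted.
pub fn parse_footer_tail(tail: &[u8; FOOTER_SIZE]) -> Result<FooterTail, String> {
    // The first 4 bytes hold the little-endian length of the metadata.
    let metadata_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as usize;
    let magic = &tail[4..8];
    let encrypted_footer = if magic == PARQUET_MAGIC {
        false
    } else if magic == PARQUET_MAGIC_ENCR_FOOTER {
        true
    } else {
        return Err(format!("invalid footer magic {magic:?}"));
    };
    Ok(FooterTail { metadata_len, encrypted_footer })
}
```

An encrypted footer means the reader needs the footer key before it can decode any metadata, which is why decryption settings have to reach the metadata reader at all.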

@etseidl (Contributor) commented Oct 28, 2024

> As ParquetMetaDataReader is now available some refactoring will be required.

@rok let me know if you want any help shoehorning this into ParquetMetaDataReader.

@brainslush commented

Is there any help, input or contribution needed here?

@rok (Member Author) commented Nov 21, 2024

Thanks for the offer @etseidl & @brainslush! I'm making some progress and would definitely appreciate a review! I'll ping once I push.

@rok force-pushed the decryption-basics-fork branch 2 times, most recently from fe488b3 to d263510 on November 23, 2024 23:06
@rok (Member Author) commented Dec 4, 2024

> > As ParquetMetaDataReader is now available some refactoring will be required.
>
> @rok let me know if you want any help shoehorning this into ParquetMetaDataReader.

@etseidl, could you please do a quick pass and say whether this makes sense with respect to ParquetMetaDataReader?
I'll continue with data decryption.
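Decrypting data pages also needs the correct additional authenticated data (AAD) for every module. As we understand the Parquet modular encryption spec, a module AAD is the file AAD followed by a 1-byte module type and little-endian 16-bit row group, column and page ordinals, where the footer stops after the module type and only data pages and data page headers include a page ordinal. The std-only sketch below follows that assumption; `ModuleType` and `create_module_aad` are illustrative names, and only a subset of module types is listed.

```rust
/// Module types used when building AADs (values assumed from the Parquet
/// modular encryption spec; only a subset is listed here).
#[derive(Clone, Copy, Debug)]
pub enum ModuleType {
    Footer = 0,
    ColumnMetaData = 1,
    DataPage = 2,
    DictionaryPage = 3,
    DataPageHeader = 4,
    DictionaryPageHeader = 5,
}

/// Builds a module AAD: file AAD || module type || row group ordinal ||
/// column ordinal || page ordinal (ordinals as little-endian u16).
/// `page_ordinal` is ignored for modules that do not carry one.
pub fn create_module_aad(
    file_aad: &[u8],
    module_type: ModuleType,
    row_group_ordinal: u16,
    column_ordinal: u16,
    page_ordinal: u16,
) -> Vec<u8> {
    let mut aad = Vec::with_capacity(file_aad.len() + 7);
    aad.extend_from_slice(file_aad);
    aad.push(module_type as u8);
    // The footer AAD is just the file AAD plus the module type byte.
    if matches!(module_type, ModuleType::Footer) {
        return aad;
    }
    aad.extend_from_slice(&row_group_ordinal.to_le_bytes());
    aad.extend_from_slice(&column_ordinal.to_le_bytes());
    // Only data pages and data page headers also carry a page ordinal.
    if matches!(module_type, ModuleType::DataPage | ModuleType::DataPageHeader) {
        aad.extend_from_slice(&page_ordinal.to_le_bytes());
    }
    aad
}
```

Because the ordinals are bound into the AAD, a page that is moved to a different position in the file fails GCM tag verification instead of decrypting silently.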

@etseidl (Contributor) left a comment

Only looking at the metadata bits for now...looks good to me so far. Just a few minor nits. Thanks @rok!

parquet/src/file/footer.rs (resolved)
```diff
 /// by the [Parquet Spec].
 ///
 /// [Parquet Spec]: https://github.com/apache/parquet-format#metadata
 #[deprecated(since = "53.1.0", note = "Use ParquetMetaDataReader::decode_metadata")]
-pub fn decode_metadata(buf: &[u8]) -> Result<ParquetMetaData> {
-    ParquetMetaDataReader::decode_metadata(buf)
+pub fn decode_metadata(
```
@etseidl (Contributor)

I'm not sure we should be updating a deprecated function. If encryption is desired, I'd say force use of the new API so we don't have to maintain this one. Just pass None to ParquetMetaDataReader::decode_metadata.
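The suggestion can be sketched as follows: the deprecated free function keeps its old signature and forwards to the reader with `None`, so encryption support lives only in the new API. The types are minimal hypothetical stand-ins, and the `Option<&FileDecryptionProperties>` parameter is an assumption about how the new method might accept decryption settings, not its confirmed signature.

```rust
/// Hypothetical stand-ins; the real crate's types carry much more state.
pub struct FileDecryptionProperties {
    pub footer_key: Vec<u8>,
}

#[derive(Debug)]
pub struct ParquetMetaData {
    pub decrypted: bool,
}

pub struct ParquetMetaDataReader;

impl ParquetMetaDataReader {
    /// New-style entry point taking optional decryption properties
    /// (the extra parameter is an assumption for this sketch).
    pub fn decode_metadata(
        buf: &[u8],
        decryption_properties: Option<&FileDecryptionProperties>,
    ) -> Result<ParquetMetaData, String> {
        if buf.is_empty() {
            return Err("empty metadata buffer".to_string());
        }
        // A real implementation would decrypt (if needed) and parse Thrift here.
        Ok(ParquetMetaData {
            decrypted: decryption_properties.is_some(),
        })
    }
}

/// The deprecated function keeps its signature and simply forwards `None`,
/// so it never needs encryption-specific changes.
#[deprecated(since = "53.1.0", note = "Use ParquetMetaDataReader::decode_metadata")]
pub fn decode_metadata(buf: &[u8]) -> Result<ParquetMetaData, String> {
    ParquetMetaDataReader::decode_metadata(buf, None)
}
```

Callers who need encryption are pushed to the reader API, while existing callers of the deprecated function see no change.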

```rust
&mut fetch,
file_size,
self.get_prefetch_size(),
self.file_decryption_properties.clone(),
```
@etseidl (Contributor)

Very minor nit: I understand that file_decryption_properties needs to be cloned eventually; just wondering if we could pass references down into decode_metadata and do the clone there, where it's more obviously needed.
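The nit amounts to borrowing until ownership is actually needed. In this sketch with hypothetical types (`Loader` and a simplified `ParquetMetaData`, not the crate's real ones), the caller passes `Option<&FileDecryptionProperties>` via `as_ref()`, and the only clone is `props.cloned()` inside `decode_metadata`, where the returned metadata has to own the value.

```rust
/// Hypothetical decryption settings; cloning may copy key material, so the
/// sketch clones only where ownership is required.
#[derive(Clone, Debug, PartialEq)]
pub struct FileDecryptionProperties {
    pub footer_key: Vec<u8>,
}

/// Hypothetical decoded metadata that owns its decryption settings.
#[derive(Debug)]
pub struct ParquetMetaData {
    pub file_decryption_properties: Option<FileDecryptionProperties>,
}

/// Lower-level decode: the single place where the properties are cloned,
/// because the returned metadata must own them.
pub fn decode_metadata(
    _buf: &[u8],
    props: Option<&FileDecryptionProperties>,
) -> ParquetMetaData {
    ParquetMetaData {
        file_decryption_properties: props.cloned(),
    }
}

/// Higher-level loader: it only borrows the properties and passes the
/// reference down, so no clone happens at this call site.
pub struct Loader {
    pub file_decryption_properties: Option<FileDecryptionProperties>,
}

impl Loader {
    pub fn load(&self, buf: &[u8]) -> ParquetMetaData {
        decode_metadata(buf, self.file_decryption_properties.as_ref())
    }
}
```

Intermediate layers that only forward the settings never pay for a copy, and the clone sits next to the code that stores the result.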

@rok force-pushed the decryption-basics-fork branch from f90d8b4 to 29d55eb on December 16, 2024 23:51
@rok force-pushed the decryption-basics-fork branch 2 times, most recently from 7ac53ba to deedba9 on January 21, 2025 20:28
@rok force-pushed the decryption-basics-fork branch from deedba9 to 951f2fa on January 21, 2025 20:35
@rok changed the title from "Parquet Modular Encryption support" to "Parquet Modular decryption support" on Jan 21, 2025
@rok force-pushed the decryption-basics-fork branch from 6f3f0be to 6acb984 on January 22, 2025 12:59
@rok force-pushed the decryption-basics-fork branch from f6b9e88 to 23375d1 on January 23, 2025 18:17
@adamreeve force-pushed the decryption-basics-fork branch 2 times, most recently from ccdac56 to 7f94e39 on January 24, 2025 02:34
@adamreeve force-pushed the decryption-basics-fork branch from 7f94e39 to 177d826 on January 24, 2025 02:46
Labels: parquet (Changes to the parquet crate)
Projects: None yet
Development: successfully merging this pull request may close the issue "Parquet Modular Encryption support".
5 participants