-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PARQUET-1517: [C++] Crypto package updates to match the final spec #3520
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you're fixing some bugs here. Can you explain those briefly in the PR description? Also, do we have some regression tests for this that would deserve updating?
cpp/src/parquet/util/crypto.cc
Outdated
@@ -196,10 +204,16 @@ int ctr_encrypt(const uint8_t* plaintext, int plaintext_len, uint8_t* key, int k | |||
|
|||
uint8_t iv[ctrIvLen]; | |||
memset(iv, 0, ctrIvLen); | |||
iv[ctrIvLen - 1] = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining the initialization here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Parquet, an input to the CTR cipher is an encryption key, a 16-byte IV and a plaintext. IVs are comprised of a 12-byte nonce and a 4-byte initial counter field. The first 31 bits of the initial counter field are set to 0, the last bit is set to 1.
Will add a comment in the code too.
@pitrou , this PR is a part of developing a new Parquet feature - the columnar encryption. More specifically, here I update the crypto functions to match the encryption specification In the C++ Parquet, the encryption is comprised of #2555, this PR and potentially new ones we'll create in the future. We have regression tests for the Java Parquet encryption, will certainly add them to the C++ version. |
cpp/src/parquet/util/crypto.cc
Outdated
std::copy(bufferSizeArray, bufferSizeArray + bufferSizeLen, ciphertext); | ||
std::copy(iv, iv + ctrIvLen, ciphertext + bufferSizeLen); | ||
|
||
return bufferSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the bufferSize will have more 4 bytes for bufferSizeLen, then I think now we should return bufferSize + bufferSizeLen
? or the caller will have to add 4 bytes into bufferSize itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thamht4190 You are right, it should be + sizeLen (or rather, bufferSize should include this addition). I'll update the code.
@ggershinsky can you add a PR description that summarizes the changes you've made here, for the project changelog? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some stylistic comments. Since we're trying to flatten out the package and remove parquet/util
it might make sense to move this to parquet/encryption-internal.h/cc
cpp/src/parquet/util/crypto.cc
Outdated
constexpr int GCM_MODE = 0; | ||
constexpr int CTR_MODE = 1; | ||
constexpr int CTRIvLength = 16; | ||
constexpr int BufferSizeLength = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use style-conforming names for these constants in the form kCamelCase
?
cpp/src/parquet/util/crypto.cc
Outdated
explicit EvpCipher(int cipher, int key_len, int type) { | ||
ctx_ = nullptr; | ||
AesEncryptor::AesEncryptor(ParquetCipher::type alg_id, int key_len, bool metadata, | ||
std::shared_ptr<std::list<AesEncryptor*>> all_encryptors) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use std::list
instead of the more common std::vector
?
cpp/src/parquet/util/crypto.cc
Outdated
std::stringstream ss; | ||
ss << "Crypto algorithm " << alg_id << " is not supported"; | ||
throw ParquetException(ss.str()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally not in favor of class constructors being able to fail. This maintains the status quo but you might consider doing argument validation in a factory static constructor and passing in sanitized arguments into a private ctor
cpp/src/parquet/util/crypto.cc
Outdated
private: | ||
EVP_CIPHER_CTX* ctx_; | ||
}; | ||
int AesEncryptor::CiphertextSizeDelta() { return ciphertext_size_delta_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ciphertext_size_delta()
(since this is just accessing an attribute)
cpp/src/parquet/util/crypto.cc
Outdated
|
||
int gcm_encrypt(const uint8_t* plaintext, int plaintext_len, uint8_t* key, int key_len, | ||
uint8_t* aad, int aad_len, uint8_t* ciphertext) { | ||
int AesEncryptor::gcm_encrypt(const uint8_t* plaintext, int plaintext_len, uint8_t* key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take the opportunity to rename this as GcmEncrypt
?
cpp/src/parquet/util/crypto.h
Outdated
uint8_t* plaintext); | ||
class AesEncryptor { | ||
public: | ||
// Can serve one key length only. Possible values: 16, 24, 32 bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use ///
for comments (for the sake of doxygen)
cpp/src/parquet/util/crypto.h
Outdated
EVP_CIPHER_CTX_free(ctx_); | ||
ctx_ = NULLPTR; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put these implementations in the .cc file?
cpp/src/parquet/util/crypto.h
Outdated
} | ||
|
||
private: | ||
EVP_CIPHER_CTX* ctx_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you hide this in a PIMPL? If this is an internal header then this is not necessary
cpp/src/parquet/util/crypto.h
Outdated
} | ||
|
||
private: | ||
EVP_CIPHER_CTX* ctx_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question re using PIMPL
cpp/src/parquet/util/crypto.h
Outdated
@@ -18,29 +18,126 @@ | |||
#ifndef PARQUET_UTIL_CRYPTO_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is PARQUET_EXPORT needed on any functions in here? If might be clearer to move this file to the top level of the Parquet codebase and call it encryption-internal.h
if it is internal-only
We went with Revital over these comments, all of them make sense to us, we'll make the respective changes. |
4d51930
to
e784d9d
Compare
Please be sure to run clang-format-7 (using |
@wesm , we've addressed your comments. If you're ok with the current version, please merge it, so the staging branch can pick up these 2 new files, upon rebase. |
Thanks. Having a look now |
FYI, it's a little bit odd to do code reviews on your fork as it's pseudoprivate In the future, can you adopt a process that permits the code review to be seen more obviously by the community? If you are collaborating on a PR, you can add each other as contributors on your fork and then have the code review on the PR itself |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you!
An initial version of crypto package is merged. This PR updates the crypto code to
conform the signed off specification (wire protocol updates, signature tag creation, AAD support, etc)
improve performance by extending cipher lifecycle to file writing/reading - instead of creating cipher on each encrypt/decrypt operation