-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deoxys #311
Deoxys #311
Conversation
Unrelated to this PR, I was actually wondering about if exposing a more fine-grained API from the Exposing the encryption round function (as a very much "hazmat" API) in a way that's portable across It's a bit trickier with the software implementation which uses a slightly modified SBox which omits some bitwise NOTs which are moved to the key schedule for efficiency reasons, however I think we could still build a common abstraction across all three. Decryption is a bit trickier as different approaches are employed by the different backends (i.e. inverse cipher versus equivalent inverse cipher, as described in the Rijndael paper and FIPS 197), however these approaches are duals of each other and one can be used to implement the other. I'd say it's something we should really address before merging. |
Agreed! A lot of Deoxys strength relies on that it relies on the AES round function which already has fast and safe implementations, so duplicating the codebase for that part wouldn't be very efficient or safe. On another note, since this might take some time, do you think it would be a good idea to reserve the |
Snagged it 😉 |
I opened RustCrypto/block-ciphers#252 to track exposing the AES round function. |
@zer0x64 I think the trickiest thing about RustCrypto/block-ciphers#252 will be handling decryption. Namely Intel's AES-NI and the ARMv8 AES intrinsics take somewhat different approaches: https://blog.michaelbrase.com/2018/05/08/emulating-x86-aes-intrinsics-on-armv8-a/ I'd be curious to look at some other hardware-accelerated implementations of Deoxys and see what they do. |
The only hardware-accelerated Deoxys-II implementation I'm aware of is the Rust implementation by Oasis Labs(maybe their Go implementation is too), which only supports Deoxys-II on x86: https://github.com/oasisprotocol/deoxysii-rust Technically, Deoxys-II never requires a Deoxys-BC decryption because of the stream-cipher like design of the mode. However, I "think" that Deoxys-I actually requires decryption, but that's not something I can confirm without actually implementing it because that's the only surefire way to make sure I completely understand it. Note that(for x86) only "AESENC" and maybe "AESDEC" are used in Deoxys. There is no "last" round like in AES, the key schedule is different and there is no need to directly call inverse MixCollumns. Note that, once again, in the case of Deoxys, AddRoundKey can be either at the beggining or at the end of the round as it is the first and the last instruction of the entire sequence. Basically, you either do a manual AddRoundKey and then the rounds ending with AddRoundKey OR you do rounds that starts with AddRoundKey and do a manual AddRoundKey at the end. EDIT: Decryption is required for Deoxys-I |
Sounds good. Encryption-wise the ARMv8 instructions just split out Mix Columns/Inverse Mix Columns from the rest of the round function, and if you want control of when the round key XOR takes place, you can always pass a key of zeros to the hardware intrinsic. So maybe a reasonable initial API would be one that looks like Intel's |
So as far as I can see, Deoxys CAN be done cleanly on intrinsics from both platforms (x86: XOR -> loop(AESENC), ARM: loop(AESE -> AESMC) -> XOR), however I don't really see a good way to abstract these into a single platform-independant method that can be used from Deoxys |
I didn't think about the the null key! |
Deoxys has been designed with AES-NI in mind, and AES-NI uses the "Equivalent Inverse Cipher", so there must be a way to tweak the key schedule to work with it, although I might need to work a bit to get it to work. From the article, I also saw that both x86 and ARM AES instruction sets uses this approach, so there shouldn't be any problem on that side. EDIT: I think all that's needed is to expose AESIMC, the ARM and the software equivalent(InverseMixCollumns) and apply the transformation to the round keys before using them, which should be pretty easy to implement on the Deoxys side. |
So this is the tricky question for decryption: should the FIPS 197 (Appendix C.3) provides test vectors for:
Should we pick one of inverse cipher of equivalent inverse cipher to support? Or try to support both? |
I think I see the point: Should the round key inversion(AESIMC) be done inside the Personally I think that exposing an "Inverse Cipher" API would be better for a couple of reasons. It feels more natural, AKA using the encrypt and decrypt API would be the same, also it would not require to expose AESIMC to other crates, and AFAIK this wouldn't hurt performance as the intrinsics only supports Equivalent Inverse Cipher anyway and the inversion step would be done only once per round, regardless of which API we choose. |
Deoxys-I implemented and tested 🎉 Quick question regarding the API: Deoxys-II uses 15 bytes in the spec and internally in the cipher. However, 15-bytes nonces is a pretty non-standard size and even the the provided test vectors uses 16-bytes nonce, even if the least significant byte is never used. Should I make it so a 16 byte is required even though a byte isn't used or should I require a 15 byte nonce? (I think 15 would be better as it's a bit more flexible for the user and more efficient) |
Starting with a 15-byte nonce sounds good to me. If 16-bytes is more helpful in practice (e.g. generic bounds) we can explore that later. |
I opened a preliminary PR with a raw round function API here, gated under a new It only supports the hardware intrinsic backends for now, but provides equivalent semantics between Intel AES-NI and the ARMv8 Cryptography Extensions, as tested against some of the FIPS 197 round function test vectors.
|
Codecov Report
@@ Coverage Diff @@
## master #311 +/- ##
==========================================
+ Coverage 84.47% 89.58% +5.10%
==========================================
Files 31 38 +7
Lines 1140 1977 +837
==========================================
+ Hits 963 1771 +808
- Misses 177 206 +29
Continue to review full report at Codecov.
|
I'll fix the clippy warnings and try to see if the exposed round functions does the trick. |
@tarcieri I just got it working on my side, I also need the Inverse MixColumns instruction to invert the keys before sending it to the round function on decryption. Right now it's working on my side but using the software implementation. EDIT: I lied, the fact that I got it working was a false positive. However, I just got it working and I indeed need access to AESIMC and the ARMv8 equivalent. One thing I didn't expect is that I still need a "regular" MixColumn, once per block(not per round). It's not possible to do it hardware with x86, it is with ARMv8. Maybe that would help once we expose the software round implementation in the |
Aah, ok. Maybe I should rename the module from As it were, there are hardware instructions for InvMixColumns on both x86 ( |
I just pushed the changes to the round encryption/decryption to use the AES crate(although it's imported via a relative path so this will break everywhere expect my own setup for now). I added an |
@zer0x64 if you update the |
Great! I'll update it when I can and also update it so it pull the |
@zer0x64 looks like it needs a rebase |
Done! So I'll keep the What's left to do before removing the [WIP] tag is to cleanup |
I've removed |
I've tried to run it in WASM and I'm getting a bunch of unresolved modules from the
My guess is that those would also be unresolved on any non-x86/ARM platforms too |
Yep. I'm working on parallel encrypt/decrypt APIs in |
To reproduce, simply |
Aah, yeah the I need to backfill CI to check it on platforms without CPU intrinsics (it previously had no software backend, so it wasn't possible) |
I just landed a few PRs in the Let me know if that all looks good and I can cut another release. |
Also feel free to mark as no-longer-draft / ready for review when you think it's ready. Looking pretty good now to me. |
FYI: I cut an |
Great! I'll check if everything's working on my side later today and how hard it is to make use of the parallel API, then I'll remove the WIP/draft tag! |
Deoxys-II-256 tested and working in WASM |
I've checked a bit and using the |
Seems fine to omit for now and address as follow-up work |
Looking good now! I'll do a more detailed review when I have some time. |
Went ahead and merged. There are a few small things I might like to change, but I can do them as follow-ups when I have time. |
This is an initial implementation of the Deoxys AEAD. The Deoxys-II variant has been selected as the first choice for defense in-depth during the CAESAR competition.
A couple of things I checked:
tests/
folder)[no_std]
There is still improvements to be done on the performance side, but this seems good enough for a first release.