Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deoxys #311

Merged
merged 24 commits into from
Jun 20, 2021
Merged

Deoxys #311

merged 24 commits into from
Jun 20, 2021

Conversation

zer0x64
Copy link
Contributor

@zer0x64 zer0x64 commented May 15, 2021

This is an initial implementation of the Deoxys AEAD. The Deoxys-II variant has been selected as the first choice for defense in-depth during the CAESAR competition.

A couple of things I checked:

  • All the official test vectors passes (see the tests/ folder)
  • The crate is [no_std]
  • The crate compiles and run in WASM.

There is still improvements to be done on the performance side, but this seems good enough for a first release.

@zer0x64 zer0x64 changed the title Deoxys [WIP] Deoxys May 15, 2021
@tarcieri
Copy link
Member

tarcieri commented May 15, 2021

Deoxys uses the AES round as the primitive for its operation, however there doesn't seem to be a way to use the aes crate for that purpose as the round functions aren't exposed. Right now the round function is reimplemented in 100% software in the aes_ref module, but this is slow, doesn't use AES-NI when possible and is probably full of side-channel attacks. I think the best person to ask here is @tarcieri ?

Unrelated to this PR, I was actually wondering about if exposing a more fine-grained API from the aes crate would be helpful the other day, especially as I'm currently working on adding support for the ARMv8 cryptography extensions to the aes crate.

Exposing the encryption round function (as a very much "hazmat" API) in a way that's portable across x86/x86-64 and aarch64 targets would actually be pretty easy, or at least could be easy with some refactoring.

It's a bit trickier with the software implementation which uses a slightly modified SBox which omits some bitwise NOTs which are moved to the key schedule for efficiency reasons, however I think we could still build a common abstraction across all three.

Decryption is a bit trickier as different approaches are employed by the different backends (i.e. inverse cipher versus equivalent inverse cipher, as described in the Rijndael paper and FIPS 197), however these approaches are duals of each other and one can be used to implement the other.

I'd say it's something we should really address before merging.

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 15, 2021

I'd say it's something we should really address before merging.

Agreed! A lot of Deoxys strength relies on that it relies on the AES round function which already has fast and safe implementations, so duplicating the codebase for that part wouldn't be very efficient or safe.

On another note, since this might take some time, do you think it would be a good idea to reserve the deoxys name under RustCrypto on crates.io?

@tarcieri
Copy link
Member

On another note, since this might take some time, do you think it would be a good idea to reserve the deoxys name under RustCrypto on crates.io?

Snagged it 😉

@tarcieri
Copy link
Member

I opened RustCrypto/block-ciphers#252 to track exposing the AES round function.

@tarcieri
Copy link
Member

@zer0x64 I think the trickiest thing about RustCrypto/block-ciphers#252 will be handling decryption. Namely Intel's AES-NI and the ARMv8 AES intrinsics take somewhat different approaches:

https://blog.michaelbrase.com/2018/05/08/emulating-x86-aes-intrinsics-on-armv8-a/

I'd be curious to look at some other hardware-accelerated implementations of Deoxys and see what they do.

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 15, 2021

The only hardware-accelerated Deoxys-II implementation I'm aware of is the Rust implementation by Oasis Labs(maybe their Go implementation is too), which only supports Deoxys-II on x86:

https://github.com/oasisprotocol/deoxysii-rust

Technically, Deoxys-II never requires a Deoxys-BC decryption because of the stream-cipher like design of the mode. However, I "think" that Deoxys-I actually requires decryption, but that's not something I can confirm without actually implementing it because that's the only surefire way to make sure I completely understand it.

Note that(for x86) only "AESENC" and maybe "AESDEC" are used in Deoxys. There is no "last" round like in AES, the key schedule is different and there is no need to directly call inverse MixCollumns.

Note that, once again, in the case of Deoxys, AddRoundKey can be either at the beggining or at the end of the round as it is the first and the last instruction of the entire sequence. Basically, you either do a manual AddRoundKey and then the rounds ending with AddRoundKey OR you do rounds that starts with AddRoundKey and do a manual AddRoundKey at the end.

EDIT: Decryption is required for Deoxys-I

@tarcieri
Copy link
Member

Sounds good.

Encryption-wise the ARMv8 instructions just split out Mix Columns/Inverse Mix Columns from the rest of the round function, and if you want control of when the round key XOR takes place, you can always pass a key of zeros to the hardware intrinsic.

So maybe a reasonable initial API would be one that looks like Intel's AESENC, i.e. the entire encryption round function, including Mix Columns, which accepts a round key and block to process as inputs.

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 15, 2021

So as far as I can see, Deoxys CAN be done cleanly on intrinsics from both platforms (x86: XOR -> loop(AESENC), ARM: loop(AESE -> AESMC) -> XOR), however I don't really see a good way to abstract these into a single platform-independant method that can be used from Deoxys

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 15, 2021

Encryption-wise the ARMv8 instructions just split out Mix Columns/Inverse Mix Columns from the rest of the round function, and if you want control of when the round key XOR takes place, you can always pass a key of zeros to the hardware intrinsic.

I didn't think about the the null key!

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 15, 2021

Deoxys has been designed with AES-NI in mind, and AES-NI uses the "Equivalent Inverse Cipher", so there must be a way to tweak the key schedule to work with it, although I might need to work a bit to get it to work. From the article, I also saw that both x86 and ARM AES instruction sets uses this approach, so there shouldn't be any problem on that side.

EDIT: I think all that's needed is to expose AESIMC, the ARM and the software equivalent(InverseMixCollumns) and apply the transformation to the round keys before using them, which should be pretty easy to implement on the Deoxys side.

@tarcieri
Copy link
Member

I think all that's needed is to expose AESIMC, the ARM and the software equivalent(InverseMixCollumns) and apply the transformation to the round keys before using them, which should be pretty easy to implement on the Deoxys side.

So this is the tricky question for decryption: should the aes crate expose an equivalent inverse cipher API?

FIPS 197 (Appendix C.3) provides test vectors for:

  • CIPHER (ENCRYPT)
  • INVERSE CIPHER (DECRYPT)
  • EQUIVALENT INVERSE CIPHER (DECRYPT)

Should we pick one of inverse cipher of equivalent inverse cipher to support? Or try to support both?

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 15, 2021

I think I see the point: Should the round key inversion(AESIMC) be done inside the aes crate or outside of it?

Personally I think that exposing an "Inverse Cipher" API would be better for a couple of reasons. It feels more natural, AKA using the encrypt and decrypt API would be the same, also it would not require to expose AESIMC to other crates, and AFAIK this wouldn't hurt performance as the intrinsics only supports Equivalent Inverse Cipher anyway and the inversion step would be done only once per round, regardless of which API we choose.

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 16, 2021

Deoxys-I implemented and tested 🎉

Quick question regarding the API:

Deoxys-II uses 15 bytes in the spec and internally in the cipher. However, 15-bytes nonces is a pretty non-standard size and even the the provided test vectors uses 16-bytes nonce, even if the least significant byte is never used. Should I make it so a 16 byte is required even though a byte isn't used or should I require a 15 byte nonce? (I think 15 would be better as it's a bit more flexible for the user and more efficient)

@tarcieri
Copy link
Member

Starting with a 15-byte nonce sounds good to me.

If 16-bytes is more helpful in practice (e.g. generic bounds) we can explore that later.

@tarcieri
Copy link
Member

tarcieri commented May 17, 2021

I opened a preliminary PR with a raw round function API here, gated under a new hazmat feature: RustCrypto/block-ciphers#257

It only supports the hardware intrinsic backends for now, but provides equivalent semantics between Intel AES-NI and the ARMv8 Cryptography Extensions, as tested against some of the FIPS 197 round function test vectors.

TODO: equivalent inverse cipher API Edit: done.

@codecov-commenter
Copy link

codecov-commenter commented May 17, 2021

Codecov Report

Merging #311 (8ba77a3) into master (f371861) will increase coverage by 5.10%.
The diff coverage is 96.53%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #311      +/-   ##
==========================================
+ Coverage   84.47%   89.58%   +5.10%     
==========================================
  Files          31       38       +7     
  Lines        1140     1977     +837     
==========================================
+ Hits          963     1771     +808     
- Misses        177      206      +29     
Impacted Files Coverage Δ
deoxys/src/lib.rs 86.20% <86.20%> (ø)
deoxys/src/modes.rs 89.38% <89.38%> (ø)
deoxys/src/deoxys_bc.rs 89.47% <89.47%> (ø)
deoxys/tests/deoxys_i_128.rs 100.00% <100.00%> (ø)
deoxys/tests/deoxys_i_256.rs 100.00% <100.00%> (ø)
deoxys/tests/deoxys_ii_128.rs 100.00% <100.00%> (ø)
deoxys/tests/deoxys_ii_256.rs 100.00% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f371861...8ba77a3. Read the comment docs.

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 18, 2021

I'll fix the clippy warnings and try to see if the exposed round functions does the trick.

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 18, 2021

@tarcieri I just got it working on my side, I also need the Inverse MixColumns instruction to invert the keys before sending it to the round function on decryption. Right now it's working on my side but using the software implementation.

EDIT: I lied, the fact that I got it working was a false positive. However, I just got it working and I indeed need access to AESIMC and the ARMv8 equivalent. One thing I didn't expect is that I still need a "regular" MixColumn, once per block(not per round). It's not possible to do it hardware with x86, it is with ARMv8. Maybe that would help once we expose the software round implementation in the aes crate, but in the meantime I'll simply use my implementation in the software fallback.

@tarcieri
Copy link
Member

Aah, ok. Maybe I should rename the module from round => hazmat.

As it were, there are hardware instructions for InvMixColumns on both x86 (_mm_aesimc_si128) and ARMv8 (vaesimcq_u8)

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 18, 2021

I just pushed the changes to the round encryption/decryption to use the AES crate(although it's imported via a relative path so this will break everywhere expect my own setup for now). I added an unsafe block for now to use x86 AESIMC, but I'll change it to use the aes crate when available. I also made used a cfg gate to fallback on my software implementation.

@tarcieri
Copy link
Member

@zer0x64 if you update the aes crate there is now an inv_mix_columns function in the (newly renamed) hazmat module

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 18, 2021

Great! I'll update it when I can and also update it so it pull the aes crate from git instead of locally

@tarcieri
Copy link
Member

@zer0x64 looks like it needs a rebase

@zer0x64
Copy link
Contributor Author

zer0x64 commented May 18, 2021

Done!

So I'll keep the aes_ref module for now as a software fallback and for the MixCollumn step(the one that doesn't have a dedicated x86 instruction). I'll eventually remove them when they will be available from the aes crate ( I'll open the issue after the merge).

What's left to do before removing the [WIP] tag is to cleanup mode.rs and add some benchmarks to see if the performance is good enough for a first release.

@zer0x64
Copy link
Contributor Author

zer0x64 commented Jun 1, 2021

I've removed aes_ref and ran the tests with both target-feature=-aes and target-feature=+aes (although I haven't tested with force-soft, but I might simply test it in WASM to test the software fallback). I guess what's left for a first release is simply to wait for a new aes version?

@zer0x64
Copy link
Contributor Author

zer0x64 commented Jun 1, 2021

I've tried to run it in WASM and I'm getting a bunch of unresolved modules from the aes crate. More specifically:

cpufeatures
aes_intrinsics
intrinsics

My guess is that those would also be unresolved on any non-x86/ARM platforms too

@tarcieri
Copy link
Member

tarcieri commented Jun 1, 2021

Yep. I'm working on parallel encrypt/decrypt APIs in hazmat, then I'll cut a release.

@zer0x64
Copy link
Contributor Author

zer0x64 commented Jun 1, 2021

To reproduce, simply cargo build --target wasm32-unknown-unknown

@tarcieri
Copy link
Member

tarcieri commented Jun 1, 2021

Aah, yeah the hazmat API is probably not gating things correctly yet.

I need to backfill CI to check it on platforms without CPU intrinsics (it previously had no software backend, so it wasn't possible)

@tarcieri
Copy link
Member

tarcieri commented Jun 1, 2021

I just landed a few PRs in the aes crate, including RustCrypto/block-ciphers#269 which adds parallel round APIs that operate over aes::ParBlocks, and RustCrypto/block-ciphers#270 which should address the build failures.

Let me know if that all looks good and I can cut another release.

@tarcieri
Copy link
Member

tarcieri commented Jun 1, 2021

Also feel free to mark as no-longer-draft / ready for review when you think it's ready. Looking pretty good now to me.

@tarcieri tarcieri requested a review from newpavlov June 1, 2021 04:07
@tarcieri
Copy link
Member

tarcieri commented Jun 1, 2021

FYI: I cut an aes v0.7.4 release which includes software fallback and the parallel cipher_round_par and equiv_inv_cipher_round_par parallel APIs.

@zer0x64
Copy link
Contributor Author

zer0x64 commented Jun 1, 2021

Great! I'll check if everything's working on my side later today and how hard it is to make use of the parallel API, then I'll remove the WIP/draft tag!

@zer0x64
Copy link
Contributor Author

zer0x64 commented Jun 1, 2021

Deoxys-II-256 tested and working in WASM

@zer0x64
Copy link
Contributor Author

zer0x64 commented Jun 1, 2021

I've checked a bit and using the aes parallel block api will require a bit of work, mostly because of the way the memory is currently layed out for the round keys. Should I try to get it working before merging or this can be done later with another PR?

@tarcieri
Copy link
Member

tarcieri commented Jun 1, 2021

Seems fine to omit for now and address as follow-up work

@zer0x64 zer0x64 marked this pull request as ready for review June 1, 2021 22:16
@zer0x64 zer0x64 changed the title [WIP] Deoxys Deoxys Jun 1, 2021
@tarcieri
Copy link
Member

tarcieri commented Jun 1, 2021

Looking good now! I'll do a more detailed review when I have some time.

@tarcieri tarcieri merged commit c8fe08f into RustCrypto:master Jun 20, 2021
@tarcieri
Copy link
Member

Went ahead and merged. There are a few small things I might like to change, but I can do them as follow-ups when I have time.

@zer0x64 zer0x64 mentioned this pull request Jul 1, 2021
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants