Use function reference to get cpu capability vector at the C-level #856

Merged

torben-hansen
Contributor

@torben-hansen torben-hansen commented Mar 8, 2023

Issues:

CryptoAlg-1577

Description of changes:

The goal of this PR is to close a performance gap between the FIPS static and FIPS shared builds. The latter is faster in some cases.

Background

OPENSSL_ia32cap_P is a global, zero-initialised, mutable variable, and hence will typically go to .bss. However, the location of the .bss segment is not known before program load time, so dereferences of OPENSSL_ia32cap_P in code must be filled in by the loader. This modifies .text before it's mapped into memory, which in turn would make any hash over (part of) .text unpredictable. This is bad for FIPS, which requires a deterministic hash over part of .text to implement the integrity check.

All of this is already known, though, and an implemented solution already exists! We are only concerned about the static FIPS build. The shared build uses a different technique that doesn't require in-line modifications of textual assembly code, and is therefore not subject to the performance degradation explained below.

The FIPS static build fixes up dereferences of OPENSSL_ia32cap_P in a pretty straightforward way:

Note that the latter step requires an addition, and the addition might modify the carry flag. This is problematic because the in-line modification of the assembly performed by the delocator script is not really context-aware: it simply uses the destination register. Injecting an instruction that potentially modifies the carry flag could break soundness/correctness, because the subsequent assembly code was not written with this in mind.

Therefore, the delocator script pushes the CPU flags onto the stack before the injected code and pops them after it: https://github.com/aws/aws-lc/blob/main/util/fipstools/delocate/delocate.go#L1367-L1369. This restores the carry-flag state to what it was before the injected code, ensuring soundness/correctness.
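To make the flag problem concrete, here is a toy model in C. It is only a sketch: the `toy_cpu` type and both functions are invented for illustration and do not correspond to anything in the delocator script, which operates on textual assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU state: one register and the carry flag. Hypothetical model. */
typedef struct {
  uint64_t reg;
  bool carry;
} toy_cpu;

/* Models an add instruction: updates the register and the carry flag. */
static void toy_add(toy_cpu *cpu, uint64_t value) {
  uint64_t sum = cpu->reg + value;
  cpu->carry = sum < cpu->reg; /* unsigned wrap-around means carry out */
  cpu->reg = sum;
}

/* Models the injected sequence: save flags, add, restore flags. */
static void toy_injected_add(toy_cpu *cpu, uint64_t value) {
  bool saved = cpu->carry; /* stands in for pushing the flags */
  toy_add(cpu, value);
  cpu->carry = saved;      /* stands in for popping the flags */
}
```

With `toy_add` alone, code that runs afterwards sees a carry flag it did not expect; with `toy_injected_add` the register changes but the flag is left as it was, at the cost of the extra save and restore.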

Now to the actual issue: it was found that this push/pop of the flags introduces a small latency that, when accumulated, affects performance. For example, AES-256-XTS init and encrypt (256 bytes) can perform 4706548 invocations per second for the FIPS static build, but 6735574 invocations per second for the FIPS shared build (c6i.4xlarge); in other words, the FIPS shared build does ~43% more invocations per second than the FIPS static build. Performance differences can be seen for various other algorithms, e.g. P384 ops, SHA-256, AEAD-AES-128-GCM, etc.
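The ~43% figure follows from the two throughput numbers quoted above. A minimal sketch of the arithmetic (the helper name `percent_more` is made up for this example):

```c
/* Returns how many percent more `fast` is than `slow`.
 * Hypothetical helper, not part of AWS-LC. */
static double percent_more(double fast, double slow) {
  return (fast / slow - 1.0) * 100.0;
}
```

Calling `percent_more(6735574.0, 4706548.0)` gives a value just above 43.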

Solution

This is the first PR out of two PRs that bridge the performance gap.

This PR uses a function reference instead of directly dereferencing the global variable at the C level. Some instances of dereferencing the variable directly were added in https://github.com/aws/aws-lc/pull/330/files, which removed the function reference. Others seem to have always dereferenced the global variable. Make this consistent and only use the function reference at the C level. This avoids the push/pop of the flags, and performance benchmarks show that it recovers some of the performance gap.
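As a rough illustration of the pattern, the sketch below routes every read of the capability vector through an accessor. The names `OPENSSL_ia32cap_P` and `OPENSSL_ia32cap_get` come from this PR; the storage class, the `cap_bit_is_set` helper and its parameters are assumptions made only for this example.

```c
#include <stdint.h>

/* Stand-in for the capability vector. In AWS-LC it is filled in once at
 * start-up; it is declared static here only to keep the sketch in one file. */
static uint32_t OPENSSL_ia32cap_P[4] = {0};

/* Accessor: C code calls this instead of naming the global directly. */
static const uint32_t *OPENSSL_ia32cap_get(void) {
  return OPENSSL_ia32cap_P;
}

/* Hypothetical feature check that goes through the accessor.
 * `word` must be in the range 0..3 and `bit` in the range 0..31. */
static int cap_bit_is_set(unsigned word, unsigned bit) {
  return (int)((OPENSSL_ia32cap_get()[word] >> bit) & 1u);
}
```

Because callers only ever see the function, the single place that names the global is the accessor itself.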

The next PR will tackle the assembly-level code, which can't directly use the function reference. Again, this is because the delocator script is not context-aware: adding a function call would modify state that later assembly code has not been written to account for. For example, the assembly code might use registers that the function call, by definition, assumes are callee-owned.

See CryptoAlg-1577 for performance gain information.

Testing:

CI passes over all relevant compilers for fips build. CryptoAlg-1577 has information on benchmarks.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and
the ISC license.

@torben-hansen torben-hansen force-pushed the use_function_to_reference_cpucap branch from 57a6164 to 80ccc1e March 15, 2023 15:21
@torben-hansen torben-hansen changed the title from "[DRAFT] Bring back cpu capability trampoline function on the C-level for static fips build" to "Use function reference to get cpu capability vector at the C-level for FIPS static build" Mar 16, 2023
@torben-hansen torben-hansen force-pushed the use_function_to_reference_cpucap branch from 1cc7b83 to 73b816e March 16, 2023 15:03
@torben-hansen torben-hansen marked this pull request as ready for review March 16, 2023 15:04
@torben-hansen torben-hansen requested a review from dkostic March 16, 2023 15:04
Taffer
Taffer previously approved these changes Mar 16, 2023
Contributor

@Taffer Taffer left a comment

Code looks good to me. 👍

@torben-hansen torben-hansen changed the title from "Use function reference to get cpu capability vector at the C-level for FIPS static build" to "Use function reference to get cpu capability vector at the C-level" Mar 16, 2023
@torben-hansen
Contributor Author

Missed some:
grep --exclude=\*.S --exclude=\*.pl -r "OPENSSL_ia32cap_P" ./crypto
./crypto/fipsmodule/cpucap/cpucap.c:HIDDEN uint32_t OPENSSL_ia32cap_P[4] = {0};
./crypto/fipsmodule/cpucap/internal.h:// OPENSSL_ia32cap_P contains the Intel CPUID bits when running on an x86 or
./crypto/fipsmodule/cpucap/internal.h:extern uint32_t OPENSSL_ia32cap_P[4];
./crypto/fipsmodule/cpucap/internal.h: return OPENSSL_ia32cap_P;
./crypto/fipsmodule/cpucap/cpu_intel.c:extern uint32_t OPENSSL_ia32cap_P[4];
./crypto/fipsmodule/cpucap/cpu_intel.c: OPENSSL_ia32cap_P[0] = edx;
./crypto/fipsmodule/cpucap/cpu_intel.c: OPENSSL_ia32cap_P[1] = ecx;
./crypto/fipsmodule/cpucap/cpu_intel.c: OPENSSL_ia32cap_P[2] = extended_features[0];
./crypto/fipsmodule/cpucap/cpu_intel.c: OPENSSL_ia32cap_P[3] = extended_features[1];
./crypto/fipsmodule/cpucap/cpu_intel.c: // The first value determines OPENSSL_ia32cap_P[0] and [1]. The second [2]
./crypto/fipsmodule/cpucap/cpu_intel.c: handle_cpu_env(&OPENSSL_ia32cap_P[0], env1);
./crypto/fipsmodule/cpucap/cpu_intel.c: handle_cpu_env(&OPENSSL_ia32cap_P[2], env2 + 1);

Don't replace these:

  • OPENSSL_ia32cap_get() returns a constant pointer...
  • This code is only called once, so cost is suuuuper negligible.

@torben-hansen torben-hansen enabled auto-merge (squash) March 16, 2023 16:36
@torben-hansen torben-hansen merged commit b00c3bd into aws:main Mar 16, 2023
torben-hansen added a commit that referenced this pull request Apr 4, 2023
…que symbol instead of a common offset symbol avoiding add instruction (#862)

See #856 for background and description of issue resolved in this PR.

This is the second PR out of two PRs that bridge the performance gap.

See ticket for performance comparisons.

The first PR took care of the C level. But the machine-optimised algorithm implementations sometimes directly dereference OPENSSL_ia32cap_P. These also need to be fixed up. As before, we can't just add a call instruction to OPENSSL_ia32cap_get because it would compromise soundness/correctness.

Recall that the issue was the add instruction. Instead of injecting that instruction, we do the following for each occurrence of OPENSSL_ia32cap_P discovered in the textual assembly:

* Define a new unique symbol. Uniqueness is ensured using both the original (not true in all cases, but see "call outs") register name and a unique global counter. Using both is a bit redundant, but makes it easier to read IMO.
* Under the new unique symbol, copy the address of OPENSSL_ia32cap_P into the original register. This will probably spawn a relocation, but this is fine, because we put this new unique symbol outside the FIPS module scope in .text. Putting it here also means that we know the address relative to RIP.
* To jump back to the original execution point, we inject a "return" symbol/label where OPENSSL_ia32cap_P was discovered.
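The first step above, building a label from a register name plus a global counter, can be sketched as follows. This is an assumption-laden illustration: the real delocator is written in Go, and the `cap_redirect_` prefix, the label layout and the function name are all invented here.

```c
#include <stdio.h>

/* Global counter; every call to make_unique_label consumes one value. */
static unsigned label_counter = 0;

/* Writes a label built from a register name plus the counter into `out`.
 * The "cap_redirect_" prefix and the layout are invented for this sketch.
 * Returns the value returned by snprintf. */
static int make_unique_label(char *out, size_t out_len, const char *reg) {
  return snprintf(out, out_len, "cap_redirect_%s_%u", reg, label_counter++);
}
```

The counter alone already guarantees uniqueness; the register name is only there to make the generated assembly easier to read, which matches the remark in the list above.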
WillChilds-Klein pushed a commit to WillChilds-Klein/aws-lc that referenced this pull request Apr 4, 2023
…que symbol instead of a common offset symbol avoiding add instruction (aws#862)

samuel40791765 pushed a commit to samuel40791765/aws-lc that referenced this pull request Apr 20, 2023
…ws#856)

samuel40791765 pushed a commit to samuel40791765/aws-lc that referenced this pull request Apr 20, 2023
…que symbol instead of a common offset symbol avoiding add instruction (aws#862)

samuel40791765 added a commit that referenced this pull request Jun 8, 2023
* AWS-LC does not provide Time Stamp Authority functionality, define OPENSSL_NO_TS to notify code compiling with AWS-LC it is not provided (#864)

* fix dockerfile and add CI support for CentOS (#860)

* Get rid of time_t usage internally, change to int64_t

We still keep time_t stuff around for calling time() and
for external interfaces that are meant to give you time_t
values, but we stop using time_t internally. For publicly
exposed and used inputs that rely on time_t, _posix versions are
added to support providing times as an int64_t, and internal
use is changed to use the _posix version.

Several legacy functions which are extensively used and
use pointers to time_t are retained for compatibility,
along with posix time versions of them which we use exclusively.

This fixes the tests which were disabled on 32 bit platforms
to always run.
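To illustrate why an int64_t interface helps on platforms with a 32-bit time_t, here is a small sketch that converts POSIX seconds to a UTC calendar date without touching time_t. The type `utc_date` and the function `posix_to_utc_date` are hypothetical names for this example, not the _posix functions the commit adds; the day-count conversion is a well-known civil-calendar algorithm.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
  int64_t year;
  int month; /* 1..12 */
  int day;   /* 1..31 */
} utc_date;

/* Converts seconds since 1970-01-01T00:00:00Z to a proleptic Gregorian
 * date using only 64-bit integer arithmetic. */
static utc_date posix_to_utc_date(int64_t posix_seconds) {
  int64_t days = posix_seconds / 86400;
  if (posix_seconds % 86400 < 0) {
    days -= 1; /* round towards negative infinity */
  }
  int64_t z = days + 719468; /* shift epoch to 0000-03-01 */
  int64_t era = (z >= 0 ? z : z - 146096) / 146097;
  int64_t doe = z - era * 146097; /* day of 400-year era */
  int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
  int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
  int64_t mp = (5 * doy + 2) / 153; /* month index, March = 0 */
  utc_date out;
  out.day = (int)(doy - (153 * mp + 2) / 5 + 1);
  out.month = (int)(mp < 10 ? mp + 3 : mp - 9);
  out.year = yoe + era * 400 + (out.month <= 2 ? 1 : 0);
  return out;
}
```

A value such as 2147483648 (one past the largest signed 32-bit count) is handled like any other input, which is the kind of case a 32-bit time_t cannot represent.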

Update-Note: This is a potentially breaking change for things
that bind to the ASN1_[UTC|GENERALIZED]TIME_set and ASN1_TIME_adj
family of functions (and can not type convert a time_t to an
int64).

Bug: 416

Change-Id: Ic4daba5a299d8f35191853742640750a1ecc53d6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/54765
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: David Benjamin <[email protected]>
(cherry picked from commit 6e20b77e6b79069e2468686bdc69169d3fa2252e)

* Rename bssl to awslc, add version cmd (#865)

* AWS-LC does not support the OpenSSL memory debug APIs, define OPENSSL_NO_CRYPTO_MDEBUG to signal this to our customers (#868)

* Avoid potential for buffer overflow in SHA3 ARMv8 assembly (#863)

* Fix leak on invalid input to a2i_GENERAL_NAME.

Also add some tests for this syntax. The error-handling here is slightly
subtle. Although we do call GENERAL_NAME_free on the temporary
GENERAL_NAME on error, GENERAL_NAME's value is freed based on the
type field. That means if you add an object to the value but don't set
the type, it won't be freed.

Only the OTHERNAME codepath was affected by this, and a malloc
failure-only case in the is_string path. I've gone ahead and reworked
all the paths so setting the type happens at the same time as setting
the value, so this invariant is more locally obvious.
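The invariant described here (the type tag decides what gets freed, so tag and value must be set together) can be shown with a toy tagged struct. Everything below is invented for illustration; it is not the GENERAL_NAME code.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { NAME_NONE = 0, NAME_TEXT = 1 } name_type;

/* Toy stand-in for a tagged value. */
typedef struct {
  name_type type;
  char *text; /* owned only when type == NAME_TEXT */
} toy_name;

/* Frees the value according to the type tag. A value stored without
 * the matching tag would be skipped here and leak. */
static void toy_name_clear(toy_name *n) {
  if (n->type == NAME_TEXT) {
    free(n->text);
  }
  n->type = NAME_NONE;
  n->text = NULL;
}

/* Sets tag and value in one step so the two can never disagree.
 * Returns 1 on success and 0 on allocation failure. */
static int toy_name_set_text(toy_name *n, const char *s) {
  size_t len = strlen(s) + 1;
  char *copy = malloc(len);
  if (copy == NULL) {
    return 0; /* nothing was attached, so nothing can leak */
  }
  memcpy(copy, s, len);
  toy_name_clear(n); /* release any previous value first */
  n->type = NAME_TEXT;
  n->text = copy;
  return 1;
}
```

Since the only way to attach `text` is through `toy_name_set_text`, there is no window in which an error path could see a value without its tag.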

This only impacts the unsafe, stringly-typed extensions-building APIs
that no one should be using anyway.

Bug: oss-fuzz:55569
Change-Id: I6390e4ac1142264cdc86f95fd850f1b8f81e3fc9
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56725
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
(cherry picked from commit 07d353680fa7d96e77ba93382deddd030793def4)

* Make X509V3_get_value_int free the old value before overwriting it.

This is an unexported API, so it's okay to change it. Many extension
types work by parsing a list of key:value pairs and then setting fields
based on it. If a key appears twice, it'll just overwrite the old value.

But X509V3_get_value_int forgot to free the old value when doing so.

Bug: oss-fuzz:55572
Change-Id: I2b39aa7e9214e82fb40ee2e3481697338fe88e1a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56745
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit 62ab404cb560a6886196fe65cd3381f2ae3166ca)

* Remove the last of the broken NEON workaround

All evidence we have points to these devices no longer existing (or at
least no longer taking updates) for years.

I've kept CRYPTO_has_broken_NEON around for now as there are some older
copies of the Chromium measurement code around, but now the function
always returns zero.

Change-Id: Ib76b68e347749d03611d00caecb6b8b1fdbb37b1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56765
Reviewed-by: Adam Langley <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
(cherry picked from commit 2c12ebdf3a97a03b8bab7f4cd3b841926227310f)

* Correctly handle optional ASN1_ITEM_TEMPLATE types.

I didn't quite handle this case correctly in
https://boringssl-review.googlesource.com/c/boringssl/+/49350, which
made it impossible to express an OPTIONAL, doubly-tagged type in
crypto/asn1.

For some background, an ASN1_ITEM is a top-level type, while an
ASN1_TEMPLATE is roughly a field in a SEQUENCE or SET. In ASN.1, types
cannot be OPTIONAL or DEFAULT, only fields, so something like
ASN1_TFLG_OPTIONAL is a flag on an ASN1_TEMPLATE.

However, there are many other type-level features that are applied as
ASN1_TEMPLATE flags. SEQUENCE OF T and SET OF T are represented as an
ASN1_TEMPLATE with the ASN1_TFLG_SEQUENCE_OF or ASN1_TFLG_SET_OF flag
and an item of T. Tagging is also a feature of ASN1_TEMPLATE.

But some top-level ASN.1 types may be SEQUENCE OF T or be tagged. So one
of the types of ASN1_ITEM is ASN1_ITEM_TEMPLATE, which is an ASN1_ITEM
that wraps an ASN1_TEMPLATE (which, in turn, wraps an ASN1_ITEM...).
Such an ASN1_ITEM could then be placed in a SEQUENCE or SET, where it is
OPTIONAL.

We didn't correctly handle this case and instead lost the optional bit.
Fix this and add a test. This is a little interesting because it means
asn1_template_ex_i2d may get an optional bit from the caller, or it may
get one from the template itself. (But it will never get both. An
ASN1_ITEM_TEMPLATE cannot wrap an optional template because types are
not optional.)

This case doesn't actually come up, given it doesn't work today. But in
my pending rewrite of tasn_enc.c, it made more sense to just make it
work, so this CL fixes it and adds a test ahead of time.

Bug: 548
Change-Id: I0cf8c25386ddff992bafae029a5a60d026f124d0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56185
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
(cherry picked from commit 1df70cea5daa391e10f5df9057c60fd740b912ab)

* Add locale independent implementations of isalpha, isalnum, isdigit,
and isxdigit.

All of these can be affected by locale, and although we weren't using
them directly (except for isxdigit) we instead had manual versions inline
everywhere.

While I am here add OPENSSL_fromxdigit and deduplicate a bunch of code
in hex decoders pulling out a hex value.
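For readers unfamiliar with the idea, locale-independent classification just means comparing against fixed character ranges instead of calling the <ctype.h> functions. The sketch below assumes an ASCII-compatible execution character set, and the `ascii_` names are hypothetical; they are not the functions added by this change.

```c
#include <stdint.h>

/* Range checks that do not consult the current locale. */
static int ascii_isdigit(int c) { return c >= '0' && c <= '9'; }

static int ascii_isalpha(int c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int ascii_isxdigit(int c) {
  return ascii_isdigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

/* Converts one hex digit to its value. Returns 1 and writes *out on
 * success, or returns 0 and leaves *out untouched if c is not a hex digit. */
static int ascii_fromxdigit(uint8_t *out, int c) {
  if (ascii_isdigit(c)) {
    *out = (uint8_t)(c - '0');
    return 1;
  }
  if (c >= 'a' && c <= 'f') {
    *out = (uint8_t)(c - 'a' + 10);
    return 1;
  }
  if (c >= 'A' && c <= 'F') {
    *out = (uint8_t)(c - 'A' + 10);
    return 1;
  }
  return 0;
}
```

A single helper of this shape is what lets several hex decoders share one digit-to-value routine instead of each carrying its own inline version.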

Change-Id: Ie75a4fba0f043208c50b0bb14174516462c89673
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56648
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
(cherry picked from commit 00c70b8d698650e5836049def714b92d622bc4a6)

* Align header guard style in the remaining headers.

Change-Id: I811884dacf14fb6da4dd2300f27c8801145fd3ae
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56645
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
(cherry picked from commit a1c79226137e8f60ed572dabdd2435f1f942be0f)

* Cap bit indices in the unsafe string-based X.509 extensions API

Without a limit, a short input can translate into a very large allocation,
which is upsetting the fuzzers. Set a limit of 256, which allows up to a
32-byte allocation. (The highest bit index of any type in RFC 5280 is
8, so this is plenty of buffer.)

We do not consider this function to be safe with untrusted inputs (even
without bugs, it is prone to string injection vulnerabilities), so DoS
is not truly a concern, but the limit is necessary to keep fuzzing
effective.
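A minimal sketch of such a cap is shown below. It assumes the limit means that bit indices 0 through 255 are accepted, which is what yields a 32-byte maximum; the type and function names are invented, and the real code allocates dynamically rather than using a fixed buffer.

```c
#include <stddef.h>
#include <stdint.h>

enum { kMaxBits = 256, kMaxBytes = kMaxBits / 8 };

/* Toy bit string with a fixed 32-byte backing store. */
typedef struct {
  uint8_t bytes[kMaxBytes];
  size_t len; /* number of bytes in use */
} toy_bit_string;

/* Sets bit `index`, counting from the most significant bit of byte 0.
 * Returns 0 without touching the buffer if the index is past the cap. */
static int toy_set_bit(toy_bit_string *bs, unsigned long index) {
  if (index >= kMaxBits) {
    return 0;
  }
  size_t byte = index / 8;
  if (byte + 1 > bs->len) {
    bs->len = byte + 1; /* grow the used length to cover this bit */
  }
  bs->bytes[byte] |= (uint8_t)(0x80u >> (index % 8));
  return 1;
}
```

Without the first check, a short input naming a huge index would translate directly into a huge buffer, which is the behaviour that was upsetting the fuzzers.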

Update-Note: If anyone is using FORMAT:BITLIST to create very large BIT
STRINGs, this will break. This is unlikely and should be caught by
unit tests; if a project hits this outside of tests, that means they are
passing untrusted input into this function, which is a security
vulnerability in itself, and means they especially need this change to
avoid a DoS.

Bug: oss-fuzz:55603
Change-Id: Ie9ec0d35c7d67a568371dfa961867bf1404f7e2f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56785
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
(cherry picked from commit 50de086abd0f23b58320d6aa310bacdd48e80e53)

* Fix leak on error in v2i_POLICY_MAPPINGS

If obj2 were invalid, obj1 leaks. Also both leak if creating the
POLICY_MAPPINGS object fails on allocation error. Just swap the order,
so the ASN1_OBJECTs go to an owned pointer from the start.

Bug: oss-fuzz:55636
Change-Id: Ibf0bf58f44db510623035004f6eb1e00961a5454
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56805
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: Adam Langley <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Commit-Queue: Adam Langley <[email protected]>
(cherry picked from commit 3c7053975b35a631f477f42f07502003d35aa2ff)

* Fix some clang-format formatting.

I forgot to put ASN1_CHOICE_END_cb in the StatementMacros list, which
caused it to mangle the formatting a bit. Also remove the duplicate
ASN1_SEQUENCE_END.

Change-Id: I58b6c6f028b81fb717722e02260f3dfaa4d17e4b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56665
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit 210674b62a804e9b30c53df3be020d86f8ce3b55)

* Fix leak in error-handling for issuingDistributionPoint

Handling of duplicate keys is all over the place. For set_reasons, it
tried to catch it but leaked memory. Also fix a hypothetical memory leak
in crldp_from_section, but I think it's actually impossible because any
list of CONF_VALUE from a section, rather than from X509V3_parse_list,
cannot have duplicates. It just overrides the previous value.

(Ideally we'd be consistent about whether duplicates override previous
values or are caught, but I'm opting to just leave the existing behavior
alone because no one should be using these APIs in the first place.)

Bug: oss-fuzz:55669
Change-Id: I95d23c257203dcd799d19f334ef847a97d060aad
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56865
Reviewed-by: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit eb0b7e4df6eb5a082c2b977784f4270b55c58361)

* Bump release version to v1.5.1 (#871)

Co-authored-by: dkostic <[email protected]>

* Update the CI docker images documentation (#876)

Co-authored-by: dkostic <[email protected]>

* Use function reference to get cpu capability vector at the C-level (#856)

* Update analytics script to handle new tool name from #865 (#873)

* add bss segment and separate const segment from data segment (#880)

* Add basic SM2 point operations

The same trio as for the NIST curves: a point doubling function, point
addition function and point mixed addition function, all using
Jacobian coordinates, and all with input nondegeneracy assumed (see
the formal spec for the exact assumptions).

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/1cdd6ff1a7cd77b6bc73f690ee89db41bcd787ed

* Add checks to ensure that OPENSSL_ia32cap is not in the fips module (#879)

* update readme to address changes to the windows FIPS logic (#887)

* Remove docker images used for Rust crate gen (#888)

* Stringify enum to pretty-print errors (#890)

I believe the enum type instructionType doesn't have an unwrap. In fact, I'm not entirely sure how Errorf translates %w and its argument. Instead, just stringify the enum type instructionType and pass that up the stack to pretty-print.

* clang-format a subset of files (#892)

* Fix minor documentation formatting issues and build the documentation in the CI (#896)

* Add byte-level interfaces for X25519 functions

These provide alternative interfaces at the C level, with "_byte" in
their names to distinguish them, treating the arguments as arrays
of bytes (uint8_t) rather than of 64-bit words (uint64_t). This
better reflects how the X25519 function is generally specified and
used.

  void curve25519_x25519_byte(uint8_t res[static 32],uint8_t scalar[static 32],uint8_t point[static 32]);
  void curve25519_x25519_byte_alt(uint8_t res[static 32],uint8_t scalar[static 32],uint8_t point[static 32]);
  void curve25519_x25519base_byte(uint8_t res[static 32],uint8_t scalar[static 32]);
  void curve25519_x25519base_byte_alt(uint8_t res[static 32],uint8_t scalar[static 32]);

The underlying code is exactly the same in the x86 case, since
the platform is guaranteed to be little-endian, and the proofs
just rephrase the same results in terms of byte arrays.

The ARM functions are actually different code, using byte-level
loads and stores (ldrb, strb) at the beginning and end, and so
their proofs are also slightly different.
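The relationship between the two views of the same 256-bit value can be sketched as below. The function name is made up for this example and is not part of s2n-bignum; it packs bytes into words explicitly, so it gives the same answer on any host.

```c
#include <stdint.h>

/* Packs 32 little-endian bytes into four 64-bit words, least significant
 * word first. Hypothetical helper for illustration only. */
static void bytes_to_words_le(uint64_t words[4], const uint8_t bytes[32]) {
  for (int i = 0; i < 4; i++) {
    uint64_t w = 0;
    for (int j = 7; j >= 0; j--) {
      w = (w << 8) | bytes[8 * i + j]; /* highest byte of the word first */
    }
    words[i] = w;
  }
}
```

On a little-endian machine the four words produced here have exactly the same memory layout as the input bytes, which is why the x86 code can be shared between the word-level and byte-level entry points.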

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/6cdfdde71663913f2b505d287cad66cf7346c0f2

* revert back to using bssl as tool name (#895)

* Fix leak in set_dist_point_name error handling.

The temporary X509_NAME wasn't destroyed if the section didn't exist.
Also document the weird 0 vs -1 convention (see callers), and revise the
NULL check added in
https://boringssl-review.googlesource.com/c/boringssl/+/56705. It
doesn't make a difference, but we should only apply the NULL check after
we've looked at the name, and return -1 because, after the name is
checked, it's a known syntax error.

Also fix a couple of comments that were wrong. It's that the RDNSequence
we take from X509_NAME must have one RDN, not that there's one
RDNSequence. (This is a consequence of X509_NAME's somewhat odd
in-memory representation.)

Bug: oss-fuzz:55700
Change-Id: I5745752bfa82802d361803868f962b2b0fa4bd32
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56929
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
(cherry picked from commit a028a5e01f2cd627e31f3d3dbdd8fe1f707734b4)

* Const-correct the various EVP_PKEY PEM writers

Change-Id: I6fa17e204cb2003a6803e01604c0187420b4e39b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56945
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
(cherry picked from commit db98becc488393f735790ada8b1214cb4b8c58a5)

* Reject even moduli in RSA_check_key.

RSA state management is generally a mess right now, which causes thread
contention issues in highly threaded servers. We need to do a lot
of work within the library to fix it, but in the end state,
RSA_check_key (called by the parser), BN_MONT_CTX_set_locked, and
freeze_private_key should all be unified.

This means that anything which can cause the latter two steps to fail
will be lifted up into the parser, currently RSA_check_key. We've
broadly done that, but the requirement that the moduli (n, p, and q)
be odd is currently not covered by RSA_check_key. Fix that. We only
need to check for odd n, because odd p and q are then implied by p * q == n.
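Checking that a modulus is odd only needs its lowest bit. A minimal sketch on a big-endian byte string, with a made-up function name (the real check operates on BIGNUMs):

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the big-endian unsigned integer in be[0..len) is odd.
 * An empty input is treated as zero, which is even. */
static int be_is_odd(const uint8_t *be, size_t len) {
  return len > 0 && (be[len - 1] & 1) != 0;
}
```

Since a product of integers is odd only when every factor is odd, an odd n together with p * q == n rules out an even p or q without testing them separately.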

Update-Note: RSA keys with even moduli already do not work. (In addition
to being nonsensical, all operations will fail with them because we
cannot do Montgomery reduction on even moduli.) This CL shifts the error
from when you use the key, to when you parse the key, like our other
validation steps. Also after this lands, the check for odd modulus in
cl/447099278 can be removed.

Bug: 316
Change-Id: Ifa4af610316a8f717a026128078a5d38d046bff9
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56885
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
(cherry picked from commit 29564f2b633b1275e3e97703d86b41296211fb79)

* Rearrange bn/generic.c

In preparation for adding aarch64 bn_add_words and bn_sub_words
implementations, rearrange this so we first define BN_ADD_ASM and
BN_MUL_ASM defines, and then gate fallbacks on that. This also required
moving some functions around to group the add/mul functions together.

Change-Id: I59281706db35ad3fb1186a4afd345a820f5542d2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56965
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
(cherry picked from commit 3a16df9aa055b8e330bc1fa2e09e0be8ee404a94)

* Add bn_add_words and bn_sub_words assembly for aarch64.

It is 2023 and compilers *still* cannot use carry flags effectively,
particularly GCC.

There are some Clang-specific built-ins which help x86_64 (where we have
asm anyway) but, on aarch64, the built-ins actually *regress
performance* over the current formulation! I suspect Clang is getting
confused by Arm and Intel having opposite borrow flags.
https://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins

Just include aarch64 assembly to avoid this. This provides a noticeable
perf boost in code that uses these functions (Where bn_mul_mont is
available, they're not used much in RSA, but the generic EC
implementation does modular additions, and RSA private key checking
spends a lot of time in our add/sub-based bn_div_consttime.)

The new code is also smaller than the generic one (18 instructions
each), probably because it avoids all the flag spills and only tries to
unroll by two iterations.
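
For context, here is a minimal portable sketch of the kind of word-wise subtraction with borrow that such assembly replaces. It assumes 64-bit words stored least significant first; the name `sketch_sub_words` is hypothetical, and the comparisons are written for clarity rather than as a statement about the real implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable sketch: computes r = a - b over |num| 64-bit words
 * (least significant word first) and returns the final borrow, 0 or 1. */
uint64_t sketch_sub_words(uint64_t *r, const uint64_t *a, const uint64_t *b,
                          size_t num) {
  uint64_t borrow = 0;
  for (size_t i = 0; i < num; i++) {
    uint64_t ai = a[i];
    uint64_t bi = b[i];
    uint64_t diff = ai - bi;
    /* A borrow is produced either by a[i] < b[i], or by subtracting an
     * incoming borrow of 1 from a zero difference. Both cannot happen at
     * once, so a bitwise OR combines them. */
    uint64_t borrow_out = (uint64_t)(ai < bi) | (uint64_t)(diff < borrow);
    r[i] = diff - borrow;
    borrow = borrow_out;
  }
  return borrow;
}
```

A compiler has to recover the borrow from comparisons like these, whereas hand-written assembly can keep it in the flags register across iterations.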

Before:
Did 7137 RSA 2048 signing operations in 4022094us (1774.4 ops/sec)
Did 326000 RSA 2048 verify (same key) operations in 4001828us (81462.8 ops/sec)
Did 278000 RSA 2048 verify (fresh key) operations in 4001392us (69475.8 ops/sec)
Did 34830 RSA 2048 private key parse operations in 4038893us (8623.7 ops/sec)
Did 1196 RSA 4096 signing operations in 4015759us (297.8 ops/sec)
Did 90000 RSA 4096 verify (same key) operations in 4041959us (22266.4 ops/sec)
Did 79000 RSA 4096 verify (fresh key) operations in 4034561us (19580.8 ops/sec)
Did 12222 RSA 4096 private key parse operations in 4004831us (3051.8 ops/sec)
Did 10626 ECDSA P-384 signing operations in 4030764us (2636.2 ops/sec)
Did 10800 ECDSA P-384 verify operations in 4052718us (2664.9 ops/sec)
Did 4182 ECDSA P-521 signing operations in 4076198us (1026.0 ops/sec)
Did 4059 ECDSA P-521 verify operations in 4063819us (998.8 ops/sec)

After:
Did 7189 RSA 2048 signing operations in 4021331us (1787.7 ops/sec) [+0.7%]
Did 326000 RSA 2048 verify (same key) operations in 4010811us (81280.3 ops/sec) [-0.2%]
Did 278000 RSA 2048 verify (fresh key) operations in 4004206us (69427.0 ops/sec) [-0.1%]
Did 53040 RSA 2048 private key parse operations in 4050953us (13093.2 ops/sec) [+51.8%]
Did 1200 RSA 4096 signing operations in 4035548us (297.4 ops/sec) [-0.2%]
Did 90000 RSA 4096 verify (same key) operations in 4035686us (22301.0 ops/sec) [+0.2%]
Did 80000 RSA 4096 verify (fresh key) operations in 4020989us (19895.6 ops/sec) [+1.6%]
Did 20468 RSA 4096 private key parse operations in 4037474us (5069.5 ops/sec) [+66.1%]
Did 11070 ECDSA P-384 signing operations in 4023595us (2751.3 ops/sec) [+4.4%]
Did 11232 ECDSA P-384 verify operations in 4063116us (2764.4 ops/sec) [+3.7%]
Did 4387 ECDSA P-521 signing operations in 4052728us (1082.5 ops/sec) [+5.5%]
Did 4305 ECDSA P-521 verify operations in 4064660us (1059.1 ops/sec) [+6.0%]

Change-Id: If2f739373cdd10fa1d4925d5e2725e87d2255fc0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56966
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit d1b451676eada2f2dcad9a20debf8b76fa17f403)

* Fix various malloc failure paths.

Caught by running malloc failure tests on unit tests.

Bug: 563
Change-Id: Ic0167ef346a282dc8b5a26a1cedafced7fef9ed0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56927
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
(cherry picked from commit f7d37fba96e5640186b31ccb834bde98102d6ac7)

* Limit the CMake -isysroot assembly workaround to older CMake

It was fixed in CMake 3.19 with
https://gitlab.kitware.com/cmake/cmake/-/issues/20771

Change-Id: Ia76ab6690e233bc650e11a79db381c00f21c83a1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56568
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
(cherry picked from commit 61266e464b9b509a8a0943b9cc826c97c31e04e7)

* Remove old clang-cl workaround

Looks like this has since been fixed (or isn't hitting GTest anymore for
some reason).

Bug: chromium:772117
Change-Id: I2c2fb694e4429281e20fd252ef9c2c34e29a425c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56570
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit 60d61196e43cfcea45936de667f98f5d6a6fa684)

* Also test i2d_GENERAL_NAME in X509Test.GeneralName

GENERAL_NAME uses a weird ASN1_SEQUENCE item type. Test that serializing
it works.

Change-Id: I8d44eb637f58a9fbe870b1998b0d75e2bfcde601
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56986
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
(cherry picked from commit e3912cdf9b5095f8ecdf1c6390b79ebe5cf48f31)

* Unexport GENERAL_NAME_cmp

This function was involved in both CVE-2020-1971 and CVE-2023-0286. Both
times, we've had to confirm there were no external callers. Unexport it
so we can be sure of this.

Change-Id: I37b756f5bd66e389f03540872371001c85a0b5af
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56987
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit d3d7d36151ce966ed132bfcb9e108fffa6d70535)

* Remove if'd-out OCB-AES assembly

BoringSSL never shipped the OCB-AES assembly, but took two different
strategies in disabling it for x86 versus x86_64. For x86, the
implementation was deleted, but for x86_64 it was wrapped in `if(0)`.

Since we're no longer as concerned about keeping the assembly from
diverging from upstream, be consistent in how the OCB-AES functions
are removed from both by deleting them from x86_64.

Change-Id: I5233134e3e131fed56f365ed6f43f30c39dd2e33
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56989
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit 70e415d6b836c3c98b207a4c050c99c7971a1930)

* Remove stale TODO in util/bot/DEPS

We are using the CIPD copy of CMake on Windows now.

Change-Id: Idb9f62876c69333f9504540bad8321a173eaec3e
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56988
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
(cherry picked from commit 49d7b2d6a4d586e2debfa847e81b5ad9a9d3218a)

* Update build files in generated-src

* Added RSA-3072 sign/verify to the speed tool (#877)

* Add missing APIs to better support Bind (#904)

* Add missing APIs to better support Bind:
* BN_GENCB_set_old sets old style callbacks on BN_GENCB
* DH_clear_flags which is a no op
* Define but don't support DH_FLAG_CACHE_MONT_P; AWS-LC always uses Montgomery multiplication for DH

* Disable 3DES by default (#906)

Remove 3DES from the default cipher suite list. The TLS-version-specific keyword rules still add 3DES when appropriate (that is, for TLS 1.0 and TLS 1.1).

Some keyword rules map to the default rule ALL. Instead of hard-coding these into the cipher suite mapping structure, we factor them out and just keep the default ALL rule in the structure. Future keywords that should map to default can then be added to a separate list.

* Bump release version to v1.6.0 (#909)

Prepare new release by bumping release version number string to v1.6.0.

* Enable Valgrind testing for libssl (#912)

* Enable valgrind testing for ssl/ssl_test
* Add valgrind target for ssl/test/runner
* Increase valgrind instance size

* Add SHA3_Squeeze output buffer overflow test (#893)

* Add SHA3_Squeeze output buffer overflow test

* Wrap SHA3 ARMv8 ASM optimizations in C function

* Enable SHA3, deprecate its opt-in functions (#898)

For the deprecated opt-in functions, we modify the getter to always
return true and modify the setter to be a no-op as SHA3 is now always
enabled.
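
A minimal sketch of that pattern, using made-up names (`sketch_sha3_enable` and `sketch_sha3_is_enabled` are hypothetical and not the library's function names):

```c
#include <stdbool.h>

/* Hypothetical deprecated setter: kept so existing callers still link, but
 * it no longer changes any state. */
void sketch_sha3_enable(bool enable) {
  (void)enable; /* intentionally ignored */
}

/* Hypothetical deprecated getter: the feature is always on, so this always
 * reports true. */
bool sketch_sha3_is_enabled(void) {
  return true;
}
```

Callers that previously opted in keep compiling and see the same behaviour; callers that tried to opt out are silently ignored.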

* Zeroize data immediately after use for FIPS (#911)

* Merge branch 's2n-bignum-2023-03-31' into aws-lc-s2n-bignum-update-2023-03-31 (#916)

* Add basic NIST P-384 point operations

A point doubling function, point addition function, and point mixed
addition function for the P-384 curve, all using Jacobian coordinates
in a Montgomery representation, with input nondegeneracy assumed.
Once again, the addition and mixed addition functions offer only
marginal efficiency gains over just calling a sequence of basic field
operations, but the doubling has some beneficial mathematically
equivalent short-cutting of the intermediate modular reductions.

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/df8e913c542e5392a9f9cb6cd42fc90c5a02f72e

* Tweak ARM bignum_sqr_p521_alt to use fewer registers

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/57a43a3c6f4d29c822b1c226557ced539be575ef

* Add basic NIST P-521 point operations

The same trio of a point doubling function, point addition function
and point mixed addition function, this time for the P-521 curve,
all using Jacobian coordinates, with input nondegeneracy assumed.

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/047c0b1401610f9933a60ce0836143f9217ffa34

* update license headers of .c and .s files

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/f6670af4f5dc4df40d334a79f39a4cbf727b8510

* update license headers of Makefiles and sed files

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/5746a969ad2cff94b6172151375fc4319375879f

* Add SM2 mapping to Montgomery representation

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/0579f2c5d9c0fe75951d3c0f4e45d3c0b8709bbb

* Add SM2 field negation

And tweak the ARM implementations of analogous functions for
P-256 and P-384 to avoid a couple of instructions by using
immediates directly instead of loading constants.

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/ed5fdd3c8822cd593248d38cea038f71c89fd5b6

* Add SM2 field doubling and halving

And again, make minor tweaks to the ARM implementations of some analogous
functions for P-256 and P-384.

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/0ac9eea30a37c3fef7505647105428cc3bff1185

* Add basic SM2 point operations

The same trio as for the NIST curves: a point doubling function, point
addition function and point mixed addition function, all using
Jacobian coordinates, and all with input nondegeneracy assumed (see
the formal spec for the exact assumptions).

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/1cdd6ff1a7cd77b6bc73f690ee89db41bcd787ed

---------

Co-authored-by: John Harrison <[email protected]>
Co-authored-by: jargh <[email protected]>
Co-authored-by: sachiang <[email protected]>
Co-authored-by: Samuel Chiang <[email protected]>

* Remove remaining branches in bn/generic.c (#919)

This is a follow-up to cc3e7ce, which skipped some of these changes. I verified that, with this change, bn/generic.c matches the upstream file before their latest commit.

* Build flag AWS_LC_IGNORE_BN_SET_FLAGS (#918)

* Revert "Merge branch 's2n-bignum-2023-03-31' into aws-lc-s2n-bignum-update-2023-03-31 (#916)"

This reverts commit 9c005182944d3fb3731eb8a5748b95807ad1a6d4.

* Clean up test_support_lib and GTest dependencies slightly.

test_support_lib depends on GTest and should be marked as such.
Historically it was a bit fuzzy, but now it's unambiguous.

With that cleaned up, we can remove one of the global
include_directories calls and rely on CMake's
INTERFACE_INCLUDE_DIRECTORIES machinery.

(CMake's documentation and "modern CMake" prefers setting include
directories on the target and letting them flow up the dependency tree,
rather than configuring it globally across the project.)

Change-Id: I364df834d62328b69f146fbe35c10af97618a713
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56567
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>

* Remove global_target from build.

This was added with the generated symbol-prefixing header. But it
seems to be sufficient for crypto to have a dependency on the
generated header, along with some of the stray bits of delocate.

It's a little unclear from CMake documentation how these are processed;
normally .o files can be built before libraries are built or linked,
only the link step depends on them.

But, empirically, if A links B, and B has a dependency on C, then CMake
seems to run C before building any of A. I tested this by making a small
project where the generation step slept for three seconds and running
with enough parallelism that we'd have tripped.

Interestingly, in the Makefile output, the individual object file
targets didn't have that dependency, but the target itself did. But this
was true on both A and B, so I think that just might not work.

Also fix the dependency in the custom target. The old formulation broke
when using an absolute path to the symbols file.

Change-Id: I2053d44949f907d465da403a5ec69c191740268f
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56928
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>

* Make boringssl_gtest_main a STATIC library

Prior to 3.12 (which we won't be requiring until July), OBJECT libraries
cannot be used with target_link_libraries. That means they cannot pick
up INTERFACE_INCLUDE_DIRECTORIES, which makes them pretty unusable in
the "modern CMake" style.

Just switch it to a static library to unbreak the build in CMake 3.10.

For some link ordering reason I don't understand, this also requires
explicitly linking boringssl_gtest to libcxx when USE_CUSTOM_LIBCXX
is set.

Change-Id: Ia9d8351551f5da060248aa3ca73fe04473bf62aa
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57345
Commit-Queue: Bob Beck <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>

* Specify -Iinclude with the crypto target.

It's unclear to me whether doing it target-by-target is an improvement
in crypto/fipsmodule, but this otherwise does seem a bit tidier. This
aligns with CMake's documentation and "modern CMake" which prefers this
pattern.

Change-Id: I36c81842bff8b36eeaaf5dd3e0695fb45f3376c9
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56585
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>

* Reference OPENSSL_ia32cap_P at Intel machine code level through a unique symbol instead of a common offset symbol avoiding add instruction (#862)

See #856 for background and description of issue resolved in this PR.

This is the second PR out of two PRs that bridge the performance gap.

See ticket for performance comparisons.

The first PR took care of the C-level. But the machine-optimised algorithm implementations sometimes dereference OPENSSL_ia32cap_P directly. These also need to be fixed up. As before, we can't just add a call instruction to OPENSSL_ia32cap_get because it would compromise soundness/correctness.

Recall that the issue was the add instruction. Instead of injecting that instruction, we do the following for each occurrence of OPENSSL_ia32cap_P discovered in the textual assembly:

* Define a new unique symbol. Uniqueness is ensured using both the original register name (not always the original, but see "call outs") and a unique global counter. Using both is a bit redundant, but it makes the result easier to read IMO.
* Under the new unique symbol, copy the address of OPENSSL_ia32cap_P into the original register. This will probably spawn a relocation, but this is fine, because we put this new unique symbol outside the FIPS module scope in .text. Putting it here also means that we know the address relative to RIP.
* To jump back to the original execution point, we inject a "return" symbol/label where OPENSSL_ia32cap_P was discovered.
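
The naming part of the steps above can be sketched as follows. This is only an illustration of combining a register name with a global counter; the `sketch_ia32cap_` prefix, the function name and the label format are all hypothetical and do not reflect what the delocator actually emits.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical global counter, advanced once per rewritten occurrence. */
static unsigned sketch_counter = 0;

/* Writes a fresh label built from |reg| and the counter into |out|.
 * Returns 1 on success and 0 if |out| is too small. The counter only
 * advances on success, so every emitted label is distinct. */
int sketch_unique_symbol(char *out, size_t out_len, const char *reg) {
  int n = snprintf(out, out_len, "sketch_ia32cap_%s_%u", reg, sketch_counter);
  if (n < 0 || (size_t)n >= out_len) {
    return 0;
  }
  sketch_counter++;
  return 1;
}
```

Two occurrences that use the same register still get different labels because of the counter, while the register name keeps the generated assembly readable.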

* AES-GCM enabled with AVX512 vAES and vPCLMULQDQ. (#692)

Add AES-GCM implementation utilizing the enhanced crypto ISA's
AVX512 vAES and vPCLMULQDQ. Performance numbers measured on an
EC2 m6i instance with Intel(R) Xeon(R) Platinum 8375C CPU:

                Operation                |    patched    |   baseline     |  patched/baseline
  EVP-AES-128-GCM Encrypt     (16 bytes) |   4704346.1	 |   4861893.0	  |  0.97
  EVP-AES-128-GCM Decrypt     (16 bytes) |   4882642.6	 |   4813297.3	  |  1.01
  EVP-AES-128-GCM Encrypt    (256 bytes) |   4078340.9	 |   3987586.5	  |  1.02
  EVP-AES-128-GCM Decrypt    (256 bytes) |   4328520.6	 |   3928698.9	  |  1.10
  EVP-AES-128-GCM Encrypt   (1350 bytes) |   2715983.7	 |   2292394.5	  |  1.18
  EVP-AES-128-GCM Decrypt   (1350 bytes) |   2700846.1	 |   2359556.4	  |  1.14
  EVP-AES-128-GCM Encrypt   (8192 bytes) |   1191100.7	 |    679803.5	  |  1.75
  EVP-AES-128-GCM Decrypt   (8192 bytes) |   1202249.8	 |    685041.6	  |  1.76
  EVP-AES-128-GCM Encrypt  (16384 bytes) |    700707.1	 |    367700.7	  |  1.91
  EVP-AES-128-GCM Decrypt  (16384 bytes) |    705060.9	 |    370331.2	  |  1.90
  EVP-AES-192-GCM Encrypt     (16 bytes) |   4680506.6	 |   4806730.8	  |  0.97
  EVP-AES-192-GCM Decrypt     (16 bytes) |   4863172.2	 |   4744569.7	  |  1.02
  EVP-AES-192-GCM Encrypt    (256 bytes) |   4041011.6	 |   3912140.5	  |  1.03
  EVP-AES-192-GCM Decrypt    (256 bytes) |   4307267.4	 |   3851845.9	  |  1.12
  EVP-AES-192-GCM Encrypt   (1350 bytes) |   2653241.2	 |   2203656.2	  |  1.20
  EVP-AES-192-GCM Decrypt   (1350 bytes) |   2651021.8	 |   2264766.7	  |  1.17
  EVP-AES-192-GCM Encrypt   (8192 bytes) |   1131315.6	 |    632555.3	  |  1.79
  EVP-AES-192-GCM Decrypt   (8192 bytes) |   1146644.5	 |    636672.8	  |  1.80
  EVP-AES-192-GCM Encrypt  (16384 bytes) |    661890.1	 |    340302.0	  |  1.95
  EVP-AES-192-GCM Decrypt  (16384 bytes) |    667961.9	 |    342540.3	  |  1.95
  EVP-AES-256-GCM Encrypt     (16 bytes) |   4360655.5	 |   4448514.2	  |  0.98
  EVP-AES-256-GCM Decrypt     (16 bytes) |   4524082.6	 |   4417907.2	  |  1.02
  EVP-AES-256-GCM Encrypt    (256 bytes) |   3786200.8	 |   3653974.4	  |  1.04
  EVP-AES-256-GCM Decrypt    (256 bytes) |   4041289.9	 |   3588325.4	  |  1.13
  EVP-AES-256-GCM Encrypt   (1350 bytes) |   2492820.5	 |   2039554.2	  |  1.22
  EVP-AES-256-GCM Decrypt   (1350 bytes) |   2491387.1	 |   2093344.8	  |  1.19
  EVP-AES-256-GCM Encrypt   (8192 bytes) |   1074842.0	 |    583915.9	  |  1.84
  EVP-AES-256-GCM Decrypt   (8192 bytes) |   1081206.4	 |    587709.1	  |  1.84
  EVP-AES-256-GCM Encrypt  (16384 bytes) |    630155.0	 |    313262.3	  |  2.01
  EVP-AES-256-GCM Decrypt  (16384 bytes) |    628752.9	 |    315892.9	  |  1.99
 AEAD-AES-128-GCM    seal     (16 bytes) |  10859554.5	 |  10813717.6	  |  1.00
 AEAD-AES-128-GCM    seal    (256 bytes) |   7911436.7	 |   7305720.8	  |  1.08
 AEAD-AES-128-GCM    seal   (1350 bytes) |   4000694.0	 |   3118647.6	  |  1.28
 AEAD-AES-128-GCM    seal   (8192 bytes) |   1393107.0	 |    737327.6	  |  1.89
 AEAD-AES-128-GCM    seal  (16384 bytes) |    771988.4	 |    384239.6	  |  2.01
 AEAD-AES-256-GCM    seal     (16 bytes) |  10330064.1	 |  10192387.9	  |  1.01
 AEAD-AES-256-GCM    seal    (256 bytes) |   7489985.0	 |   6831952.2	  |  1.10
 AEAD-AES-256-GCM    seal   (1350 bytes) |   3660191.4	 |   2764809.2	  |  1.32
 AEAD-AES-256-GCM    seal   (8192 bytes) |   1244452.4	 |    630821.5	  |  1.97
 AEAD-AES-256-GCM    seal  (16384 bytes) |    686051.9	 |    327090.4	  |  2.10

* Update pbkdf2 service indicator to require lower bound of 1000 iterations (#924)

* TLS 1.3 Transfer Support (#891)

* Bump AWS-LC API version (#900)

* Factor out C-level implementation of curve25519 arithmetic and algorithm functions (#922)

Moves curve25519 C-level implementation to its own compilation unit

Note: this does not completely move ed25519. Waiting until the public interface from s2n-bignum is ready.

* Don't use negative values for unimplemented modes

Our EVP_CIPHER_mode returns an unsigned value and including negative
numbers in switch/case when the value is unsigned causes some warnings.
This should avoid the need for https://github.com/nodejs/node/pull/46564

(Having them be positive shouldn't have compat impacts. CCM is 8, but no
cipher will report CCM, so any path checking for it will just be dead
code.)
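
A small sketch of the situation, assuming an unsigned mode value. The constant names are hypothetical; the value 8 for CCM comes from the text above, while the value used for GCM is an assumption made for this example.

```c
#include <stdint.h>

/* Hypothetical mode constants. Keeping them non-negative means they can be
 * used as case labels for an unsigned value without sign warnings. */
#define SKETCH_MODE_GCM 6u /* assumed value, for illustration only */
#define SKETCH_MODE_CCM 8u /* defined, but no cipher reports it */

/* Returns 1 if |mode| names an AEAD mode in this sketch, else 0. */
int sketch_mode_is_aead(uint32_t mode) {
  switch (mode) {
    case SKETCH_MODE_GCM:
      return 1;
    case SKETCH_MODE_CCM:
      /* Reachable in principle, but dead code if nothing reports CCM. */
      return 1;
    default:
      return 0;
  }
}
```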

Change-Id: I8dcf5ea55fad9732a09d6da73114cde5d69397d3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57025
Reviewed-by: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit d9ea5553c3c9af6460257b035e9ebfbffbc78a1d)

* Move Go CMake support into its own file.

Slowly reduce clutter in the top-level CMake file.

Change-Id: Ib7ca2aee7337db82ed1989c56bbaaf6ee5da0768
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56569
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit 261ec612e21b81a4c16bbda615d0850556483b4f)

* VMS? I don't think so. Take this for a walk behind the barn.

Change-Id: Ia7518c6eeb87f21bbcb88d3b688745d07e963662
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57085
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
(cherry picked from commit 8846d7b3c35745f5ecc053f650dd76d3750d7ce5)

* Bound the overall output size of ASN1_generate_v3

The output of ASN1_generate_v3 is *mostly* linear with the input, except
SEQ and SET reference config sections. Sections can be referenced
multiple times, and so the structure grows exponentially.

Cap the total output size to mitigate this. As before, we don't consider
these functions safe to use with untrusted inputs, but unbounded growth
will frustrate fuzzing. This CL sets the limit to 64K, which should be
enough for anyone. (This is the size of a single X.509 extension,
whereas certificates themselves should not get that large.)
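
The growth and the cap can be modelled with a small sketch. It assumes a toy structure in which each section references the section below it twice; `sketch_expand` and `SKETCH_MAX_OUTPUT` are hypothetical names, and the real function tracks its budget differently.

```c
#include <stddef.h>

/* The 64K cap mentioned in the text. */
#define SKETCH_MAX_OUTPUT ((size_t)64 * 1024)

/* Hypothetical model: a section at depth 0 emits |leaf_len| bytes, and each
 * deeper section references the section below it twice, so the output
 * doubles per level. Adds the emitted size to |*total| and returns 0 as soon
 * as the running total would exceed the cap, which stops the expansion
 * early instead of doing exponential work. */
int sketch_expand(unsigned depth, size_t leaf_len, size_t *total) {
  if (depth == 0) {
    if (*total > SKETCH_MAX_OUTPUT ||
        leaf_len > SKETCH_MAX_OUTPUT - *total) {
      return 0;
    }
    *total += leaf_len;
    return 1;
  }
  return sketch_expand(depth - 1, leaf_len, total) &&
         sketch_expand(depth - 1, leaf_len, total);
}
```

In this model a depth of 3 with 16-byte leaves yields 128 bytes, while a depth of 20 would yield 16 MiB and is rejected once the running total reaches the cap.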

While not strictly necessary, this also rearranges the
ASN1_mbstring_copy call to pass in a maximum output. This portion does
scale linearly with the output, so it's fine, but the fuzzer discovered
an input with a 700K-byte input, which, with fuzzer instrumentation and
sanitizers, seems to be a bit slow. This change should help the fuzzer
get past those cases faster.

Update-Note: The stringly-typed API for constructing X.509 extensions
now has a maximum output size. If anyone was constructing an extension
larger than 64K, this will break. This is unlikely and should be caught
by unit tests; if a project hits this outside of tests, that means they
are passing untrusted input into this function, which is a security
vulnerability in itself, and means they especially need this change to
avoid a DoS.

Bug: oss-fuzz:55725
Change-Id: Ibb65854293f44bf48ed5855016ef7cd46d2fae77
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57125
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
(cherry picked from commit 9580424ca8579317d0ccf1d8db5e58539f239a20)

* Add a note in INCORPORATING about which branch to use

Especially when they were named "2214" instead of "chromium-2214", I've
seen papers and other projects treat them as releases. Add a note to
make it clear they are not releases.

AWS-LC:
- Update example branch names to match AWS-LC's branches

Change-Id: Ie820b800de3d25a31d3083d4ceff75e1d7f74a06
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57145
Reviewed-by: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
(cherry picked from commit ace33161544814ed6dc9e9d17cfde0422881b9d2)

* Add OPENSSL_asprintf and friends for asprintf(3) functionality.

This includes an internal version which allows a flag to specify
the use of system malloc, or OPENSSL_malloc - this in turn allows
us to use this function in the ERR family of functions and allow
for ERR to not call OPENSSL_malloc with a circular dependency.
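
For readers unfamiliar with asprintf(3), the usual two-pass technique looks roughly like this. It is a sketch built only on standard C; `sketch_asprintf` is a hypothetical name, it always uses the system malloc(), and it omits the allocator flag described above.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch, not the library's implementation. Formats into a
 * freshly allocated, NUL-terminated buffer and returns it, or returns NULL
 * on a formatting or allocation error. The caller frees the result with
 * free(). */
char *sketch_asprintf(const char *format, ...) {
  va_list args;
  va_list args_copy;
  char *out = NULL;

  va_start(args, format);
  /* First pass: measure the output length without writing anything. */
  va_copy(args_copy, args);
  int len = vsnprintf(NULL, 0, format, args_copy);
  va_end(args_copy);

  if (len >= 0) {
    out = malloc((size_t)len + 1);
    /* Second pass: format into the buffer of exactly the right size. */
    if (out != NULL &&
        vsnprintf(out, (size_t)len + 1, format, args) != len) {
      free(out);
      out = NULL;
    }
  }
  va_end(args);
  return out;
}
```

Because this version never touches a library allocator, an error-reporting path could call it without re-entering the allocator that is reporting the failure.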

Bug: 564

Change-Id: Ifd02d062fda9695cddbb0dbef2e1c1db0802a486
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57005
Auto-Submit: Bob Beck <[email protected]>
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
(cherry picked from commit 350f8547cf2101669684ebdb99b49b11fff5e217)

* Make ERR and thread use system malloc.

This will let us call ERR and thread_local from OPENSSL_malloc
without creating a circular dependency. We also make
ERR_get_error_line_data add ERR_FLAG_MALLOCED to the returned
flags value, since some projects appear to be making
assumptions about it being there.

Bug: 564

Update-Note: Any recent documentation (in all OpenSSL forks) for the ERR functions
cautions against freeing the returned ERR "data" strings, as freeing them is handled
by the error library. This change can make an existing double free bug more
obvious by being more likely to cause a crash with the double free.

Change-Id: Ie30bd3aee0b506473988b90675c48510969db31a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57045
Reviewed-by: David Benjamin <[email protected]>
Commit-Queue: Bob Beck <[email protected]>
Auto-Submit: Bob Beck <[email protected]>
(cherry picked from commit fc524c161e8640e017b0d838f76e75dc49181e34)

* Cap decimal input sizes in s2i_ASN1_INTEGER

Decoding from decimal takes quadratic time, and BN_dec2bn will happily
decode however large an input you pass in. This frustrates fuzzers.

I've added a cap to the input length in s2i_ASN1_INTEGER for now, rather
than BN_dec2bn, because we've seen people use BN for surprisingly large
calculator operations, and BN generally doesn't cap inputs to quadratic
(or worse) algorithms beyond memory limits. (We generally rely on
cryptography using fixed parameter sizes, though RSA, DSA, and DH were
misstandardized and need ad-hoc limits everywhere.)

Update-Note: The stringly-typed API for constructing X.509 extensions
now has (very generous) maximum input length for decimal integers of
8,192 digits. If anyone was relying on a higher input, this will break.
This is unlikely and should be caught by unit tests; if a project hits
this outside of tests, that means they are passing untrusted input into
this function, which is a security vulnerability in itself, and means
they especially need this change to avoid a DoS.
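
The idea of capping the input before handing it to a quadratic-time parser can be sketched as below. The function and macro names are hypothetical, and whether a leading sign counts toward the limit is an assumption of this sketch rather than something the text specifies.

```c
#include <stddef.h>
#include <string.h>

/* The 8,192-digit limit quoted in the Update-Note. */
#define SKETCH_MAX_DECIMAL_DIGITS 8192

/* Hypothetical pre-check: returns 1 if |s| is short enough to pass on to a
 * decimal parser and 0 otherwise. An optional leading '-' is not counted
 * as a digit here (an assumption of this sketch). */
int sketch_decimal_length_ok(const char *s) {
  if (*s == '-') {
    s++;
  }
  return strlen(s) <= SKETCH_MAX_DECIMAL_DIGITS;
}
```

The length check itself is linear in the input, so the quadratic decoding step only ever sees bounded inputs.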

Bug: chromium:1415108
Change-Id: I138249d23ca6b1996f8437dba98633349bb3042b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57205
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
(cherry picked from commit d5e93f521b3fd4f57049583a1584d285e5aab16c)

* Make OPENSSL_malloc push ERR_R_MALLOC_FAILURE on failure.

Remove all the other ERR_R_MALLOC_FAILURES from the
codebase.

Also changes cbb to push to the error stack, to correctly
report cbb failures instead of now only reporting
malloc failures. Previously it turned all cbb failures
into a malloc failure.

Bug: 564

Change-Id: Ic13208bf9d9aaa470e83b2f15782fc94946bbc7b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57046
Auto-Submit: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: David Benjamin <[email protected]>
(cherry picked from commit dcabfe2d8940529a69e007660fa7bf6c15954ecc)

* Remove remaining ERR_R_MALLOC_FAILURE calls in code added by AWS-LC

* Fix a -Wignored-qualifiers warning in trust_token_test.cc

The const bool doesn't do anything. While I'm here, make the methods
const.

Change-Id: Id8c31d5fcda6d8bc244c64b02b1d758e4eff6849
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57185
Auto-Submit: David Benjamin <[email protected]>
Commit-Queue: Steven Valdez <[email protected]>
Reviewed-by: Steven Valdez <[email protected]>
(cherry picked from commit ec64d7e01a7ca30957c8bce38f6ad989e0b8ced1)

* Align the hash-to-curve formulation with draft-16.

draft-07 to draft-16 is mostly editorial, but there were a few notable
changes:

- Empty DST values are forbidden.

- The sample implementation for map_to_curve_simple_swu has completely
  changed. The new formulation has the same performance (if not a smidge
  faster), and aligning with the spec seems generally useful.

- P-384 is now paired with SHA-384, not SHA-512. As this would be a
  breaking change for the trust tokens code, I've left that in. A
  follow-up CL will add implementations of draft-16, which is expected
  to match the final draft.

Before:
Did 77000 hash-to-curve P384_XMD:SHA-512_SSWU_RO_ operations in 4025677us (19127.2 ops/sec)
Did 7156000 hash-to-scalar P384_XMD:SHA-512 operations in 4000385us (1788827.8 ops/sec)

After:
Did 77000 hash-to-curve P384_XMD:SHA-512_SSWU_RO_ operations in 4009708us (19203.4 ops/sec) [+0.4%]
Did 7327000 hash-to-scalar P384_XMD:SHA-512 operations in 4000477us (1831531.6 ops/sec) [+2.4%]

Bug: 1414562
Change-Id: Ic3c37061e325250d5d8723fd9aa263930c6023cf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57146
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Steven Valdez <[email protected]>
Commit-Queue: Steven Valdez <[email protected]>
(cherry picked from commit 9c9b2c219fc817940cb31cc0d055c61c5986e058)

* Implement P256_XMD:SHA-256_SSWU_RO_ and P384_XMD:SHA-384_SSWU_RO_

Also add public APIs for this, now that the specification is no longer
expected to change, and because a project external to the library wishes
to use it.

For now, I've kept the P-256 version using the generic felem_exp, but we
should update that to use the specialized field arithmetic.

Trust Tokens will presumably move to this later and, in the meantime,
another team wants this.

Bug: chromium:1414562
Change-Id: Ie38203b4439ff55659c4fb2070f45d524c55aa2a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57147
Commit-Queue: David Benjamin <[email protected]>
Reviewed-by: Steven Valdez <[email protected]>
(cherry picked from commit 3950d6ce25c263c3d131985edfcd6b0899a7949e)

* Unify the two copies of bn_add_words and bn_sub_words

Compilers are fine at inlining functions nowadays. We can hide the
BN_ULLONG vs. manual carry extraction inside an inline function. I've
patterned the type signatures intentionally after Clang's builtins, in
case we want to use them in the future.
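
A sketch of what such an inline helper might look like, assuming a 32-bit word with a 64-bit double word. The names `sketch_word`, `sketch_addc`, `sketch_add_words` and the `SKETCH_NO_DOUBLE_WORD` switch are hypothetical; only the argument order (x, y, carry in, pointer to carry out) is modelled on Clang's `__builtin_addc` family.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word type; uint64_t plays the role of the double word. */
typedef uint32_t sketch_word;

/* Returns x + y + carry and stores the carry out (0 or 1) in |*out_carry|.
 * Both branches compute the same result; callers never see which is used. */
static inline sketch_word sketch_addc(sketch_word x, sketch_word y,
                                      sketch_word carry,
                                      sketch_word *out_carry) {
#if defined(SKETCH_NO_DOUBLE_WORD)
  /* Manual carry extraction: a wrapped sum is smaller than its operand. */
  sketch_word sum = x + y;
  sketch_word c1 = sum < x;
  sketch_word ret = sum + carry;
  *out_carry = c1 | (sketch_word)(ret < sum);
  return ret;
#else
  /* Double-word path: the carry is simply the upper half. */
  uint64_t wide = (uint64_t)x + y + carry;
  *out_carry = (sketch_word)(wide >> 32);
  return (sketch_word)wide;
#endif
}

/* Computes r = a + b over |num| words (least significant first) and returns
 * the final carry. */
sketch_word sketch_add_words(sketch_word *r, const sketch_word *a,
                             const sketch_word *b, size_t num) {
  sketch_word carry = 0;
  for (size_t i = 0; i < num; i++) {
    r[i] = sketch_addc(a[i], b[i], carry, &carry);
  }
  return carry;
}
```

With the difference hidden in one helper, a single copy of the word loop serves both configurations.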

(Previously I wrote in
https://boringssl-review.googlesource.com/c/boringssl/+/56966 that the
builtins weren't good on aarch64. This wasn't quite right. Rather, they
were bad on both x86_64 and aarch64 in LLVM 13, but they're fine on both
in LLVM 14. My machine's Xcode was just a little old.)

Change-Id: I666466dce7a146d5e49e94ff372ea018b610ef34
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57245
Commit-Queue: David Benjamin <[email protected]>
Auto-Submit: David Benjamin <[email protected]>
Reviewed-by: Bob Beck <[email protected]>
(cherry picked from commit de12e3cabcb466cdf96c90d9bff0d919af71d561)

* Ignore cmake files in the license check

* Actually disable Go in the ancient CI. This previously worked because we never checked whether Go was found, and with testing and libssl disabled nothing attempted to use Go. cmake/go.cmake now has more sanity checks and detects that the Docker container doesn't have Go installed.

* TLS 1.3 Transfer Fuzzing (#929)

* Release v1.7.0 (#933)

* Update size of BORINGSSL_function_hit to account for new aes_gcm_encrypt_avx512 (#934)

* Add missing boringssl_prefix_symbols target (#931)

* Add back ASN1_STRING_clear_free that was accidentally removed (#936)

Co-authored-by: dkostic <[email protected]>

* Re-factor machine-level optimisation decision logic for x25519 (#932)

After factoring out the C-implementation in c5e2fb8, now we factor out the control flow decision logic that selects the backend algorithm. This makes it simpler to add new backends.

* Improved support for OpenSSH (#894)

* Release v1.8.0 (#944)

* Delete leftover code from previous Kyber implementation (#943)

Co-authored-by: dkostic <[email protected]>

* Remove Gitter from README.md (#947)

* Per file namespace for symbolic labels

s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/da2b90c7fc45b77c639528cea7898575c73f6f39

* Provide a compile option to disable AVX512-specific optimizations. (#945)

- The new flag name ends with "512AVX" rather than "AVX512" so that it does not contain the whole of the existing flag -DMY_ASSEMBLER_IS_TOO_OLD_FOR_AVX, which the checks in the Perl files would otherwise also match.

- avx512 is disabled when avx is.
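
The naming concern can be illustrated with a plain substring check. This is a sketch in C, not the Perl from the build scripts. The full spelling `-DMY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX` is assumed from the description above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Substring test, standing in for a pattern match over the compiler flags.
static inline bool flags_contain(const char *flags, const char *needle) {
  assert(flags != NULL && needle != NULL);
  return strstr(flags, needle) != NULL;
}

// AVX is considered disabled when the existing flag is present.
static inline bool avx_disabled(const char *flags) {
  return flags_contain(flags, "-DMY_ASSEMBLER_IS_TOO_OLD_FOR_AVX");
}

// AVX512 is disabled by its own flag, and also whenever AVX is disabled.
static inline bool avx512_disabled(const char *flags) {
  return avx_disabled(flags) ||
         flags_contain(flags, "-DMY_ASSEMBLER_IS_TOO_OLD_FOR_512AVX");
}
```

A flag ending in "AVX512" would contain the AVX flag as a prefix, so `avx_disabled` would wrongly report true for it. The "512AVX" spelling avoids that collision.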

* AES-GCM optimised for AArch64 with EOR3 instruction and a wide AES/PMULL pipeline - 8x unrolling (#721)

This is an integration of an Armv8-optimised implementation of AES-GCM that stitches (interleaves) the AES instructions with the GHASH calculation and operates on 8 input blocks in parallel.
The code was first contributed to OpenSSL in https://github.com/openssl/openssl/commit/954f45ba4c504570206ff5bed811e512cf92dc8e by [email protected] in Jan, 2022.
It was cherry-picked into 3.1.0 by the end of 2022 in https://github.com/openssl/openssl/commit/34ca334e5de6837f2c6bc0b0b0df28bdd237e4d7.

[PerformanceImprovement.md](https://github.com/aws/aws-lc/files/10900781/PerformanceImprovement.md)

The code targets the Arm Neoverse V1 and Apple M1 micro-architectures because it relies on two hardware capabilities: the SHA3 extension, specifically its 3-way XOR instruction EOR3, and a wider ASIMD pipeline for the AES and PMULL instructions (PMULL is the one used in the GHASH calculation).

### Call-outs:
Using ARMV8_NEOVERSE_V1 and ARMV8_APPLE_M1 as bits in OPENSSL_armcap_P is different from the original code:
* It tests for specific micro-architectures known to have a wide crypto (i.e. AES/PMULL) pipeline.
* It replaces the bit ARMV8_AES_GCM_UNROLL8 which designated both a wide pipeline and the availability of SHA3 extension.
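
A minimal sketch of how such a capability check might be expressed. The bit positions, the `ARMV8_SHA3` name, and the function `use_unroll8_eor3` are assumptions made for illustration. Only `ARMV8_NEOVERSE_V1` and `ARMV8_APPLE_M1` are names taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Illustrative bit positions, not necessarily the library's values.
#define ARMV8_SHA3 (1u << 11)         // assumed name: SHA3 extension (EOR3)
#define ARMV8_NEOVERSE_V1 (1u << 12)  // micro-architecture with a wide pipeline
#define ARMV8_APPLE_M1 (1u << 13)     // micro-architecture with a wide pipeline

static_assert((ARMV8_NEOVERSE_V1 & ARMV8_APPLE_M1) == 0,
              "capability bits must be distinct");

// The 8x-unrolled EOR3 path needs both a wide AES/PMULL pipeline
// (known per micro-architecture) and the EOR3 instruction.
static inline bool use_unroll8_eor3(uint32_t armcap) {
  bool wide_pipeline = (armcap & (ARMV8_NEOVERSE_V1 | ARMV8_APPLE_M1)) != 0;
  bool has_eor3 = (armcap & ARMV8_SHA3) != 0;
  return wide_pipeline && has_eor3;
}
```

Keeping the micro-architecture bits separate from the instruction-set bit lets other code paths test for a wide pipeline without also requiring SHA3.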

### Performance Improvement:
Numbers are in **MB/s**
#### EVP API
| Algorithm | Operation | Bytes | AWS-LC PR#721 | AWS-LC main | ratio to main |
| --- | --- | --- | --- | --- | --- |
| EVP-AES-128-GCM | Encrypt | 16    | 90.7   | 99.5   | 0.91 |
|                 |         | 128   | 648.6  | 692.4  | 0.94 |
|                 |         | 192   | 883.3  | 962.5  | 0.92 |
|                 |         | 256   | 1176.1 | 1197.9 | 0.98 |
|                 |         | 512   | 1996.3 | 1892.7 | 1.05 |
|                 |         | 1350  | 3376.7 | 2876.8 | 1.17 |
|                 |         | 8192  | 5810.9 | 4008.2 | 1.45 |
|                 |         | 16384 | 6209.3 | 4163.2 | 1.49 |
|                 | Decrypt | 16    | 85.5   | 92.6   | 0.92 |
|                 |         | 128   | 615.1  | 670.4  | 0.92 |
|                 |         | 192   | 869    | 940.3  | 0.92 |
|                 |         | 256   | 1125.3 | 1177.7 | 0.96 |
|                 |         | 512   | 1927.2 | 1895.5 | 1.02 |
|                 |         | 1350  | 3251.8 | 2962.6 | 1.10 |
|                 |         | 8192  | 5851.4 | 4436.9 | 1.32 |
|                 |         | 16384 | 6271.6 | 4652.7 | 1.35 |
| EVP-AES-192-GCM | Encrypt | 16    | 87.6   | 96.2   | 0.91 |
|                 |         | 128   | 628.4  | 681.4  | 0.92 |
|                 |         | 192   | 866.3  | 941.4  | 0.92 |
|                 |         | 256   | 1146.2 | 1168.9 | 0.98 |
|                 |         | 512   | 1924.7 | 1836.4 | 1.05 |
|                 |         | 1350  | 3213.4 | 2765   | 1.16 |
|                 |         | 8192  | 5480   | 3913.5 | 1.40 |
|                 |         | 16384 | 5838.7 | 4068.2 | 1.44 |
|                 | Decrypt | 16    | 82.4   | 90.9   | 0.91 |
|                 |         | 128   | 594.9  | 656.5  | 0.91 |
|                 |         | 192   | 838.1  | 919.2  | 0.91 |
|                 |         | 256   | 1084.2 | 1147.2 | 0.95 |
|                 |         | 512   | 1844.5 | 1831.6 | 1.01 |
|                 |         | 1350  | 3094   | 2817.1 | 1.10 |
|                 |         | 8192  | 5396   | 4154.3 | 1.30 |
|                 |         | 16384 | 5765.9 | 4341.8 | 1.33 |
| EVP-AES-256-GCM | Encrypt | 16    | 85.6   | 93.9   | 0.91 |
|                 |         | 128   | 616.8  | 666.3  | 0.93 |
|                 |         | 192   | 846.4  | 917.2  | 0.92 |
|                 |         | 256   | 1105.9 | 1137.5 | 0.97 |
|                 |         | 512   | 1849.1 | 1773.6 | 1.04 |
|                 |         | 1350  | 3033.4 | 2632.9 | 1.15 |
|                 |         | 8192  | 5034.9 | 3707.4 | 1.36 |
|                 |         | 16384 | 5341.9 | 3845.9 | 1.39 |
|                 | Decrypt | 16    | 81.3   | 89.2   | 0.91 |
|                 |         | 128   | 583.2  | 641.8  | 0.91 |
|                 |         | 192   | 817.8  | 894.9  | 0.91 |
|                 |         | 256   | 1071   | 1114.1 | 0.96 |
|                 |         | 512   | 1800   | 1760.9 | 1.02 |
|                 |         | 1350  | 2969.6 | 2662.8 | 1.12 |
|                 |         | 8192  | 5110.4 | 3865   | 1.32 |
|                 |         | 16384 | 5449   | 4024.8 | 1.35 |

#### AEAD API
| Algorithm | Operation | Bytes | AWS-LC PR#721 | AWS-LC main | ratio to main |
| --- | --- | --- | --- | --- | --- |
| AEAD-AES-128-GCM | Seal (Encrypt) | 16    | 177.4  | 179.2  | 0.99 |
|                  |                | 128   | 1157.4 | 1163.8 | 0.99 |
|                  |                | 192   | 1543   | 1549.5 | 1.00 |
|                  |                | 256   | 1916.3 | 1851   | 1.04 |
|                  |                | 512   | 2973.1 | 2630.4 | 1.13 |
|                  |                | 1350  | 4236.1 | 3353.7 | 1.26 |
|                  |                | 8192  | 6208.4 | 4205.1 | 1.48 |
|                  |                | 16384 | 6444.2 | 4290.5 | 1.50 |
|                  | Open (Decrypt) | 16    | 159.8  | 161.8  | 0.99 |
|                  |                | 128   | 1068.3 | 1080   | 0.99 |
|                  |                | 192   | 1454.3 | 1480   | 0.98 |
|                  |                | 256   | 1790.4 | 1783.7 | 1.00 |
|                  |                | 512   | 2826.8 | 2613.6 | 1.08 |
|                  |                | 1350  | 4068.2 | 3435.3 | 1.18 |
|                  |                | 8192  | 6223.2 | 4626   | 1.35 |
|                  |                | 16384 | 6487.6 | 4754.3 | 1.36 |
| AEAD-AES-256-GCM | Seal (Encrypt) | 16    | 171.1  | 170.6  | 1.00 |
|                  |                | 128   | 1093.8 | 1099   | 1.00 |
|                  |                | 192   | 1445.2 | 1453.3 | 0.99 |
|                  |                | 256   | 1779.9 | 1726.7 | 1.03 |
|                  |                | 512   | 2710.3 | 2419.8 | 1.12 |
|                  |                | 1350  | 3744   | 3041.5 | 1.23 |
|                  |                | 8192  | 5330.6 | 3842.6 | 1.39 |
|                  |                | 16384 | 5508.6 | 3916.1 | 1.41 |
|                  | Open (Decrypt) | 16    | 155.9  | 158.8  | 0.98 |
|                  |                | 128   | 1013.2 | 1024   | 0.99 |
|                  |                | 192   | 1361.8 | 1387.1 | 0.98 |
|                  |                | 256   | 1686.5 | 1658.2 | 1.02 |
|                  |                | 512   | 2607.3 | 2376.9 | 1.10 |
|                  |                | 1350  | 3682.2 | 3045.4 | 1.21 |
|                  |                | 8192  | 5418.3 | 4005.8 | 1.35 |
|                  |                | 16384 | 5620.3 | 4099.1 | 1.37 |

### M1 (AEAD API)
#### main
```
Did 52951000 AEAD-AES-128-GCM seal (16 bytes) operations in 3000008us (17650286.3 ops/sec): 282.4 MB/s
Did 43846250 AEAD-AES-128-GCM seal (128 bytes) operations in 3000009us (14615372.8 ops/sec): 1870.8 MB/s
Did 38685500 AEAD-AES-128-GCM seal (192 bytes) operations in 3000010us (12895123.7 ops/sec): 2475.9 MB/s
Did 34504000 AEAD-AES-128-GCM seal (256 bytes) operations in 3000077us (11501038.1 ops/sec): 2944.3 MB/s
Did 24288500 AEAD-AES-128-GCM seal (512 bytes) operations in 3000031us (8096083.0 ops/sec): 4145.2 MB/s
Did 11798750 AEAD-AES-128-GCM seal (1350 bytes) operations in 3000032us (3932874.7 ops/sec): 5309.4 MB/s
Did 2490000 AEAD-AES-128-GCM seal (8192 bytes) operations in 3000049us (829986.4 ops/sec): 6799.2 MB/s
Did 1278000 AEAD-AES-128-GCM seal (16384 bytes) operations in 3001880us (425733.2 ops/sec): 6975.2 MB/s
Did 49507000 AEAD-AES-128-GCM open (16 bytes) operations in 3000052us (16502047.3 ops/sec): 264.0 MB/s
Did 41929000 AEAD-AES-128-GCM open (128 bytes) operations in 3000009us (13976291.4 ops/sec): 1789.0 MB/s
Did 36945750 AEAD-AES-128-GCM open (192 bytes) operations in 3000012us (12315200.7 ops/sec): 2364.5 MB/s
Did 33254000 AEAD-AES-128-GCM open (256 bytes) operations in 3000013us (11084618.6 ops/sec): 2837.7 MB/s
Did 23444000 AEAD-AES-128-GCM open (512 bytes) operations in 3000046us (7814546.8 ops/sec): 4001.0 MB/s
Did 11298750 AEAD-AES-128-GCM open (1350 bytes) operations in 3000044us (3766194.8 ops/sec): 5084.4 MB/s
Did 2397000 AEAD-AES-128-GCM open (8192 bytes) operations in 3000713us (798810.1 ops/sec): 6543.9 MB/s
Did 1196000 AEAD-AES-128-GCM open (16384 bytes) operations in 3000559us (398592.4 ops/sec): 6530.5 MB/s
Did 50220000 AEAD-AES-256-GCM seal (16 bytes) operations in 3000014us (16739921.9 ops/sec): 267.8 MB/s
Did 40900750 AEAD-AES-256-GCM seal (128 bytes) operations in 3000004us (13633565.2 ops/sec): 1745.1 MB/s
Did 35152000 AEAD-AES-256-GCM seal (192 bytes) operations in 3000002us (11717325.5 ops/sec): 2249.7 MB/s
Did 31452750 AEAD-AES-256-GCM seal (256 bytes) operations in 3000020us (10484180.1 ops/sec): 2684.0 MB/s
Did 21425750 AEAD-AES-256-GCM seal (512 bytes) operations in 3000027us (7141852.4 ops/sec): 3656.6 MB/s
Did 10136750 AEAD-AES-256-GCM seal (1350 bytes) operations in 3000018us (3378896.4 ops/sec): 4561.5 MB/s
Did 2091000 AEAD-AES-256-GCM seal (8192 bytes) operations in 3000749us (696826.0 ops/sec): 5708.4 MB/s
Did 1066000 AEAD-AES-256-GCM seal (16384 bytes) operations in 3001132us (355199.3 ops/sec): 5819.6 MB/s
Did 49377250 AEAD-AES-256-GCM open (16 bytes) operations in 3000010us (16459028.5 ops/sec): 263.3 MB/s
Did 40825000 AEAD-AES-256-GCM open (128 bytes) operations in 3000007us (13608301.6 ops/sec): 1741.9 MB/s
Did 35528000 AEAD-AES-256-GCM open (192 bytes) operations in 3000002us (11842658.8 ops/sec): 2273.8 MB/s
Did 31562000 AEAD-AES-256-GCM open (256 bytes) operations in 3000014us (10520617.6 ops/sec): 2693.3 MB/s
Did 21607000 AEAD-AES-256-GCM open (512 bytes) operations in 3000001us (7202330.9 ops/sec): 3687.6 MB/s
Did 10316750 AEAD-AES-256-GCM open (1350 bytes) operations in 3000034us (3438877.7 ops/sec): 4642.5 MB/s
Did 2154000 AEAD-AES-256-GCM open (8192 bytes) operations in 3000688us (717835.4 ops/sec): 5880.5 MB/s
Did 1085000 AEAD-AES-256-GCM open (16384 bytes) operations in 3000182us (361644.7 ops/sec): 5925.2 MB/s
```
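
For reading the output above, the ops/sec and MB/s columns follow from the operation count, the elapsed time in microseconds, and the input size, with 1 MB taken as 10^6 bytes (which matches the printed figures). A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

// Operations per second from an operation count and elapsed microseconds.
static inline double ops_per_sec(uint64_t ops, uint64_t elapsed_us) {
  assert(elapsed_us > 0);
  return (double)ops * 1e6 / (double)elapsed_us;
}

// Throughput in MB/s, where 1 MB = 10^6 bytes.
static inline double mb_per_sec(uint64_t ops, uint64_t elapsed_us,
                                uint64_t bytes_per_op) {
  return ops_per_sec(ops, elapsed_us) * (double)bytes_per_op / 1e6;
}
```

For the first line of the run, `mb_per_sec(52951000, 3000008, 16)` gives roughly 282.4, in agreement with the reported value.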

#### PR
```
Did 52991000 AEAD-AES-128-GCM seal (16 bytes) operations in 3000014us (17663584.2 ops/sec): 282.6 MB/s
Did 43686750 AEAD-AES-128-GCM seal (128 bytes) operations in 3000017us (14562167.5 ops/sec): 1864.0 MB/s
Did 38589000 AEAD-AES-128-GCM seal (192 bytes) operations in 3000010us (12862957.1 ops/sec): 2469.7 MB/s
Did 35045750 AEAD-AES-128-GCM seal (256 bytes) operations in 3000008us (11681885.5 ops/sec): 2990.6 MB/s
Did 26081750 AEAD-AES-128-GCM seal (512 bytes) operations in 3000015us (8693873.2 ops/sec): 4451.3 MB/s
Did 13458500 AEAD-AES-128-GCM seal (1350 bytes) operations in 3000033us (4486117.3 ops/sec): 6056.3 MB/s
Did 3016000 AEAD-AES-128-GCM seal (8192 bytes) operations in 3000802us (1005064.6 ops/sec): 8233.5 MB/s
Did 1519000 AEAD-AES-128-GCM seal (16384 bytes) operations in 3000267us (506288.3 ops/sec): 8295.0 MB/s
Did 49665750 AEAD-AES-128-GCM open (16 bytes) operations in 3000016us (16555161.7 ops/sec): 264.9 MB/s
Did 42062500 AEAD-AES-128-GCM open (128 bytes) operations in 3000017us (14020753.9 ops/sec): 1794.7 MB/s
Did 37012250 AEAD-AES-128-GCM open (192 bytes) operations in 3000018us (12337342.6 ops/sec): 2368.8 MB/s
Did 34332500 AEAD-AES-128-GCM open (256 bytes) operations in 3000005us (11444147.6 ops/sec): 2929.7 MB/s
Did 25532750 AEAD-AES-128-GCM open (512 bytes) operations in 3000006us (8510899.6 ops/sec): 4357.6 MB/s
Did 13174750 AEAD-AES-128-GCM open (1350 bytes) operations in 3000052us (4391507.2 ops/sec): 5928.5 MB/s
Did 3023000 AEAD-AES-128-GCM open (8192 bytes) operations in 3000759us (1007411.8 ops/sec): 8252.7 MB/s
Did 1557000 AEAD-AES-128-GCM open (16384 bytes) operations in 3000047us (518991.9 ops/sec): 8503.2 MB/s
Did 49483500 AEAD-AES-256-GCM seal (16 bytes) operations in 3000001us (16494494.5 ops/sec): 263.9 MB/s
Did 41014000 AEAD-AES-256-GCM seal (128 bytes) operations in 3000016us (13671260.4 ops/sec): 1749.9 MB/s
Did 35160000 AEAD-AES-256-GCM seal (192 bytes) operations in 3000014us (11719945.3 ops/sec): 2250.2 MB/s
Did 32640750 AEAD-AES-256-GCM seal (256 bytes) operations in 3000004us (10880235.5 ops/sec): 2785.3 MB/s
Did 23427000 AEAD-AES-256-GCM seal (512 bytes) operations in 3000030us (7808921.9 ops/sec): 3998.2 MB/s
Did 11443000 AEAD-AES-256-GCM seal (1350 bytes) operations in 3000029us (3814296.5 ops/sec): 5149.3 MB/s
Did 2513000 AEAD-AES-256-GCM…
```