Use function reference to get cpu capability vector at the C-level #856

torben-hansen · 2023-03-08T17:59:00Z

Issues:

CryptoAlg-1577

Description of changes:

The goal of this PR is to close a performance gap between the FIPS static and FIPS shared builds. The latter is more performant in some cases.

Background

OPENSSL_ia32cap_P is a global, initialised, mutable variable and hence will typically go to .bss. However, the location of the .bss segment is not known before program load time, so dereferences of OPENSSL_ia32cap_P in code must be filled in by the loader. The loader thereby modifies .text, which in turn would make any hash over (part of) .text unpredictable. This is bad for FIPS, which requires a deterministic hash over part of .text to implement the integrity check.

All of this is already known, though, and an implemented solution already exists! We are only concerned with the static FIPS build. The shared build uses a different technique that doesn't require in-line modifications of textual assembly code, and is therefore not subject to the performance degradation explained below.

The FIPS static build fixes up dereferences of OPENSSL_ia32cap_P in a pretty straightforward way:

First, add a "trampoline" label OPENSSL_ia32cap_addr_delta storing an offset to OPENSSL_ia32cap_P.
Then, replace dereferences of OPENSSL_ia32cap_P by first loading the address of OPENSSL_ia32cap_addr_delta (which is known relative to RIP) and then adding the offset stored under that label.

Note that the latter step requires an addition, and the addition might modify the carry flag. This is problematic because the in-line modification of the assembly performed by the delocator script is not context-aware: it simply reuses the destination register. Injecting an instruction that potentially modifies the carry flag could break soundness/correctness, because the subsequent assembly code was not written with this in mind.

Therefore, the delocator script pushes the CPU flags onto the stack before the injected code and pops them afterwards: https://github.com/aws/aws-lc/blob/main/util/fipstools/delocate/delocate.go#L1367-L1369. This restores the carry-flag state to what it was before the injected code, ensuring soundness/correctness.
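The shape of the injected sequence described above can be sketched with a toy emitter in C. This is only an illustration, not the code in delocate.go (which is written in Go): the function name `emit_flag_safe_fixup` is hypothetical, and the exact instructions the real script emits may differ, for example by adding extra stack handling around the push/pop.

```c
#include <assert.h>
#include <stdio.h>

// Toy emitter (hypothetical name): writes a flag-preserving fix-up for one
// OPENSSL_ia32cap_P reference into `out`. `reg` is the destination register
// of the original instruction, e.g. "%rax". Returns snprintf's result.
static int emit_flag_safe_fixup(char *out, size_t out_len, const char *reg) {
  assert(out != NULL && reg != NULL);
  return snprintf(out, out_len,
                  "\tpushfq\n"  // save the flags before touching them
                  "\tleaq\tOPENSSL_ia32cap_addr_delta(%%rip), %s\n"  // RIP-relative trampoline
                  "\taddq\t(%s), %s\n"  // add the stored offset; this clobbers flags
                  "\tpopfq\n",  // restore the flags for the surrounding code
                  reg, reg, reg);
}
```

Calling `emit_flag_safe_fixup(buf, sizeof(buf), "%rax")` fills `buf` with four lines of assembly text in which the `addq` is bracketed by `pushfq`/`popfq`. That bracket is the per-reference cost discussed in the rest of this description.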

Now to the actual issue: it was found that this push/pop of the CPU flags introduces a small latency that, when accumulated, affects performance. For example, AES-256-XTS init and encrypt (256 bytes) can perform 4706548 invocations per second for the FIPS static build, but 6735574 invocations per second for the FIPS shared build (c6i.4xlarge); in other words, the FIPS shared build does ~43% more invocations per second than the FIPS static build. Performance differences can be seen for various other algorithms, e.g. P384 ops, SHA-256, AEAD-AES-128-GCM, etc.
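For reference, the ~43% figure follows directly from the two throughput numbers quoted above (no additional measurements are assumed):

$$
\frac{6735574}{4706548} \approx 1.431
\qquad\text{and conversely}\qquad
\frac{4706548}{6735574} \approx 0.699
$$

That is, the shared build performs about 43% more invocations per second, or equivalently the static build reaches roughly 70% of the shared build's throughput on this benchmark.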

Solution

This is the first PR out of two PRs that bridge the performance gap.

This PR uses a function reference instead of directly dereferencing the global variable at the C level. Some instances of dereferencing the variable directly were added in https://github.com/aws/aws-lc/pull/330/files, removing the function reference. Others seem to have always dereferenced the global variable directly. Make this consistent and only use the function reference at the C level. This avoids the push/pop of the carry flags, and performance benchmarks show that this recovers some of the performance gap.

The next PR will tackle the assembly-level code, which can't directly use the function reference, again because the delocator script is not context-aware: adding a function call would modify state that the later assembly code has not been written to account for. For example, the assembly code might use registers that the function call by definition assumes are callee owned.
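A minimal sketch of the C-level change, under some assumptions: the array is redefined locally so the snippet stands alone, the accessor is modelled on the `return OPENSSL_ia32cap_P;` line in internal.h, and the two `has_aesni_*` helpers are hypothetical. The AES-NI bit used for illustration is bit 25 of the CPUID leaf 1 ECX value.

```c
#include <stdint.h>

// Local stand-in for the capability vector defined in cpucap.c.
static uint32_t OPENSSL_ia32cap_P[4] = {0};

// Function reference: callers get a read-only pointer to the vector.
static const uint32_t *OPENSSL_ia32cap_get(void) { return OPENSSL_ia32cap_P; }

// Before (hypothetical helper): dereferences the global directly, which is
// the pattern this PR removes from C code.
static int has_aesni_direct(void) {
  return (OPENSSL_ia32cap_P[1] >> 25) & 1;
}

// After (hypothetical helper): goes through the function reference instead.
static int has_aesni_via_get(void) {
  return (OPENSSL_ia32cap_get()[1] >> 25) & 1;
}
```

Both helpers return the same value; the difference is only in how the compiled code reaches the vector, which is what matters for the delocator fix-up described above.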

See CryptoAlg-1577 for performance gain information.

Testing:

CI passes over all relevant compilers for fips build. CryptoAlg-1577 has information on benchmarks.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and
the ISC license.

…ic fips build

Taffer

Code looks good to me. 👍

torben-hansen · 2023-03-16T15:38:04Z

Missed some:

grep --exclude=\*.S --exclude=\*.pl -r "OPENSSL_ia32cap_P" ./crypto
./crypto/fipsmodule/cpucap/cpucap.c:HIDDEN uint32_t OPENSSL_ia32cap_P[4] = {0};
./crypto/fipsmodule/cpucap/internal.h:// OPENSSL_ia32cap_P contains the Intel CPUID bits when running on an x86 or
./crypto/fipsmodule/cpucap/internal.h:extern uint32_t OPENSSL_ia32cap_P[4];
./crypto/fipsmodule/cpucap/internal.h:  return OPENSSL_ia32cap_P;
./crypto/fipsmodule/cpucap/cpu_intel.c:extern uint32_t OPENSSL_ia32cap_P[4];
./crypto/fipsmodule/cpucap/cpu_intel.c:  OPENSSL_ia32cap_P[0] = edx;
./crypto/fipsmodule/cpucap/cpu_intel.c:  OPENSSL_ia32cap_P[1] = ecx;
./crypto/fipsmodule/cpucap/cpu_intel.c:  OPENSSL_ia32cap_P[2] = extended_features[0];
./crypto/fipsmodule/cpucap/cpu_intel.c:  OPENSSL_ia32cap_P[3] = extended_features[1];
./crypto/fipsmodule/cpucap/cpu_intel.c:  // The first value determines OPENSSL_ia32cap_P[0] and [1]. The second [2]
./crypto/fipsmodule/cpucap/cpu_intel.c:  handle_cpu_env(&OPENSSL_ia32cap_P[0], env1);
./crypto/fipsmodule/cpucap/cpu_intel.c:    handle_cpu_env(&OPENSSL_ia32cap_P[2], env2 + 1);

This reverts commit 83ae5e5.

torben-hansen · 2023-03-16T16:17:54Z

Missed some:
grep --exclude=\*.S --exclude=\*.pl -r "OPENSSL_ia32cap_P" ./crypto
./crypto/fipsmodule/cpucap/cpucap.c:HIDDEN uint32_t OPENSSL_ia32cap_P[4] = {0};
./crypto/fipsmodule/cpucap/internal.h:// OPENSSL_ia32cap_P contains the Intel CPUID bits when running on an x86 or
./crypto/fipsmodule/cpucap/internal.h:extern uint32_t OPENSSL_ia32cap_P[4];
./crypto/fipsmodule/cpucap/internal.h: return OPENSSL_ia32cap_P;
./crypto/fipsmodule/cpucap/cpu_intel.c:extern uint32_t OPENSSL_ia32cap_P[4];
./crypto/fipsmodule/cpucap/cpu_intel.c: OPENSSL_ia32cap_P[0] = edx;
./crypto/fipsmodule/cpucap/cpu_intel.c: OPENSSL_ia32cap_P[1] = ecx;
./crypto/fipsmodule/cpucap/cpu_intel.c: OPENSSL_ia32cap_P[2] = extended_features[0];
./crypto/fipsmodule/cpucap/cpu_intel.c: OPENSSL_ia32cap_P[3] = extended_features[1];
./crypto/fipsmodule/cpucap/cpu_intel.c: // The first value determines OPENSSL_ia32cap_P[0] and [1]. The second [2]
./crypto/fipsmodule/cpucap/cpu_intel.c: handle_cpu_env(&OPENSSL_ia32cap_P[0], env1);
./crypto/fipsmodule/cpucap/cpu_intel.c: handle_cpu_env(&OPENSSL_ia32cap_P[2], env2 + 1);

Don't replace these:

OPENSSL_ia32cap_get() returns a constant pointer...
This code is only called once, so cost is suuuuper negligible.
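A small sketch of why these remaining hits are left alone, assuming the accessor returns a pointer to const as stated above. `cpu_init_once` is a hypothetical stand-in for the one-time CPUID probe in cpu_intel.c, and the array is redefined locally so the snippet is self-contained.

```c
#include <stdint.h>

// Local stand-in for the vector defined in cpucap.c.
static uint32_t OPENSSL_ia32cap_P[4] = {0};

// Read-only view handed to all other C code.
static const uint32_t *OPENSSL_ia32cap_get(void) { return OPENSSL_ia32cap_P; }

// Hypothetical stand-in for the one-time initialisation in cpu_intel.c.
static void cpu_init_once(uint32_t edx, uint32_t ecx) {
  // OPENSSL_ia32cap_get()[0] = edx;  // rejected by the compiler: the
  //                                  // accessor returns const uint32_t *
  OPENSSL_ia32cap_P[0] = edx;  // so the writer keeps using the array itself
  OPENSSL_ia32cap_P[1] = ecx;
}
```

Because the initialisation runs once, any fix-up overhead on these few direct writes does not show up in per-operation benchmarks.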

…que symbol instead of a common offset symbol avoiding add instruction (#862)

See #856 for background and a description of the issue resolved in this PR. This is the second PR out of two PRs that bridge the performance gap. See ticket for performance comparisons.

The first PR took care of the C level. But the machine-optimised algorithm implementations sometimes dereference OPENSSL_ia32cap_P directly. These also need to be fixed up. As before, we can't just add a call instruction to OPENSSL_ia32cap_get because it would compromise soundness/correctness.

Recall that the issue was the add instruction. Instead of injecting that instruction, we do the following for each occurrence of OPENSSL_ia32cap_P discovered in the textual assembly:

* Define a new unique symbol. Uniqueness is ensured using both the original (not true in all cases, but see "call outs") register name and a unique global counter. Bit redundant using both, but makes it easier to read IMO.
* Under the new unique symbol, copy the address of OPENSSL_ia32cap_P into the original register. This will probably spawn a relocation, but this is fine, because we put this new unique symbol outside the FIPS module scope in .text. Putting it here also means that we know the address relative to RIP.
* To jump back to the original execution point, we inject a "return" symbol/label where OPENSSL_ia32cap_P was discovered.
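The three steps in the #862 commit message can be sketched with a toy emitter in C. Everything here is an assumption made for illustration: the function name `emit_unique_symbol_fixup`, the label naming scheme (`ia32cap_load_<reg>_<n>` / `ia32cap_return_<reg>_<n>`), and the use of `jmp` to reach the stub are hypothetical, and the real delocator may emit something different.

```c
#include <assert.h>
#include <stdio.h>

// Toy emitter (hypothetical): `site` receives the text left where
// OPENSSL_ia32cap_P was referenced, `stub` receives the text placed outside
// the FIPS module. `reg` is a register name without '%', e.g. "rax".
// Returns 1 on success, 0 if either buffer is too small.
static int emit_unique_symbol_fixup(char *site, size_t site_len,
                                    char *stub, size_t stub_len,
                                    const char *reg, unsigned counter) {
  assert(site != NULL && stub != NULL && reg != NULL);
  // At the original site: jump to the unique symbol, then define the
  // "return" label that execution comes back to.
  int n = snprintf(site, site_len,
                   "\tjmp\tia32cap_load_%s_%u\n"
                   "ia32cap_return_%s_%u:\n",
                   reg, counter, reg, counter);
  if (n < 0 || (size_t)n >= site_len) return 0;
  // Under the unique symbol: load the address into the original register
  // and jump back. Neither leaq nor jmp modifies the CPU flags.
  n = snprintf(stub, stub_len,
               "ia32cap_load_%s_%u:\n"
               "\tleaq\tOPENSSL_ia32cap_P(%%rip), %%%s\n"
               "\tjmp\tia32cap_return_%s_%u\n",
               reg, counter, reg, reg, counter);
  return n >= 0 && (size_t)n < stub_len;
}
```

The point of the sketch is that neither emitted fragment contains an add instruction, so no flags need to be saved and restored around it.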


Change-Id: I5233134e3e131fed56f365ed6f43f30c39dd2e33 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56989 Reviewed-by: David Benjamin <[email protected]> Commit-Queue: David Benjamin <[email protected]> (cherry picked from commit 70e415d6b836c3c98b207a4c050c99c7971a1930) * Remove stale TODO in util/bot/DEPS We are using the CIPD copy of CMake on Windows now. Change-Id: Idb9f62876c69333f9504540bad8321a173eaec3e Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56988 Commit-Queue: Bob Beck <[email protected]> Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> (cherry picked from commit 49d7b2d6a4d586e2debfa847e81b5ad9a9d3218a) * Update build files in generated-src * Added RSA-3072 sign/verify to the speed tool (#877) * Add missing APIs to better support Bind (#904) * Add missing APIs to better support Bind: * BN_GENCB_set_old sets old style callbacks on BN_GENCB * DH_clear_flags which is a no op * Define but don't support DH_FLAG_CACHE_MONT_P, AWS-LC always uses montgomery multiplication for DH * Disable 3DES by default (#906) Disable 3DES from default cipher suite list. The TLS version specific keyword rules still add 3DES when appropriate (and that is TLS 1 and TLS 1.1). Some keyword rules map to the default rule ALL. Instead of hard-coding these into the cipher suite mapping structure, we factor them out and just keep the default ALL rule in the structure. Future keywords that should map to default can then be added to a separate list. * Bump release version to v1.6.0 (#909) Prepare new release by bumping release version number string to v1.6.0. 
* Enable Valgrind testing for libssl (#912) * Enable valgrind testing for ssl/ssl_test * Add valgrind target for ssl/test/runner * Increase valgrind instance size * Add SHA3_Squeeze output buffer overflow test (#893) * Add SHA3_Squeeze output buffer overflow test * Wrap SHA3 ARMv8 ASM optimizations in C function * Enable SHA3, deprecate its opt-in functions (#898) For the deprecated opt-in functions, we modify the getter to always return true and modify the setter to be a no-op, as SHA3 is now always enabled. * Zeroize data immediately after use for FIPS (#911) * Merge branch 's2n-bignum-2023-03-31' into aws-lc-s2n-bignum-update-2023-03-31 (#916) * Add basic NIST P-384 point operations A point doubling function, point addition function, and point mixed addition function for the P-384 curve, all using Jacobian coordinates in a Montgomery representation, with input nondegeneracy assumed. Once again, the addition and mixed addition functions offer only marginal efficiency gains over just calling a sequence of basic field operations, but the doubling has some beneficial mathematically equivalent short-cutting of the intermediate modular reductions. s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/df8e913c542e5392a9f9cb6cd42fc90c5a02f72e * Tweak ARM bignum_sqr_p521_alt to use fewer registers s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/57a43a3c6f4d29c822b1c226557ced539be575ef * Add basic NIST P-521 point operations The same trio of a point doubling function, point addition function and point mixed addition function, this time for the P-521 curve, all using Jacobian coordinates, with input nondegeneracy assumed.
s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/047c0b1401610f9933a60ce0836143f9217ffa34 * update license headers of .c and .s files s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/f6670af4f5dc4df40d334a79f39a4cbf727b8510 * update license headers of Makefiles and sed files s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/5746a969ad2cff94b6172151375fc4319375879f * Add SM2 mapping to Montgomery representation s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/0579f2c5d9c0fe75951d3c0f4e45d3c0b8709bbb * Add SM2 field negation And tweak the ARM implementations of analogous functions for P-256 and P-384 to avoid a couple of instructions by using immediates directly instead of loading constants. s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/ed5fdd3c8822cd593248d38cea038f71c89fd5b6 * Add SM2 field doubling and halving And again, make minor tweaks to the ARM implementations of some analogous functions for P-256 and P-384. s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/0ac9eea30a37c3fef7505647105428cc3bff1185 * Add basic SM2 point operations The same trio as for the NIST curves: a point doubling function, point addition function and point mixed addition function, all using Jacobian coordinates, and all with input nondegeneracy assumed (see the formal spec for the exact assumptions). s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/1cdd6ff1a7cd77b6bc73f690ee89db41bcd787ed --------- Co-authored-by: John Harrison <[email protected]> Co-authored-by: jargh <[email protected]> Co-authored-by: sachiang <[email protected]> Co-authored-by: Samuel Chiang <[email protected]> * Remove remaining branches in bn/generic.c (#919) This is a follow up to cc3e7ce which skipped some of these changes. I verified with this change bn/generic.c matches the upstream file before their latest commit. 
* Build flag AWS_LC_IGNORE_BN_SET_FLAGS (#918) * Revert "Merge branch 's2n-bignum-2023-03-31' into aws-lc-s2n-bignum-update-2023-03-31 (#916)" This reverts commit 9c005182944d3fb3731eb8a5748b95807ad1a6d4. * Clean up test_support_lib and GTest dependencies slightly. test_support_lib depends on GTest and should be marked as such. Historically it was a bit fuzzy, but now it's unambiguous. With that cleaned up, we can remove one of the global include_directories calls and rely on CMake's INTERFACE_INCLUDE_DIRECTORIES machinery. (CMake's documentation and "modern CMake" prefers setting include directories on the target and letting them flow up the dependency tree, rather than configuring it globally across the project.) Change-Id: I364df834d62328b69f146fbe35c10af97618a713 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56567 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: David Benjamin <[email protected]> * Remove global_target from build. This was added with the generated symbol-prefixing header. But it seems to be sufficient for crypto to have a dependency on the generated header, along with some of the stray bits of delocate. It's a little unclear from CMake documentation how these are processed; normally .o files can be built before libraries are built or linked, only the link step depends on. But, empirically, if A links B, and B has a dependency on C, then CMake seems to run C before building any of A. I tested this by making a small project where the generation step slept for three seconds and running with enough parallelism that we'd have tripped. Interestingly, in the Makefile output, the individual object file targets didn't have that dependency, but the target itself did. But this was true on both A and B, so I think that just might not work. Also fix the dependency in the custom target. The old formulation broke when using an absolute path to the symbols file. 
Change-Id: I2053d44949f907d465da403a5ec69c191740268f Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56928 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> * Make boringssl_gtest_main a STATIC library Prior to CMake 3.12 (which we won't be requiring until July), OBJECT libraries cannot be used with target_link_libraries. That means they cannot pick up INTERFACE_INCLUDE_DIRECTORIES, which makes them pretty unusable in the "modern CMake" style. Just switch it to a static library to unbreak the build in CMake 3.10. For some link ordering reason I don't understand, this also requires explicitly linking boringssl_gtest to libcxx when USE_CUSTOM_LIBCXX is set. Change-Id: Ia9d8351551f5da060248aa3ca73fe04473bf62aa Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57345 Commit-Queue: Bob Beck <[email protected]> Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> * Specify -Iinclude with the crypto target. It's unclear to me whether doing it target-by-target is an improvement in crypto/fipsmodule, but this otherwise does seem a bit tidier. This aligns with CMake's documentation and "modern CMake", which prefer this pattern. Change-Id: I36c81842bff8b36eeaaf5dd3e0695fb45f3376c9 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56585 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> * Reference OPENSSL_ia32cap_P at Intel machine code level through a unique symbol instead of a common offset symbol avoiding add instruction (#862) See #856 for background and a description of the issue resolved in this PR. This is the second of two PRs that bridge the performance gap. See ticket for performance comparisons. The first PR took care of the C-level. But the machine-optimised algorithm implementations sometimes dereference OPENSSL_ia32cap_P directly. These also need to be fixed up.
As before, we can't just add a call instruction to OPENSSL_ia32cap_get because it would compromise soundness/correctness. Recall that the issue was the add instruction. Instead of injecting that instruction, we do the following for each occurrence of OPENSSL_ia32cap_P discovered in the textual assembly:
  * Define a new unique symbol. Uniqueness is ensured using both the original register name (not true in all cases, but see "call outs") and a unique global counter. Using both is a bit redundant, but it makes the result easier to read IMO.
  * Under the new unique symbol, copy the address of OPENSSL_ia32cap_P into the original register. This will probably spawn a relocation, but this is fine, because we put this new unique symbol outside the FIPS module scope in .text. Putting it here also means that we know the address relative to RIP.
  * To jump back to the original execution point, we inject a "return" symbol/label where OPENSSL_ia32cap_P was discovered.
* AES-GCM enabled with AVX512 vAES and vPCLMULQDQ. (#692) Add AES-GCM implementation utilizing the enhanced crypto ISA's AVX512 vAES and vPCLMULQDQ.
Performance numbers measured on an EC2 m6i instance with Intel(R) Xeon(R) Platinum 8375C CPU: Operation | patched | baseline | EVP-AES-128-GCM Encrypt (16 bytes) | 4704346.1 | 4861893.0 | 0.97 EVP-AES-128-GCM Decrypt (16 bytes) | 4882642.6 | 4813297.3 | 1.01 EVP-AES-128-GCM Encrypt (256 bytes) | 4078340.9 | 3987586.5 | 1.02 EVP-AES-128-GCM Decrypt (256 bytes) | 4328520.6 | 3928698.9 | 1.10 EVP-AES-128-GCM Encrypt (1350 bytes) | 2715983.7 | 2292394.5 | 1.18 EVP-AES-128-GCM Decrypt (1350 bytes) | 2700846.1 | 2359556.4 | 1.14 EVP-AES-128-GCM Encrypt (8192 bytes) | 1191100.7 | 679803.5 | 1.75 EVP-AES-128-GCM Decrypt (8192 bytes) | 1202249.8 | 685041.6 | 1.76 EVP-AES-128-GCM Encrypt (16384 bytes) | 700707.1 | 367700.7 | 1.91 EVP-AES-128-GCM Decrypt (16384 bytes) | 705060.9 | 370331.2 | 1.90 EVP-AES-192-GCM Encrypt (16 bytes) | 4680506.6 | 4806730.8 | 0.97 EVP-AES-192-GCM Decrypt (16 bytes) | 4863172.2 | 4744569.7 | 1.02 EVP-AES-192-GCM Encrypt (256 bytes) | 4041011.6 | 3912140.5 | 1.03 EVP-AES-192-GCM Decrypt (256 bytes) | 4307267.4 | 3851845.9 | 1.12 EVP-AES-192-GCM Encrypt (1350 bytes) | 2653241.2 | 2203656.2 | 1.20 EVP-AES-192-GCM Decrypt (1350 bytes) | 2651021.8 | 2264766.7 | 1.17 EVP-AES-192-GCM Encrypt (8192 bytes) | 1131315.6 | 632555.3 | 1.79 EVP-AES-192-GCM Decrypt (8192 bytes) | 1146644.5 | 636672.8 | 1.80 EVP-AES-192-GCM Encrypt (16384 bytes) | 661890.1 | 340302.0 | 1.95 EVP-AES-192-GCM Decrypt (16384 bytes) | 667961.9 | 342540.3 | 1.95 EVP-AES-256-GCM Encrypt (16 bytes) | 4360655.5 | 4448514.2 | 0.98 EVP-AES-256-GCM Decrypt (16 bytes) | 4524082.6 | 4417907.2 | 1.02 EVP-AES-256-GCM Encrypt (256 bytes) | 3786200.8 | 3653974.4 | 1.04 EVP-AES-256-GCM Decrypt (256 bytes) | 4041289.9 | 3588325.4 | 1.13 EVP-AES-256-GCM Encrypt (1350 bytes) | 2492820.5 | 2039554.2 | 1.22 EVP-AES-256-GCM Decrypt (1350 bytes) | 2491387.1 | 2093344.8 | 1.19 EVP-AES-256-GCM Encrypt (8192 bytes) | 1074842.0 | 583915.9 | 1.84 EVP-AES-256-GCM Decrypt (8192 bytes) | 1081206.4 | 587709.1 | 
1.84 EVP-AES-256-GCM Encrypt (16384 bytes) | 630155.0 | 313262.3 | 2.01 EVP-AES-256-GCM Decrypt (16384 bytes) | 628752.9 | 315892.9 | 1.99 AEAD-AES-128-GCM seal (16 bytes) | 10859554.5 | 10813717.6 | 1.00 AEAD-AES-128-GCM seal (256 bytes) | 7911436.7 | 7305720.8 | 1.08 AEAD-AES-128-GCM seal (1350 bytes) | 4000694.0 | 3118647.6 | 1.28 AEAD-AES-128-GCM seal (8192 bytes) | 1393107.0 | 737327.6 | 1.89 AEAD-AES-128-GCM seal (16384 bytes) | 771988.4 | 384239.6 | 2.01 AEAD-AES-256-GCM seal (16 bytes) | 10330064.1 | 10192387.9 | 1.01 AEAD-AES-256-GCM seal (256 bytes) | 7489985.0 | 6831952.2 | 1.10 AEAD-AES-256-GCM seal (1350 bytes) | 3660191.4 | 2764809.2 | 1.32 AEAD-AES-256-GCM seal (8192 bytes) | 1244452.4 | 630821.5 | 1.97 AEAD-AES-256-GCM seal (16384 bytes) | 686051.9 | 327090.4 | 2.10 * Update pbkdf2 service indicator to require lower bound of 1000 iterations (#924) * TLS 1.3 Transfer Support (#891) * Bump AWS-LC API version (#900) * Factor out C-level implementation of curve25519 arithmetic and algorithm functions (#922) Moves curve25519 C-level implementation to its own compilation unit Note: this does not completely move ed25519. Waiting until the public interface from s2n-bignum is ready. * Don't use negative values for unimplemented modes Our EVP_CIPHER_mode returns an unsigned value and including negative numbers in switch/case when the value is unsigned causes some warnings. This should avoid the need for https://github.com/nodejs/node/pull/46564 (Having them be positive shouldn't have compat impacts. CCM is 8, but no cipher will report CCM, so any path checking for it will just be dead code.) 
Change-Id: I8dcf5ea55fad9732a09d6da73114cde5d69397d3 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57025 Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> Commit-Queue: David Benjamin <[email protected]> (cherry picked from commit d9ea5553c3c9af6460257b035e9ebfbffbc78a1d) * Move Go CMake support into its own file. Slowly reduce clutter in the top-level CMake file. Change-Id: Ib7ca2aee7337db82ed1989c56bbaaf6ee5da0768 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/56569 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: David Benjamin <[email protected]> (cherry picked from commit 261ec612e21b81a4c16bbda615d0850556483b4f) * VMS? I don't think so. Take this for a walk behind the barn. Change-Id: Ia7518c6eeb87f21bbcb88d3b688745d07e963662 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57085 Reviewed-by: David Benjamin <[email protected]> Commit-Queue: Bob Beck <[email protected]> (cherry picked from commit 8846d7b3c35745f5ecc053f650dd76d3750d7ce5) * Bound the overall output size of ASN1_generate_v3 The output of ASN1_generate_v3 is *mostly* linear with the input, except SEQ and SET reference config sections. Sections can be referenced multiple times, and so the structure grows exponentially. Cap the total output size to mitigate this. As before, we don't consider these functions safe to use with untrusted inputs, but unbounded growth will frustrate fuzzing. This CL sets the limit to 64K, which should be enough for anyone. (This is the size of a single X.509 extension, whereas certificates themselves should not get that large.) While not strictly necessary, this also rearranges the ASN1_mbstring_copy call to pass in a maximum output. This portion does scale linearly with the output, so it's fine, but the fuzzer discovered an input with a 700K-byte input, which, with fuzzer instrumentation and sanitizers, seems to be a bit slow. 
This change should help the fuzzer get past those cases faster. Update-Note: The stringly-typed API for constructing X.509 extensions now has a maximum output size. If anyone was constructing an extension larger than 64K, this will break. This is unlikely and should be caught by unit tests; if a project hits this outside of tests, that means they are passing untrusted input into this function, which is a security vulnerability in itself, and means they especially need this change to avoid a DoS. Bug: oss-fuzz:55725 Change-Id: Ibb65854293f44bf48ed5855016ef7cd46d2fae77 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57125 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: David Benjamin <[email protected]> Auto-Submit: David Benjamin <[email protected]> (cherry picked from commit 9580424ca8579317d0ccf1d8db5e58539f239a20) * Add a note in INCORPORATING about which branch to use Especially when they were named "2214" instead of "chromium-2214", I've seen papers and other projects treat them as releases. Add a note to make it clear they are not releases. AWS-LC: - Update example branch names to match AWS-LC's branches Change-Id: Ie820b800de3d25a31d3083d4ceff75e1d7f74a06 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57145 Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> Commit-Queue: Bob Beck <[email protected]> (cherry picked from commit ace33161544814ed6dc9e9d17cfde0422881b9d2) * Add OPENSSL_asprintf and friends for asprintf(3) functionality. This includes an internal version which allows a flag to specify the use of system malloc, or OPENSSL_malloc - this in turn allows us to use this function in the ERR family of functions and allow for ERR to not call OPENSSL_malloc with a circular dependency. 
Bug: 564 Change-Id: Ifd02d062fda9695cddbb0dbef2e1c1db0802a486 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57005 Auto-Submit: Bob Beck <[email protected]> Reviewed-by: David Benjamin <[email protected]> Commit-Queue: Bob Beck <[email protected]> (cherry picked from commit 350f8547cf2101669684ebdb99b49b11fff5e217) * Make ERR and thread use system malloc. This will let us call ERR and thread_local from OPENSSL_malloc without creating a circular dependency. We also make ERR_get_error_line_data add ERR_FLAG_MALLOCED to the returned flags value, since some projects appear to be making assumptions about it being there. Bug: 564 Update-Note: Any recent documentation (in all OpenSSL forks) for the ERR functions cautions against freeing the returned ERR "data" strings, as freeing them is handled by the error library. This change can make an existing double free bug more obvious by being more likely to cause a crash with the double free. Change-Id: Ie30bd3aee0b506473988b90675c48510969db31a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57045 Reviewed-by: David Benjamin <[email protected]> Commit-Queue: Bob Beck <[email protected]> Auto-Submit: Bob Beck <[email protected]> (cherry picked from commit fc524c161e8640e017b0d838f76e75dc49181e34) * Cap decimal input sizes in s2i_ASN1_INTEGER Decoding from decimal takes quadratic time, and BN_dec2bn will happily decode however large of input you pass in. This frustrates fuzzers. I've added a cap to the input length in s2i_ASN1_INTEGER for now, rather than BN_dec2bn, because we've seen people use BN for surprisingly large calculator operations, and BN generally doesn't cap inputs to quadratic (or worse) algorithms beyond memory limits. (We generally rely on cryptography using fixed parameter sizes, though RSA, DSA, and DH were misstandardized and need ad-hoc limits everywhere.) 
Update-Note: The stringly-typed API for constructing X.509 extensions now has (very generous) maximum input length for decimal integers of 8,192 digits. If anyone was relying on a higher input, this will break. This is unlikely and should be caught by unit tests; if a project hits this outside of tests, that means they are passing untrusted input into this function, which is a security vulnerability in itself, and means they especially need this change to avoid a DoS. Bug: chromium:1415108 Change-Id: I138249d23ca6b1996f8437dba98633349bb3042b Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57205 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> Auto-Submit: David Benjamin <[email protected]> (cherry picked from commit d5e93f521b3fd4f57049583a1584d285e5aab16c) * Make OPENSSL_malloc push ERR_R_MALLOC_FAILURE on failure. Remove all the other ERR_R_MALLOC_FAILURES from the codebase. Also changes cbb to push to the error stack, to correctly report cbb failures instead of now only reporting malloc failures. Previously it turned all cbb failures into a malloc failure Bug: 564 Change-Id: Ic13208bf9d9aaa470e83b2f15782fc94946bbc7b Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57046 Auto-Submit: Bob Beck <[email protected]> Commit-Queue: David Benjamin <[email protected]> Reviewed-by: David Benjamin <[email protected]> (cherry picked from commit dcabfe2d8940529a69e007660fa7bf6c15954ecc) * Remove remaining ERR_R_MALLOC_FAILURE calls in code added by AWS-LC * Fix a -Wignored-qualifiers warning in trust_token_test.cc The const bool doesn't do anything. While I'm here, make the methods const. 
Change-Id: Id8c31d5fcda6d8bc244c64b02b1d758e4eff6849 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57185 Auto-Submit: David Benjamin <[email protected]> Commit-Queue: Steven Valdez <[email protected]> Reviewed-by: Steven Valdez <[email protected]> (cherry picked from commit ec64d7e01a7ca30957c8bce38f6ad989e0b8ced1) * Align the hash-to-curve formulation with draft-16. draft-07 to draft-16 is mostly editorial, but there were a few notable changes: - Empty DST values are forbidden. - The sample implementation for map_to_curve_simple_swu has completely changed. The new formulation has the same performance (if not a smidge faster), and aligning with the spec seems generally useful. - P-384 is now paired with SHA-384, not SHA-512. As this would be a breaking change for the trust tokens code, I've left that in. A follow-up CL will add implementations of draft-16, which is expected to match the final draft. Before: Did 77000 hash-to-curve P384_XMD:SHA-512_SSWU_RO_ operations in 4025677us (19127.2 ops/sec) Did 7156000 hash-to-scalar P384_XMD:SHA-512 operations in 4000385us (1788827.8 ops/sec) After: Did 77000 hash-to-curve P384_XMD:SHA-512_SSWU_RO_ operations in 4009708us (19203.4 ops/sec) [+0.4%] Did 7327000 hash-to-scalar P384_XMD:SHA-512 operations in 4000477us (1831531.6 ops/sec) [+2.4%] Bug: 1414562 Change-Id: Ic3c37061e325250d5d8723fd9aa263930c6023cf Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57146 Auto-Submit: David Benjamin <[email protected]> Reviewed-by: Steven Valdez <[email protected]> Commit-Queue: Steven Valdez <[email protected]> (cherry picked from commit 9c9b2c219fc817940cb31cc0d055c61c5986e058) * Implement P256_XMD:SHA-256_SSWU_RO_ and P384_XMD:SHA-384_SSWU_RO_ Also add public APIs for this, now that the specification is no longer expected to change, and because a project external to the library wishes to use it. 
For now, I've kept the P-256 version using the generic felem_exp, but we should update that to use the specialized field arithmetic. Trust Tokens will presumably move to this later and, in the meantime, another team wants this. Bug: chromium:1414562 Change-Id: Ie38203b4439ff55659c4fb2070f45d524c55aa2a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57147 Commit-Queue: David Benjamin <[email protected]> Reviewed-by: Steven Valdez <[email protected]> (cherry picked from commit 3950d6ce25c263c3d131985edfcd6b0899a7949e) * Unify the two copies of bn_add_words and bn_sub_words Compilers are fine at inlining functions nowadays. We can hide the BN_ULLONG vs. manual carry extraction inside an inline function. I've patterned the type signatures intentionally after Clang's builtins, in case we want to use them in the future. (Previously I wrote in https://boringssl-review.googlesource.com/c/boringssl/+/56966 that the builtins weren't good on aarch64. This wasn't quite right. Rather, they were bad on both x86_64 and aarch64 in LLVM 13, but they're fine on both in LLVM 14. My machine's Xcode was just a little old.) Change-Id: I666466dce7a146d5e49e94ff372ea018b610ef34 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/57245 Commit-Queue: David Benjamin <[email protected]> Auto-Submit: David Benjamin <[email protected]> Reviewed-by: Bob Beck <[email protected]> (cherry picked from commit de12e3cabcb466cdf96c90d9bff0d919af71d561) * Ignore cmake files in the license check * Actually disable Go in the ancient CI. This previously worked because we never checked if Go was found or not, and in the case of disabling testing and libssl nothing attempted to use Go. cmake/go.cmake now has more sanity checks and detects that the docker container doesn't have Go installed.
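The bn_add_words / bn_sub_words unification described a few entries above hides the choice between a double-width type and manual carry extraction inside one inline helper. The following is only a minimal sketch of that idea, not the code from the commit: the names `word_t`, `addc_w`, `subb_w`, `add_words` and `sub_words` are hypothetical, a 64-bit word is assumed, and the helper signatures merely imitate the shape of Clang's multiprecision builtins (value, value, carry-in, pointer to carry-out).

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical word type for this sketch; the real library has its own type.
typedef uint64_t word_t;

// Returns x + y + carry_in (mod 2^64) and stores the carry-out (0 or 1).
// carry_in is assumed to be 0 or 1.
static inline word_t addc_w(word_t x, word_t y, word_t carry_in,
                            word_t *carry_out) {
#if defined(__SIZEOF_INT128__)
  // Double-width formulation, analogous to a BN_ULLONG-style path.
  unsigned __int128 sum = (unsigned __int128)x + y + carry_in;
  *carry_out = (word_t)(sum >> 64);
  return (word_t)sum;
#else
  // Manual carry extraction: at most one of the two additions can wrap.
  word_t sum = x + y;
  word_t carry = sum < x;
  sum += carry_in;
  carry |= sum < carry_in;
  *carry_out = carry;
  return sum;
#endif
}

// Returns x - y - borrow_in (mod 2^64) and stores the borrow-out (0 or 1).
// borrow_in is assumed to be 0 or 1.
static inline word_t subb_w(word_t x, word_t y, word_t borrow_in,
                            word_t *borrow_out) {
  word_t diff = x - y;
  word_t borrow = x < y;
  borrow |= diff < borrow_in;
  *borrow_out = borrow;
  return diff - borrow_in;
}

// r = a + b over n words, least significant word first; returns the carry.
static word_t add_words(word_t *r, const word_t *a, const word_t *b,
                        size_t n) {
  word_t carry = 0;
  for (size_t i = 0; i < n; i++) {
    r[i] = addc_w(a[i], b[i], carry, &carry);
  }
  return carry;
}

// r = a - b over n words, least significant word first; returns the borrow.
static word_t sub_words(word_t *r, const word_t *a, const word_t *b,
                        size_t n) {
  word_t borrow = 0;
  for (size_t i = 0; i < n; i++) {
    r[i] = subb_w(a[i], b[i], borrow, &borrow);
  }
  return borrow;
}
```

With this shape, the word loops stay identical regardless of which formulation the helper picks, which is presumably what makes a single copy of each function sufficient.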
* TLS 1.3 Transfer Fuzzing (#929) * Release v1.7.0 (#933) * Update size of BORINGSSL_function_hit to account for new aes_gcm_encrypt_avx512 (#934) * Add missing boringssl_prefix_symbols target (#931) * Add back ASN1_STRING_clear_free that was accidentally removed (#936) Co-authored-by: dkostic <[email protected]> * Re-factor machine-level optimisation decision logic for x25519 (#932) After factoring out the C-implementation in c5e2fb8, now we factor out the control flow decision logic that selects the backend algorithm. This makes it simpler to add new backends. * Improved support for OpenSSH (#894) * Release v1.8.0 (#944) * Delete leftover code from previous Kyber implementation (#943) Co-authored-by: dkostic <[email protected]> * Remove Gitter from README.md (#947) * Per file namespace for symbolic labels s2n-bignum original commit: https://github.com/awslabs/s2n-bignum/commit/da2b90c7fc45b77c639528cea7898575c73f6f39 * Provide a compile option to disable AVX512-specific optimizations. (#945) - The new flag name ends with "512AVX" instead of "AVX512" so that it doesn't include the entire flag -DMY_ASSEMBLER_IS_TOO_OLD_FOR_AVX and match it in the Perl files checks. - avx512 is disabled when avx is. * AES-GCM optimised for AArch64 with EOR3 instruction and a wide AES/PMULL pipeline - 8x unrolling (#721) This is an integration of an Armv8-optimised implementation of AES-GCM which employs stitching (interleaving) of the AES instructions with the GHASH calculation, which operates on 8 input blocks in parallel. [PerformanceImprovement.md](https://github.com/aws/aws-lc/files/10900781/PerformanceImprovement.md) The code was first contributed to OpenSSL in https://github.com/openssl/openssl/commit/954f45ba4c504570206ff5bed811e512cf92dc8e by [email protected] in January 2022. It was cherry-picked into 3.1.0 by the end of 2022 in https://github.com/openssl/openssl/commit/34ca334e5de6837f2c6bc0b0b0df28bdd237e4d7.
The code is targeting the Arm Neoverse V1 and Apple M1's architectures since it relies on 2 HW capabilities: SHA3 extension, particularly the 3-way XOR instruction, EOR3, and a wider ASIMD pipeline which processes the AES and PMULL instructions, the latter being the one used in the GHASH calculation.

### Call-outs:

Using ARMV8_NEOVERSE_V1 and ARMV8_APPLE_M1 as bits in OPENSSL_armcap_P is different from the original code:

* It tests for specific micro-architectures known to have a wide crypto, i.e. AES/PMULL pipeline.
* It replaces the bit ARMV8_AES_GCM_UNROLL8 which designated both a wide pipeline and the availability of SHA3 extension.

### Performance Improvement:

Numbers are in **MB/s**

#### EVP API

| Algorithm | Operation | Bytes | AWS-LC PR#721 | AWS-LC main | ratio to main |
| --- | --- | --- | --- | --- | --- |
| EVP-AES-128-GCM | Encrypt | 16 | 90.7 | 99.5 | 0.91 |
| | | 128 | 648.6 | 692.4 | 0.94 |
| | | 192 | 883.3 | 962.5 | 0.92 |
| | | 256 | 1176.1 | 1197.9 | 0.98 |
| | | 512 | 1996.3 | 1892.7 | 1.05 |
| | | 1350 | 3376.7 | 2876.8 | 1.17 |
| | | 8192 | 5810.9 | 4008.2 | 1.45 |
| | | 16384 | 6209.3 | 4163.2 | 1.49 |
| | Decrypt | 16 | 85.5 | 92.6 | 0.92 |
| | | 128 | 615.1 | 670.4 | 0.92 |
| | | 192 | 869 | 940.3 | 0.92 |
| | | 256 | 1125.3 | 1177.7 | 0.96 |
| | | 512 | 1927.2 | 1895.5 | 1.02 |
| | | 1350 | 3251.8 | 2962.6 | 1.10 |
| | | 8192 | 5851.4 | 4436.9 | 1.32 |
| | | 16384 | 6271.6 | 4652.7 | 1.35 |
| EVP-AES-192-GCM | Encrypt | 16 | 87.6 | 96.2 | 0.91 |
| | | 128 | 628.4 | 681.4 | 0.92 |
| | | 192 | 866.3 | 941.4 | 0.92 |
| | | 256 | 1146.2 | 1168.9 | 0.98 |
| | | 512 | 1924.7 | 1836.4 | 1.05 |
| | | 1350 | 3213.4 | 2765 | 1.16 |
| | | 8192 | 5480 | 3913.5 | 1.40 |
| | | 16384 | 5838.7 | 4068.2 | 1.44 |
| | Decrypt | 16 | 82.4 | 90.9 | 0.91 |
| | | 128 | 594.9 | 656.5 | 0.91 |
| | | 192 | 838.1 | 919.2 | 0.91 |
| | | 256 | 1084.2 | 1147.2 | 0.95 |
| | | 512 | 1844.5 | 1831.6 | 1.01 |
| | | 1350 | 3094 | 2817.1 | 1.10 |
| | | 8192 | 5396 | 4154.3 | 1.30 |
| | | 16384 | 5765.9 | 4341.8 | 1.33 |
| EVP-AES-256-GCM | Encrypt | 16 | 85.6 | 93.9 | 0.91 |
| | | 128 | 616.8 | 666.3 | 0.93 |
| | | 192 | 846.4 | 917.2 | 0.92 |
| | | 256 | 1105.9 | 1137.5 | 0.97 |
| | | 512 | 1849.1 | 1773.6 | 1.04 |
| | | 1350 | 3033.4 | 2632.9 | 1.15 |
| | | 8192 | 5034.9 | 3707.4 | 1.36 |
| | | 16384 | 5341.9 | 3845.9 | 1.39 |
| | Decrypt | 16 | 81.3 | 89.2 | 0.91 |
| | | 128 | 583.2 | 641.8 | 0.91 |
| | | 192 | 817.8 | 894.9 | 0.91 |
| | | 256 | 1071 | 1114.1 | 0.96 |
| | | 512 | 1800 | 1760.9 | 1.02 |
| | | 1350 | 2969.6 | 2662.8 | 1.12 |
| | | 8192 | 5110.4 | 3865 | 1.32 |
| | | 16384 | 5449 | 4024.8 | 1.35 |

#### AEAD API

| Algorithm | Operation | Bytes | AWS-LC PR#721 | AWS-LC main | ratio to main |
| --- | --- | --- | --- | --- | --- |
| AEAD-AES-128-GCM | Seal (Encrypt) | 16 | 177.4 | 179.2 | 0.99 |
| | | 128 | 1157.4 | 1163.8 | 0.99 |
| | | 192 | 1543 | 1549.5 | 1.00 |
| | | 256 | 1916.3 | 1851 | 1.04 |
| | | 512 | 2973.1 | 2630.4 | 1.13 |
| | | 1350 | 4236.1 | 3353.7 | 1.26 |
| | | 8192 | 6208.4 | 4205.1 | 1.48 |
| | | 16384 | 6444.2 | 4290.5 | 1.50 |
| | Open (Decrypt) | 16 | 159.8 | 161.8 | 0.99 |
| | | 128 | 1068.3 | 1080 | 0.99 |
| | | 192 | 1454.3 | 1480 | 0.98 |
| | | 256 | 1790.4 | 1783.7 | 1.00 |
| | | 512 | 2826.8 | 2613.6 | 1.08 |
| | | 1350 | 4068.2 | 3435.3 | 1.18 |
| | | 8192 | 6223.2 | 4626 | 1.35 |
| | | 16384 | 6487.6 | 4754.3 | 1.36 |
| AEAD-AES-128-GCM | Seal (Encrypt) | 16 | 171.1 | 170.6 | 1.00 |
| | | 128 | 1093.8 | 1099 | 1.00 |
| | | 192 | 1445.2 | 1453.3 | 0.99 |
| | | 256 | 1779.9 | 1726.7 | 1.03 |
| | | 512 | 2710.3 | 2419.8 | 1.12 |
| | | 1350 | 3744 | 3041.5 | 1.23 |
| | | 8192 | 5330.6 | 3842.6 | 1.39 |
| | | 16384 | 5508.6 | 3916.1 | 1.41 |
| | Open (Decrypt) | 16 | 155.9 | 158.8 | 0.98 |
| | | 128 | 1013.2 | 1024 | 0.99 |
| | | 192 | 1361.8 | 1387.1 | 0.98 |
| | | 256 | 1686.5 | 1658.2 | 1.02 |
| | | 512 | 2607.3 | 2376.9 | 1.10 |
| | | 1350 | 3682.2 | 3045.4 | 1.21 |
| | | 8192 | 5418.3 | 4005.8 | 1.35 |
| | | 16384 | 5620.3 | 4099.1 | 1.37 |

### M1 (AEAD API)

#### main

```
Did 52951000 AEAD-AES-128-GCM seal (16 bytes) operations in 3000008us (17650286.3 ops/sec): 282.4 MB/s
Did 43846250 AEAD-AES-128-GCM seal (128 bytes) operations in 3000009us (14615372.8 ops/sec): 1870.8 MB/s
Did 38685500 AEAD-AES-128-GCM seal (192 bytes) operations in 3000010us (12895123.7 ops/sec): 2475.9 MB/s
Did 34504000 AEAD-AES-128-GCM seal (256 bytes) operations in 3000077us (11501038.1 ops/sec): 2944.3 MB/s
Did 24288500 AEAD-AES-128-GCM seal (512 bytes) operations in 3000031us (8096083.0 ops/sec): 4145.2 MB/s
Did 11798750 AEAD-AES-128-GCM seal (1350 bytes) operations in 3000032us (3932874.7 ops/sec): 5309.4 MB/s
Did 2490000 AEAD-AES-128-GCM seal (8192 bytes) operations in 3000049us (829986.4 ops/sec): 6799.2 MB/s
Did 1278000 AEAD-AES-128-GCM seal (16384 bytes) operations in 3001880us (425733.2 ops/sec): 6975.2 MB/s
Did 49507000 AEAD-AES-128-GCM open (16 bytes) operations in 3000052us (16502047.3 ops/sec): 264.0 MB/s
Did 41929000 AEAD-AES-128-GCM open (128 bytes) operations in 3000009us (13976291.4 ops/sec): 1789.0 MB/s
Did 36945750 AEAD-AES-128-GCM open (192 bytes) operations in 3000012us (12315200.7 ops/sec): 2364.5 MB/s
Did 33254000 AEAD-AES-128-GCM open (256 bytes) operations in 3000013us (11084618.6 ops/sec): 2837.7 MB/s
Did 23444000 AEAD-AES-128-GCM open (512 bytes) operations in 3000046us (7814546.8 ops/sec): 4001.0 MB/s
Did 11298750 AEAD-AES-128-GCM open (1350 bytes) operations in 3000044us (3766194.8 ops/sec): 5084.4 MB/s
Did 2397000 AEAD-AES-128-GCM open (8192 bytes) operations in 3000713us (798810.1 ops/sec): 6543.9 MB/s
Did 1196000 AEAD-AES-128-GCM open (16384 bytes) operations in 3000559us (398592.4 ops/sec): 6530.5 MB/s
Did 50220000 AEAD-AES-256-GCM seal (16 bytes) operations in 3000014us (16739921.9 ops/sec): 267.8 MB/s
Did 40900750 AEAD-AES-256-GCM seal (128 bytes) operations in 3000004us (13633565.2 ops/sec): 1745.1 MB/s
Did 35152000 AEAD-AES-256-GCM seal (192 bytes) operations in 3000002us (11717325.5 ops/sec): 2249.7 MB/s
Did 31452750 AEAD-AES-256-GCM seal (256 bytes) operations in 3000020us (10484180.1 ops/sec): 2684.0 MB/s
Did 21425750 AEAD-AES-256-GCM seal (512 bytes) operations in 3000027us (7141852.4 ops/sec): 3656.6 MB/s
Did 10136750 AEAD-AES-256-GCM seal (1350 bytes) operations in 3000018us (3378896.4 ops/sec): 4561.5 MB/s
Did 2091000 AEAD-AES-256-GCM seal (8192 bytes) operations in 3000749us (696826.0 ops/sec): 5708.4 MB/s
Did 1066000 AEAD-AES-256-GCM seal (16384 bytes) operations in 3001132us (355199.3 ops/sec): 5819.6 MB/s
Did 49377250 AEAD-AES-256-GCM open (16 bytes) operations in 3000010us (16459028.5 ops/sec): 263.3 MB/s
Did 40825000 AEAD-AES-256-GCM open (128 bytes) operations in 3000007us (13608301.6 ops/sec): 1741.9 MB/s
Did 35528000 AEAD-AES-256-GCM open (192 bytes) operations in 3000002us (11842658.8 ops/sec): 2273.8 MB/s
Did 31562000 AEAD-AES-256-GCM open (256 bytes) operations in 3000014us (10520617.6 ops/sec): 2693.3 MB/s
Did 21607000 AEAD-AES-256-GCM open (512 bytes) operations in 3000001us (7202330.9 ops/sec): 3687.6 MB/s
Did 10316750 AEAD-AES-256-GCM open (1350 bytes) operations in 3000034us (3438877.7 ops/sec): 4642.5 MB/s
Did 2154000 AEAD-AES-256-GCM open (8192 bytes) operations in 3000688us (717835.4 ops/sec): 5880.5 MB/s
Did 1085000 AEAD-AES-256-GCM open (16384 bytes) operations in 3000182us (361644.7 ops/sec): 5925.2 MB/s
```

#### PR

```
Did 52991000 AEAD-AES-128-GCM seal (16 bytes) operations in 3000014us (17663584.2 ops/sec): 282.6 MB/s
Did 43686750 AEAD-AES-128-GCM seal (128 bytes) operations in 3000017us (14562167.5 ops/sec): 1864.0 MB/s
Did 38589000 AEAD-AES-128-GCM seal (192 bytes) operations in 3000010us (12862957.1 ops/sec): 2469.7 MB/s
Did 35045750 AEAD-AES-128-GCM seal (256 bytes) operations in 3000008us (11681885.5 ops/sec): 2990.6 MB/s
Did 26081750 AEAD-AES-128-GCM seal (512 bytes) operations in 3000015us (8693873.2 ops/sec): 4451.3 MB/s
Did 13458500 AEAD-AES-128-GCM seal (1350 bytes) operations in 3000033us (4486117.3 ops/sec): 6056.3 MB/s
Did 3016000 AEAD-AES-128-GCM seal (8192 bytes) operations in 3000802us (1005064.6 ops/sec): 8233.5 MB/s
Did 1519000 AEAD-AES-128-GCM seal (16384 bytes) operations in 3000267us (506288.3 ops/sec): 8295.0 MB/s
Did 49665750 AEAD-AES-128-GCM open (16 bytes) operations in 3000016us (16555161.7 ops/sec): 264.9 MB/s
Did 42062500 AEAD-AES-128-GCM open (128 bytes) operations in 3000017us (14020753.9 ops/sec): 1794.7 MB/s
Did 37012250 AEAD-AES-128-GCM open (192 bytes) operations in 3000018us (12337342.6 ops/sec): 2368.8 MB/s
Did 34332500 AEAD-AES-128-GCM open (256 bytes) operations in 3000005us (11444147.6 ops/sec): 2929.7 MB/s
Did 25532750 AEAD-AES-128-GCM open (512 bytes) operations in 3000006us (8510899.6 ops/sec): 4357.6 MB/s
Did 13174750 AEAD-AES-128-GCM open (1350 bytes) operations in 3000052us (4391507.2 ops/sec): 5928.5 MB/s
Did 3023000 AEAD-AES-128-GCM open (8192 bytes) operations in 3000759us (1007411.8 ops/sec): 8252.7 MB/s
Did 1557000 AEAD-AES-128-GCM open (16384 bytes) operations in 3000047us (518991.9 ops/sec): 8503.2 MB/s
Did 49483500 AEAD-AES-256-GCM seal (16 bytes) operations in 3000001us (16494494.5 ops/sec): 263.9 MB/s
Did 41014000 AEAD-AES-256-GCM seal (128 bytes) operations in 3000016us (13671260.4 ops/sec): 1749.9 MB/s
Did 35160000 AEAD-AES-256-GCM seal (192 bytes) operations in 3000014us (11719945.3 ops/sec): 2250.2 MB/s
Did 32640750 AEAD-AES-256-GCM seal (256 bytes) operations in 3000004us (10880235.5 ops/sec): 2785.3 MB/s
Did 23427000 AEAD-AES-256-GCM seal (512 bytes) operations in 3000030us (7808921.9 ops/sec): 3998.2 MB/s
Did 11443000 AEAD-AES-256-GCM seal (1350 bytes) operations in 3000029us (3814296.5 ops/sec): 5149.3 MB/s
Did 2513000 AEAD-AES-256-GCM…
```
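For readers who want to cross-check the figures, the MB/s values in the benchmark output above appear to follow from the operation count, the elapsed microseconds, and the bytes per operation, with 1 MB taken as 10^6 bytes. The sketch below is only an illustration of that arithmetic; the function names `throughput_mb_per_s` and `ratio_to_main` are hypothetical and are not part of AWS-LC or its benchmarking tool.

```c
/* Hypothetical helper, not an AWS-LC API: throughput in MB/s, assuming
 * 1 MB = 10^6 bytes. Bytes per microsecond is numerically the same as
 * 10^6 bytes per second, so no extra scaling factor is needed. */
double throughput_mb_per_s(unsigned long long ops,
                           unsigned long long elapsed_us,
                           unsigned long long bytes_per_op) {
  return (double)ops * (double)bytes_per_op / (double)elapsed_us;
}

/* Hypothetical helper: the "ratio to main" column, i.e. the PR throughput
 * divided by the throughput measured on main. Values above 1.0 mean the
 * PR is faster. */
double ratio_to_main(double pr_mb_per_s, double main_mb_per_s) {
  return pr_mb_per_s / main_mb_per_s;
}
```

For instance, the first line of the main run (52951000 operations of 16 bytes in 3000008us) gives roughly 282.4 MB/s with this formula, and dividing 6444.2 by 4290.5 gives roughly 1.50, which matches the 16384-byte Seal row of the AEAD table.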

torben-hansen mentioned this pull request Mar 10, 2023

Reference OPENSSL_ia32cap_P at x86 machine code level through a unique symbol instead of a common offset symbol avoiding add instruction #862

Merged

torben-hansen force-pushed the use_function_to_reference_cpucap branch from 57a6164 to 80ccc1e Compare March 15, 2023 15:21

torben-hansen changed the title ~~[DRAFT] Bring back cpu capability trampoline function on the C-level for static fips build~~ Use function reference to get cpu capability vector at the C-level for FIPS static build Mar 16, 2023

torben-hansen added 3 commits March 16, 2023 08:03

Bring back cpu capability trampoline function on the C-level for static fips build

309b709

Replace few more occurrences

64a6d1a

Referencing a function not a variable... so use correct syntax...

73b816e

torben-hansen force-pushed the use_function_to_reference_cpucap branch from 1cc7b83 to 73b816e Compare March 16, 2023 15:03

torben-hansen marked this pull request as ready for review March 16, 2023 15:04

torben-hansen requested a review from dkostic March 16, 2023 15:04

Taffer previously approved these changes Mar 16, 2023

torben-hansen changed the title ~~Use function reference to get cpu capability vector at the C-level for FIPS static build~~ Use function reference to get cpu capability vector at the C-level Mar 16, 2023

Forgot some more occurrences

83ae5e5

torben-hansen dismissed Taffer’s stale review via 83ae5e5 March 16, 2023 16:09

Revert "Forgot some more occurrences"

d75ae91

This reverts commit 83ae5e5.

dkostic approved these changes Mar 16, 2023

skmcgrail approved these changes Mar 16, 2023

torben-hansen enabled auto-merge (squash) March 16, 2023 16:36

torben-hansen merged commit b00c3bd into aws:main Mar 16, 2023

nebeid mentioned this pull request Jun 27, 2023

Support changing OPENSSL_armcap with environment variable on Apple 64-bit ARM systems #1045

Merged
