-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support changing OPENSSL_armcap with environment variable on Apple 64-bit ARM systems #1045
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In internal.h
:L137
https://github.com/aws/aws-lc/pull/1045/files#diff-f86d797a597d637dd551bbb942147bbb4184579acc54cf0a01bc5831b3a0e0a6R137,
OPENSSL_STATIC_ARMCAP
is define for Apple. Should it be removed?
How were the changes in this PR tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this needs to be removed. I've tested the changes on both a m1 macbook as well as the iOS device we are concerned about by stepping through the code with a debugger and confirming that the non-hw implementation is used when OPENSSL_armcap is set to 0:0. I think this line is setting it for 32-bit Apple processors, which at this point is limited to very old (pre-2007 iirc) machines. I'll update the title and related tickets to reflect that we're only interested in aarch64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that it's for ARM (32-bit). Thank you for the clarification.
802056d
to
c1b91bc
Compare
@@ -26,7 +26,7 @@ void handle_cpu_env(uint32_t *out, const char *in) { | |||
|
|||
// Detect if the user is trying to use the environment variable to set | |||
// a capability that is _not_ available on the CPU: | |||
// If getauxval() returned a non-zero hwcap in `armcap` (out) | |||
// If the runtime capability check (e.g via getauxval() on Linux) returned a non-zero hwcap in `armcap` (out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes, I misunderstood the previous commit suggestion for this line! Will add 👍
@@ -167,7 +167,7 @@ extern uint32_t OPENSSL_armcap_P; | |||
// |CRYPTO_is_ARMv8_AES_capable| and |CRYPTO_is_ARMv8_PMULL_capable| | |||
// for checking the support for AES and PMULL instructions, respectively. | |||
OPENSSL_INLINE int CRYPTO_is_NEON_capable(void) { | |||
#if defined(OPENSSL_STATIC_ARMCAP_NEON) || defined(__ARM_NEON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were the second parts of the checks removed? Aren't these possible compiler flags that can set the capability statically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to ACLE, __ARM_NEON
is always set to 1 for AARCH64 platforms. As such, if we want OPENSSL_armcap
to determine use of the C implementation, we can't check for this flag since it will always be set. We could use the -march
flag when compiling to disable NEON, but since OPENSSL_armcap
is supposed to determine processor capability at runtime I decided on this approach to reduce confusion.
cpu_aarch64_apple.c
(as well as the counterpart files for other CPUs) still set OPENSSL_armcap
to be the correct value for the processor in OPENSSL_cpuid_setup()
, so removing the check doesn't disable NEON entirely, either.
This applies to __ARM_FEATURE_AES
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't these values be used on ARM processors (32-bit)? We have users who build AWS-LC on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpucap.c detects if LC is built on 32-bit ARM and sets OPENSSL_armcap
accordingly, so when we call a function like CRYPTO_is_NEON_capable()
, even without a check for these values, it will correctly detect the CPU capabilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing something, but I think if OPENSSL_STATIC_ARMCAP
is defined but not OPENSSL_STATIC_ARMCAP_NEON
due to an old compiler that defines __ARM_NEON
, it will not reach the check for OPENSSL_armcap
and will return 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the functions to no longer include the ifdef
statements, similar to how the capability functions for x86 processors in the same file do it. Since OPENSSL_armcap_P
is set correctly even when OPENSSL_STATIC_ARMCAP
is defined, the functions should still return the right result.
Either way, a BORINGSSL_DISPATCH_TEST
to ensure that this doesn't break anything will be introduced in a separate PR (which we should merge before this one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd also need to measure the performance in FIPS build with and without the ifdefs for static build (possibly for dynamic as well) when the STATIC_ARMCAPs are defined. Fetching the OPENSSL_armcap_P variable had some implications on the static FIPS build in x86 #856.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performed a benchmark on both the dynamic and static builds comparing performance of AES on main to the PR branch:
- FIPS mode set
OPENSSL_STATIC_ARMCAP
and other static macros set- Graviton2 on AL2
No performance differences between main & PR for either the dynamic nor static builds.
Full results can be seen here:
aws-lc-dynamic-staticarmcap-aes_main_vs_pr1045_bm.csv
aws-lc-static-staticarmcap-aes_main_vs_pr1045_bm.csv
Could we add the same |
Discussed offline, will submit another PR with these changes as the scope is larger than the context of this one. |
69573bb
to
80102bd
Compare
return (OPENSSL_armcap_P & ARMV8_PMULL) != 0; | ||
#endif | ||
} | ||
|
||
OPENSSL_INLINE int CRYPTO_is_ARMv8_GCM_8x_capable(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about this and the next function? Should we remove the #if defined(OPENSSL_STATIC_ARMCAP)
branch in those too?
Co-authored-by: Nevine Ebeid <[email protected]>
80102bd
to
92dcae9
Compare
…-bit ARM systems (aws#1045) * use handle_cpu_env from cpu_aarch64_linux.c to cpu_aarch64_apple.c * always use OPENSSL_armcap() when detecting hardware capabilities on arm * refactor duplicated into cpu_aarch64.h/c * add ifdefs to cpu_aarch64.h/c * Update comments Co-authored-by: Nevine Ebeid <[email protected]> * adjust line length of comment * set arm capabilities functions to fillow format of x86 ones * make rest of the arm capability functions in line with x86 ones --------- Co-authored-by: Nevine Ebeid <[email protected]> Co-authored-by: Bill Yang <[email protected]>
…-bit ARM systems (aws#1045) * use handle_cpu_env from cpu_aarch64_linux.c to cpu_aarch64_apple.c * always use OPENSSL_armcap() when detecting hardware capabilities on arm * refactor duplicated into cpu_aarch64.h/c * add ifdefs to cpu_aarch64.h/c * Update comments Co-authored-by: Nevine Ebeid <[email protected]> * adjust line length of comment * set arm capabilities functions to fillow format of x86 ones * make rest of the arm capability functions in line with x86 ones --------- Co-authored-by: Nevine Ebeid <[email protected]> Co-authored-by: Bill Yang <[email protected]>
* Add BoringSSL Dispatch Test for aarch64 * Make dispatch tests use corruptible registers on aarch64. (#1118) * Support changing OPENSSL_armcap with environment variable on Apple 64-bit ARM systems (#1045) * use handle_cpu_env from cpu_aarch64_linux.c to cpu_aarch64_apple.c * always use OPENSSL_armcap() when detecting hardware capabilities on arm * fix device farm CI build with boto3 (#990) --------- Co-authored-by: Bill Yang <[email protected]> Co-authored-by: Nevine Ebeid <[email protected]> Co-authored-by: Samuel Chiang <[email protected]>
Issues:
Resolves CryptoAlg-1873
Description of changes:
AWS-LC currently uses a static ARM capabilities variable for 64 bit ARM Apple devices, as we know that certain hardware features are always present on them. This change gives Apple ARM users an option to define at runtime the CPU capabilities through an environment variable to determine whether to run the hardware vs. non-hardware implementation of an algorithm is used.
Call-outs:
Refactored
handle_cpu_env()
function incpu_aarch64_linux.c
to a generalcpu_aarch64.c
file to avoid duplicating the code incpu_aarch64_apple.c
. Modified CMakeFIle accordingly.Testing:
Tests pass locally, confirmed that non-hardware APIs are used when
OPENSSL_armcap=0:0
and that the correct hardware APIs are used when it is left unset. Behavior for testing this already exists inall_tests.json
, used byall_tests.go
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.