Don't compile AVX512BF code when it isn't available #4422

Closed
imciner2 opened this issue Jan 11, 2024 · 0 comments · Fixed by #4423

imciner2 commented Jan 11, 2024

In the Julia OpenBLAS build, we have been working towards enabling BFloat16 support (JuliaPackaging/Yggdrasil#7202). We build against multiple gfortran library versions (3, 4, and 5), which means we have to build with older GCC versions; for example, building for gfortran3 uses GCC 6. Unfortunately, using DYNAMIC_ARCH and BUILD_BFLOAT16=1 together in this case triggers errors about missing AVX512 BFloat16 intrinsics and types, such as:

[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c: In function ‘sbgemm_kernel_COOPERLAKE’:
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:102:26: warning: implicit declaration of function ‘_mm512_dpbf16_ps’ [-Wimplicit-function-declaration]
[23:30:59]  #define FMA(a, b, r) r = _mm512_dpbf16_ps(r, (__m512bh)a, (__m512bh)b)
[23:30:59]                           ^
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:105:2: note: in expansion of macro ‘FMA’
[23:30:59]   FMA(A_lo_##A, B_lo, result_00_##A##Bx##By); \
[23:30:59]   ^~~
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:214:47: note: in expansion of macro ‘MATMUL_4X’
[23:30:59]      BROADCAST_B_PAIR(0, 0); PREFETCH_B(0, 0); MATMUL_4X(0, 0, 0);
[23:30:59]                                                ^~~~~~~~~
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:102:47: error: ‘__m512bh’ undeclared (first use in this function)
[23:30:59]  #define FMA(a, b, r) r = _mm512_dpbf16_ps(r, (__m512bh)a, (__m512bh)b)
[23:30:59]                                                ^
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:105:2: note: in expansion of macro ‘FMA’
[23:30:59]   FMA(A_lo_##A, B_lo, result_00_##A##Bx##By); \
[23:30:59]   ^~~
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:214:47: note: in expansion of macro ‘MATMUL_4X’
[23:30:59]      BROADCAST_B_PAIR(0, 0); PREFETCH_B(0, 0); MATMUL_4X(0, 0, 0);
[23:30:59]                                                ^~~~~~~~~
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:102:47: note: each undeclared identifier is reported only once for each function it appears in
[23:30:59]  #define FMA(a, b, r) r = _mm512_dpbf16_ps(r, (__m512bh)a, (__m512bh)b)
[23:30:59]                                                ^
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:105:2: note: in expansion of macro ‘FMA’
[23:30:59]   FMA(A_lo_##A, B_lo, result_00_##A##Bx##By); \
[23:30:59]   ^~~
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:214:47: note: in expansion of macro ‘MATMUL_4X’
[23:30:59]      BROADCAST_B_PAIR(0, 0); PREFETCH_B(0, 0); MATMUL_4X(0, 0, 0);
[23:30:59]                                                ^~~~~~~~~
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:105:6: error: expected ‘)’ before ‘A_lo_0’
[23:30:59]   FMA(A_lo_##A, B_lo, result_00_##A##Bx##By); \
[23:30:59]       ^
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:102:56: note: in definition of macro ‘FMA’
[23:30:59]  #define FMA(a, b, r) r = _mm512_dpbf16_ps(r, (__m512bh)a, (__m512bh)b)
[23:30:59]                                                         ^
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:214:47: note: in expansion of macro ‘MATMUL_4X’
[23:30:59]      BROADCAST_B_PAIR(0, 0); PREFETCH_B(0, 0); MATMUL_4X(0, 0, 0);
[23:30:59]                                                ^~~~~~~~~
[23:30:59] ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c:106:6: error: expected ‘)’ before ‘A_hi_0’
[23:30:59]   FMA(A_hi_##A, B_lo, result_01_##A##Bx##By); \
[23:30:59]       ^

The compile command used for this file was:

[23:30:59] cc -O2 -DSMALL_MATRIX_OPT -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DDYNAMIC_ARCH -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=512 -DMAX_PARALLEL_NUMBER=1 -DBUILD_BFLOAT16 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.26\" -msse3 -mssse3 -msse4.1 -mavx -mavx2 -march=skylake-avx512 -mavx2 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=sbgemm_kernel_COOPERLAKE -DASMFNAME=sbgemm_kernel_COOPERLAKE_ -DNAME=sbgemm_kernel_COOPERLAKE_ -DCNAME=sbgemm_kernel_COOPERLAKE -DCHAR_NAME=\"sbgemm_kernel_COOPERLAKE_\" -DCHAR_CNAME=\"sbgemm_kernel_COOPERLAKE\" -DNO_AFFINITY -DTS=_COOPERLAKE -I.. -DBUILD_KERNEL -DTABLE_NAME=gotoblas_COOPERLAKE -march=skylake-avx512 -mavx512f -DBFLOAT16 -UDOUBLE  -UCOMPLEX -c -DBFLOAT16 -UDOUBLE -UCOMPLEX ../kernel/x86_64/sbgemm_kernel_16x4_cooperlake.c -o sbgemm_kernel_COOPERLAKE.o

The GCC flag detection logic is working: GCC 6 doesn't support -march=cooperlake, so the build falls back to -march=skylake-avx512, which is supported. However, the optimized BFloat16 kernels for Cooperlake are still being compiled, even though the AVX512 BFloat16 extension is not available with this compiler.

Ideally, the build would fall back to compiling the generic BFloat16 kernels when the AVX512BF extensions aren't available, so that the library API stays the same across all the library versions we build.
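To illustrate the kind of fallback I have in mind, here is a minimal sketch of a compile-time guard. It relies on `__AVX512BF16__`, which to my knowledge GCC and Clang predefine only when AVX512 BFloat16 code generation is enabled (for example with -mavx512bf16 or -march=cooperlake). The names `SBGEMM_USE_COOPERLAKE_KERNEL` and `sbgemm_kernel_variant` are hypothetical and exist only for this example; this is not how OpenBLAS selects kernels today, and the real fix may belong in the build system instead.

```c
#include <stdio.h>

/* __AVX512BF16__ is predefined by the compiler only when AVX512 BFloat16
 * code generation is enabled. With GCC 6 and -march=skylake-avx512 it is
 * absent, so the guard below selects the generic path. */
#if defined(__AVX512BF16__)
#define SBGEMM_USE_COOPERLAKE_KERNEL 1 /* hypothetical macro name */
#else
#define SBGEMM_USE_COOPERLAKE_KERNEL 0
#endif

/* Hypothetical helper: names the kernel variant this translation unit
 * would build under the guard above. */
const char *sbgemm_kernel_variant(void)
{
#if SBGEMM_USE_COOPERLAKE_KERNEL
    return "cooperlake"; /* code using _mm512_dpbf16_ps would go here */
#else
    return "generic";    /* portable BFloat16 kernel, no intrinsics */
#endif
}

/* Prints the selected variant; handy as a quick check per compiler. */
void print_sbgemm_kernel_variant(void)
{
    printf("sbgemm kernel: %s\n", sbgemm_kernel_variant());
}
```

With a guard like this, the same source builds with both old and new compilers, and the exported symbols stay the same; only the implementation behind them changes.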
