-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial Support for the RISC-V Vector Extension in ARM NEON #1130
Conversation
* gh-actions macos: skip coverage report
fix : ci.yml feat : modify types.h for risc-v vector extension feat : modify simde utilities for rvv fix : type.h copyright fix : ci files name feat : modify load & store for risc-v v extension feat : modify load & store for risc-v vector fix : ci file fix : reinterpret (due to rvv mem pollution) feat : add and mul neon to rvv fix : copyright feat : modify ci.yml feat : remove TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very exciting! Has any testing been done on real RISCV64 RVV 1.0 hardware?
simde/arm/neon/ld1.h
Outdated
simde_memcpy(&r_, ptr, sizeof(r_)); | ||
#if defined(SIMDE_RISCV_V_NATIVE) && SIMDE_ARCH_RISCV_ZVFH | ||
r_.sv64 = __riscv_vle16_v_f16m1((_Float16 *)ptr , 4); | ||
#else | ||
simde_memcpy(&r_, ptr, 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the sizeof
version stay in the non-RISCV_V
branches?
Is that not working for the native RVV types a compiler bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sizeof can remain in the non-RVV branch. I will modify them.
Update : I've already reverted sizeof in SIMDe implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no bugs when compiling RVV types. However, the size of simde_xxx_private unions in type.h may not be as expected. The size of these unions can vary due to the differing lengths of RVV vector machines.
To be more specific, the size of simde_uint16x4_private will be 64 bits on vector machines other than RVV. However, for RVV with a VLEN (vector length) of 512, the union size of simde_uint16x4_private may expand to 512 bits, due to the limitations of types in RVV.
For testing, we have only tested the code using QEMU and the Spike simulator without real RISC-V RVV 1.0 hardware. |
Amazing work!
(this was run with 256 iterations and generated a 1440x1080 image) This is a 3.1x speedup, and close to the hand-optimized
Edit: |
Thanks @camel-cdr ! Can you run all the SIMDe tests from this PR on the k230? |
I'm currently working on that, however I run into problems with the glibc version on the k230. I used a freestanding build for the benchmark. |
I couldn't figure out how to get the glibc versions to align. |
#elif defined(SIMDE_RISCV_V_NATIVE) && defined(__riscv_v_fixed_vlen) | ||
//FIXME : SIMDE_NATURAL_VECTOR_SIZE == __riscv_v_fixed_vlen | ||
#define SIMDE_NATURAL_VECTOR_SIZE (128) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need fixing before merging? If not, then lets make an issue
What's the plan for GCC support? What about portable binaries, when will we not have to specify |
Sure ! |
Creating portable binaries for RVV (RISC-V Vector Extension) is not feasible, as explained in the discussion at https://news.ycombinator.com/item?id=37706070. To summarize, the vector size in RVV is determined at compile time, making it impossible to create binaries that can be ported seamlessly between RVV machines with different vector lengths. |
Would a binary for a smaller vector size work on a CPU with a larger vector size? Maybe a future RISC-V profile will mandate a larger vector size. I guess for Debian and others that want to maximize the performance of SIMDe using apps, we will have to compile multiple times based on the vector widths that are commercially available. Which is what we already do to support the various x86-64 SIMD intrinsics (https://wiki.debian.org/SIMDEverywhere and https://packages.debian.org/source/testing/subarch-select) |
@eric900115 For neon RVV codegen can be 100% portable. If we require the standard V extension (VLEN>=128 and ELEN=64), then we can use LMUL=1 on all implementations, because even for e.g. LMUL=512 a single vector registed does alteast contain 128 bits. We just need to #include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>
typedef struct { uint8_t arr[16]; } V128;
static
V128 vadd8(V128 a, V128 b)
{
vuint8m1_t A = __riscv_vle8_v_u8m1((void*)&a,16);
vuint8m1_t B = __riscv_vle8_v_u8m1((void*)&b,16);
vuint8m1_t C = __riscv_vadd_vv_u8m1(A, B, 16);
V128 c;
__riscv_vse8_v_u8m1((void*)&c, C, 16);
return c;
}
V128 test1(V128 a, V128 b, V128 c)
{
return vadd8(vadd8(a, b), vadd8(c, c));
}
V128 test2(V128 a, V128 b, V128 c)
{
vuint8m1_t A = __riscv_vle8_v_u8m1((void*)&a,16);
vuint8m1_t B = __riscv_vle8_v_u8m1((void*)&b,16);
vuint8m1_t C = __riscv_vle8_v_u8m1((void*)&c,16);
V128 r;
__riscv_vse8_v_u8m1((void*)&r, __riscv_vadd_vv_u8m1(__riscv_vadd_vv_u8m1(A, B, 16), __riscv_vadd_vv_u8m1(C, C, 16), 16), 16);
return r;
} |
Here's my
|
Thanks it worked. I didn't know about the debian image, and was using the k230_sdk thingy. Running
|
@camel-cdr Thanks! Yeah, I'm now also seeing those errors. I wonder why the qemu setup in this PR isn't reproducing them? I hope it isn't a hardware error! :-) |
Also, hello @eric900115 and @camel-cdr from the Debian Med Sprint in Berlin. Maybe you can join us in person next year? https://wiki.debian.org/Sprints/2023/DebianMed2024 |
Sounds interesting, Berlin is only 2-3 hours away from me. But I'm not really involved with Debian (except for running it). Btw, do you know how Debian deals with compiler bugs? I just ran into an gcc-13.2.0 codegen bug, that causes a valid program to not work |
I would personally respond positively to a
Anyone is welcome! We appreciate the user perspective! |
I am also wondering. I'll try to use qemu with same configuration for testing (testing with thread-c906 CPU). |
@camel-cdr Do you also get failures on the k230 with the current I'm seeing failures in
So I guess there are some clang and/or CPU errors .. ? |
@mr-c Yes, I get similar errors when testing
I'm somewhat inclined to believe it's a clang miss-compilation, because I had a gcc-13.2 miss-compilation yesterday, I suppose we need to investigate this somehow. |
Hey @camel-cdr ; in #1141 I fixed some of the NEON abs functions. Maybe you have time to re-run the tests? |
@mr-c Here we go, looks like the abs errors are gone, great work.
|
I have modified mul_lane and mulx_lane. Hope the error in fms_lane, fma_lane, mul_lane, and mulx_lane will be eliminated. |
Hello @eric900115, this is really good stuff. Is there anything you might need help with? |
Thanks @camel-cdr ; can you retest the latest? |
@mr-c the errors are still there, but the values are different now:
|
@camel-cdr are those errors from the |
@mr-c it was the native tests, I ran it via: |
Hi! Yes, we excluded BF16 and cryptography for conversion. For the conversion from NEON to RVV, if the performance (instruction counts) of using single or multiple RVV intrinsics is better than automatic vectorization, then we use RVV intrinsics for implementation. Otherwise, we use loop automatic vectorization from SIMDe. |
Thank you @eric900115 ! Now that SIMDe 0.8.0 is released we can focus the next development cycle on RVV 1.0 implementations. |
Hi everyone,
This is Eric from National Tsing Hua University (NTHU) pllab. This PR includes the initialization of the conversion of Neon to RISC-V Vector Extension (RVV) for SIMDe.
NTHU pllab and Andes Technology have collaborated to convert NEON intrinsics to the RISC-V Vector Extension, and we have converted all NEON intrinsics to RVV intrinsics. This PR marks the beginning of our work. We will soon upstream all of our work.
We made a few changes in the SIMDe repo to suit our needs:
type.h
to enable RVV types.memcpy
due to memory pollution issues in our implementation.add.h
andmul.h
.We have included clang-qemu-rvv testing for the following RISC-V V Extension architectures, both with and without ZVFH enabled:
vlen = 128 & elen = 64
vlen = 256 & elen = 64
vlen = 512 & elen = 64
To compile SIMDe with support for the conversion from NEON to the RISC-V Vector Extension, please use Clang-17 and include the flag
-mrvv-vector-bits=<vector_length_of_vector_machine>
during compilation. Replace <vector_length_of_vector_machine> with the actual vector length of RISC-V vector machine.