[NEON] Wrong result of NEON intrinsic `vld2q_dup_p16` with the `-march=armv7-a` flag in CLANG-15. #71763
Comments
@llvm/issue-subscribers-backend-arm Author: Yi-Yen Chung (yyctw)
According to the ARM documentation https://developer.arm.com/architectures/instruction-sets/intrinsics/vld2q_dup_p16, `vld2q_dup_p16` loads a 2-element structure from memory and replicates the structure to all the lanes of the two SIMD&FP registers.
The expected result should be as follows:
```
poly16_t a[2] = {1, 3};
poly16x8x2_t r = vld2q_dup_p16(a);
// The value of r {1, 1, 1, 1, 1, 1, 1, 1,
// 3, 3, 3, 3, 3, 3, 3, 3}
```
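For comparison, the documented behaviour can be written as a plain scalar model. This is only a sketch in portable C, not the intrinsic itself: `ref_u16x8x2` and `ref_vld2q_dup_16` are hypothetical names, and `uint16_t` stands in for `poly16_t` so the code builds without `arm_neon.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for poly16x8x2_t:
 * two vectors of eight 16-bit lanes each. */
typedef struct {
    uint16_t val[2][8];
} ref_u16x8x2;

/* Scalar model of the documented behaviour: read one 2-element
 * structure {src[0], src[1]} and replicate src[0] into every lane of
 * the first vector and src[1] into every lane of the second. */
static ref_u16x8x2 ref_vld2q_dup_16(const uint16_t *src)
{
    ref_u16x8x2 r;
    assert(src != NULL);
    for (size_t lane = 0; lane < 8; ++lane) {
        r.val[0][lane] = src[0];
        r.val[1][lane] = src[1];
    }
    return r;
}
```

On an ARM target, the lanes stored from the real `poly16x8x2_t` result could be compared against this model lane by lane to detect the mismatch.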
However, the result of `vld2q_dup_p16` with the flags `-march=armv7-a` and `-mfpu=neon` in CLANG-15 is as follows:
```
poly16_t a[2] = {1, 3};
poly16x8x2_t r = vld2q_dup_p16(a);
// The value of r {3, 3, 3, 3, 1, 1, 1, 1,
// 0, 0, 0, 0, 3, 3, 3, 3}
```
This issue also occurs in `vld2q_dup_p8`.
To reproduce the problem: https://godbolt.org/z/6ETT3rjKo
Thank you for reading.
@yyctw: Please always test against the latest release (the only supported one) and main.
Thank you for the reminder. This is the result after the change: https://godbolt.org/z/bP4d18YMh
I think this is an example of it going wrong: https://godbolt.org/z/hevqGKfqj. The VLD2DUPq16EvenPseudo/VLD2DUPq16OddPseudo pseudos don't look well defined: there is no register dependency between them.
This ensures the odd/even pseudo instructions are allocated to the same register range. This fixes llvm#71763. (llvm#71763)
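To illustrate why the two pseudos need to share a register range, here is a toy model in portable C. It is not LLVM's code. It assumes (my assumption, not stated in this thread) that each pseudo fills two D registers two apart, so that one pseudo supplies the low half of each Q register and the other supplies the high half. All names (`toy_regfile`, `toy_vld2_dup_half`, `toy_read_pair`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_DREGS = 8, LANES_PER_DREG = 4 };

/* Toy register file: each D register holds four 16-bit lanes. */
typedef struct {
    uint16_t d[NUM_DREGS][LANES_PER_DREG];
} toy_regfile;

/* Broadcast one value to all lanes of a single D register. */
static void toy_dup(toy_regfile *rf, int dreg, uint16_t value)
{
    for (int lane = 0; lane < LANES_PER_DREG; ++lane)
        rf->d[dreg][lane] = value;
}

/* One half of the split load (assumed shape): writes D[base] with
 * src[0] and D[base + 2] with src[1]. */
static void toy_vld2_dup_half(toy_regfile *rf, int base, const uint16_t *src)
{
    assert(base >= 0 && base + 2 < NUM_DREGS);
    toy_dup(rf, base, src[0]);
    toy_dup(rf, base + 2, src[1]);
}

/* Read D[base]..D[base + 3] as 16 lanes, i.e. two consecutive
 * Q registers, each made of a low and a high D register. */
static void toy_read_pair(const toy_regfile *rf, int base, uint16_t out[16])
{
    assert(base >= 0 && base + 3 < NUM_DREGS);
    for (int reg = 0; reg < 4; ++reg)
        for (int lane = 0; lane < LANES_PER_DREG; ++lane)
            out[reg * LANES_PER_DREG + lane] = rf->d[base + reg][lane];
}
```

In this model, calling `toy_vld2_dup_half` with bases 0 and 1 and then reading the pair at base 0 gives eight copies of `src[0]` followed by eight copies of `src[1]`. If one half is instead given an unrelated base, the pair at base 0 keeps stale data in half of each vector, which is the kind of mix the report shows.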
Fix proposed: #79287
@yyctw This should now be fixed in llvm main, and should be included in the next release. Thank you for the bug report.