[NEON] Wrong result of NEON intrinsic vld2q_dup_p16 with the -march=armv7-a flag in CLANG-15. #71763

Closed
yyctw opened this issue Nov 9, 2023 · 6 comments · Fixed by #79287

@yyctw

yyctw commented Nov 9, 2023

According to the ARM documentation https://developer.arm.com/architectures/instruction-sets/intrinsics/vld2q_dup_p16, vld2q_dup_p16 loads a 2-element structure from memory and replicates the structure to all the lanes of the two SIMD&FP registers.
The expected result should be as follows:

poly16_t a[2] = {1, 3};
poly16x8x2_t r = vld2q_dup_p16(a);
// The value of r {1, 1, 1, 1, 1, 1, 1, 1,
//                 3, 3, 3, 3, 3, 3, 3, 3}
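
For reference, the documented behaviour can be modelled in plain C. This is only a sketch: `ref_poly16_t`, `ref_poly16x8x2_t` and `ref_vld2q_dup_p16` are hypothetical stand-ins, not part of `<arm_neon.h>`, chosen so the model builds on any host.

```c
#include <stdint.h>

/* Hypothetical stand-ins for poly16_t / poly16x8x2_t so that this
   sketch compiles without <arm_neon.h>. */
typedef uint16_t ref_poly16_t;
typedef struct { ref_poly16_t val[2][8]; } ref_poly16x8x2_t;

/* Scalar model of the documented behaviour: element v of the 2-element
   structure in memory is broadcast to every lane of result vector v. */
static ref_poly16x8x2_t ref_vld2q_dup_p16(const ref_poly16_t *ptr) {
    ref_poly16x8x2_t r;
    for (int v = 0; v < 2; ++v)
        for (int lane = 0; lane < 8; ++lane)
            r.val[v][lane] = ptr[v];
    return r;
}
```

With `a[2] = {1, 3}` the model fills `val[0]` with 1 and `val[1]` with 3, matching the expected result above.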

However, the result of vld2q_dup_p16 with the flags -march=armv7-a and -mfpu=neon in CLANG-15 is as follows:

poly16_t a[2] = {1, 3};
poly16x8x2_t r = vld2q_dup_p16(a);
// The value of r {3, 3, 3, 3, 1, 1, 1, 1,
//                 0, 0, 0, 0, 3, 3, 3, 3}

This issue also occurs in vld2q_dup_p8.
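
To see which lanes are affected, the reported values can be compared against the expected broadcast. The helper below is a hypothetical sketch (the name `bad_lane_mask` and the flat 16-element layout are assumptions made for illustration); it does not use NEON at all.

```c
#include <stdint.h>

/* Returns a 16-bit mask with bit (v * 8 + lane) set when lane `lane` of
   result vector `v` differs from the expected broadcast value src[v].
   `got` holds vector 0 in elements 0..7 and vector 1 in elements 8..15. */
static uint16_t bad_lane_mask(const uint16_t src[2], const uint16_t got[16]) {
    uint16_t mask = 0;
    for (int v = 0; v < 2; ++v)
        for (int lane = 0; lane < 8; ++lane)
            if (got[v * 8 + lane] != src[v])
                mask |= (uint16_t)(1u << (v * 8 + lane));
    return mask;
}
```

For the values reported above this yields 0x0F0F: lanes 0-3 of both result vectors are wrong, while lanes 4-7 hold the expected values.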

Reproducer: https://godbolt.org/z/6ETT3rjKo
CLANG version:

Debian clang version 15.0.7
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Thank you for reading.


@EugeneZelenko
Contributor

@yyctw: Please always test against the latest release (the only supported one) and main.

@yyctw
Author

yyctw commented Nov 9, 2023

> @yyctw: Please always try to test latest release (only supported) and main.

Thank you for the reminder. This is the result after the change: https://godbolt.org/z/bP4d18YMh

@davemgreen
Collaborator

I think this is an example of it going wrong: https://godbolt.org/z/hevqGKfqj

The VLD2DUPq16EvenPseudo/VLD2DUPq16OddPseudo pseudos don't look well defined: there is no register dependency between them.
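
A toy model may help show why a missing dependency matters. It is not LLVM code and rests on an assumption suggested by the pseudo names: that the Q-register load is split into two halves, one filling the even D registers of the result pair and one filling the odd ones. All names (`toy_regfile`, `toy_vld2dup_half`, ...) and register numbers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { NUM_D = 32, LANES = 4 };

/* Toy register file: 32 D registers of four 16-bit lanes each.
   Q register q is assumed to alias D[2q] (low half) and D[2q + 1] (high half). */
typedef struct { uint16_t d[NUM_D][LANES]; } toy_regfile;

/* One half of the split load: broadcast src[0] into D[base] and
   src[1] into D[base + 2] (double-spaced registers). */
static void toy_vld2dup_half(toy_regfile *rf, int base, const uint16_t src[2]) {
    for (int lane = 0; lane < LANES; ++lane) {
        rf->d[base][lane]     = src[0];
        rf->d[base + 2][lane] = src[1];
    }
}

/* Copy the eight lanes of Q register q into out. */
static void toy_read_q(const toy_regfile *rf, int q, uint16_t out[8]) {
    memcpy(out,         rf->d[2 * q],     sizeof rf->d[0]);
    memcpy(out + LANES, rf->d[2 * q + 1], sizeof rf->d[0]);
}

/* Returns 1 when the Q pair (q, q + 1) holds the expected broadcast of src. */
static int toy_pair_ok(const toy_regfile *rf, int q, const uint16_t src[2]) {
    for (int v = 0; v < 2; ++v) {
        uint16_t out[8];
        toy_read_q(rf, q + v, out);
        for (int lane = 0; lane < 8; ++lane)
            if (out[lane] != src[v])
                return 0;
    }
    return 1;
}
```

In this model, calling `toy_vld2dup_half` with bases 16 and 17 fills the pair (q8, q9) correctly. If nothing ties the two halves together and the even half is given an unrelated base such as 20, the low halves of q8 and q9 keep whatever they held before, so `toy_pair_ok` fails.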

AlfieRichardsArm added a commit to AlfieRichardsArm/llvm-project that referenced this issue Jan 24, 2024
This ensures the odd/even pseudo instructions are allocated to the same register range.

This fixes llvm#71763.
(llvm#71763)
@AlfieRichardsArm
Contributor

Fix proposed: #79287

AlfieRichardsArm added a commit that referenced this issue Jan 31, 2024
This ensures the odd/even pseudo instructions are allocated to the same register range.

This fixes #71763
@AlfieRichardsArm
Contributor

@yyctw This should now be fixed in llvm main, and should be included in the next release. Thank you for the bug report.

EugeneZelenko removed the clang:headers label (Headers provided by Clang, e.g. for intrinsics) Jan 31, 2024