Can rustc support calling LLVM intrinsics whose signatures contain types like llvm_v4i1_ty? #81552

Open
kangshan1157 opened this issue Jan 30, 2021 · 10 comments
Labels
A-intrinsics Area: Intrinsics A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-feature-request Category: A feature request, i.e: not implemented / a PR. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kangshan1157

kangshan1157 commented Jan 30, 2021

While implementing intrinsics in stdarch, I found that some LLVM intrinsics use types such as llvm_v4i1_ty (4 bits) and llvm_v2i1_ty (2 bits), for which I cannot find corresponding types in Rust. As far as I know, Rust's types are at least 8 bits wide.
I'm a Rust beginner and don't know whether rustc can support calling an LLVM intrinsic whose signature contains types like llvm_v4i1_ty or llvm_v2i1_ty.

@nagisa
Member

nagisa commented Jan 30, 2021

You would have to add a rust intrinsic that wraps and converts the arguments appropriately to make it work. There is no way at the rust source level to obtain these LLVM types.

@kangshan1157
Author

kangshan1157 commented Jan 30, 2021

So I have to change the rust compiler code to achieve the purpose, right?
I'm not familiar with the Rust compiler code. Could you please point me to where I can start?

@nagisa
Member

nagisa commented Jan 30, 2021

> So I have to change the rust compiler code to achieve the purpose, right?

Sure, but do discuss this with… stdarch (I think that's what you meant?)… folks first. Maybe they have alternative ideas in mind.

@kangshan1157
Author

Yes, I already opened an issue in the stdarch repository (rust-lang/stdarch#989), and they told me this should be supported by the compiler.

@camelid camelid added A-intrinsics Area: Intrinsics A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Feb 1, 2021
@Amanieu
Member

Amanieu commented Feb 10, 2021

There are thousands of AVX512 intrinsics, some of which use i1 vectors for masks. It isn't practical to implement all of them directly in rustc.

I would like to have something like this:

// This is passed in function calls as <4 x i1> in LLVM IR.
#[repr(llvm_simd_bitmask(4))]
struct i1x4(u8);

This would be only for use internally within stdarch to bind to LLVM intrinsics that require it.
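
Until such a repr exists, the idea can only be modelled in ordinary Rust. Here is a minimal sketch, assuming a hypothetical `Mask4` newtype (not an stdarch type) that keeps four mask bits in the low nibble of a `u8`. It shows only the Rust-side representation, not the LLVM lowering the proposed attribute would perform:

```rust
/// Hypothetical stand-in for the proposed `i1x4`: four 1-bit lanes stored in
/// the low nibble of a `u8`. Plain Rust only; nothing here changes how the
/// value is passed to LLVM.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Mask4(u8);

impl Mask4 {
    /// Builds a mask, discarding the four unused high bits.
    fn new(bits: u8) -> Self {
        Mask4(bits & 0x0F)
    }

    /// Returns lane `i` (0..4); lane 0 is the least significant bit.
    fn lane(self, i: usize) -> bool {
        assert!(i < 4, "Mask4 has only 4 lanes");
        (self.0 >> i) & 1 == 1
    }
}

fn main() {
    let m = Mask4::new(0b1111_0101);
    println!("{:?}: {} {} {} {}", m, m.lane(0), m.lane(1), m.lane(2), m.lane(3));
}
```

Masking in the constructor keeps the unused high bits from ever leaking into a comparison or a call.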

@Amanieu
Member

Amanieu commented Feb 10, 2021

Alternatively maybe we could have some sort of attribute on the extern declaration to indicate that an integer argument should be transmuted into an i1 vector before being passed in.
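
To make the intended conversion concrete, here is a rough sketch of the bit-to-lane mapping such an attribute would imply. The helper names `mask_to_lanes` and `lanes_to_mask` are made up for illustration, and the lane order (lane 0 is bit 0) is an assumption that matches little-endian targets such as x86:

```rust
/// Expands the low 4 bits of `bits` into 4 boolean lanes (lane i = bit i).
/// Assumption: lane 0 is the least significant bit.
fn mask_to_lanes(bits: u8) -> [bool; 4] {
    let mut lanes = [false; 4];
    for (i, lane) in lanes.iter_mut().enumerate() {
        *lane = (bits >> i) & 1 == 1;
    }
    lanes
}

/// Packs 4 boolean lanes back into the low 4 bits of a `u8`.
fn lanes_to_mask(lanes: [bool; 4]) -> u8 {
    lanes
        .iter()
        .enumerate()
        .fold(0u8, |acc, (i, &on)| acc | ((on as u8) << i))
}

fn main() {
    let lanes = mask_to_lanes(0b0110);
    println!("{:?} -> {:#06b}", lanes, lanes_to_mask(lanes));
}
```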

@raphaelcohn

Under the covers, some of these intrinsics actually return the value of the ZF or CF bit in EFLAGS (or RFLAGS in 64-bit mode); _ktestz_mask8_u8() is one example. On godbolt, clang emits a SETcc instruction in addition to the core AVX512 KTESTB, so the intrinsic really returns (in our world) a u8, which can be modelled as a Rust bool with an unsafe transmute. This happens even at -O3 in clang. Since the logical next step is to branch on the result of the intrinsic, modelling it as a bool introduces a useless extra instruction for every branch.
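
As a scalar illustration of that behaviour, the sketch below models what `_ktestz_mask8_u8` computes: 1 when the bitwise AND of the two masks is all zeros, 0 otherwise. The names `ktestz_mask8_model` and `disjoint` are invented for this example; this is not the stdarch binding, and it compares against zero instead of transmuting the `u8` into a `bool`:

```rust
/// Scalar model of `_ktestz_mask8_u8`: returns 1 if `a & b` is all zeros
/// (the ZF outcome of the mask test), otherwise 0.
fn ktestz_mask8_model(a: u8, b: u8) -> u8 {
    u8::from((a & b) == 0)
}

/// Branch-friendly wrapper: a comparison against zero avoids any transmute.
fn disjoint(a: u8, b: u8) -> bool {
    ktestz_mask8_model(a, b) != 0
}

fn main() {
    if disjoint(0b1010_0000, 0b0000_0101) {
        println!("masks share no set bits");
    }
}
```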

At this point, though, no programming language other than assembler is really going to be apt. What's really needed is a way of saying 'this is a bit index into a flags register that other things can muck up'.

Making use of Intel's excruciatingly badly named AVX512 intrinsics is mind-bendingly hard unless you're a chess grandmaster who also gets everything he needs from Intel's incredibly terse documentation. Having to drop down and see what assembler is actually generated to understand what they do almost makes the point: they're not really fit for use. Add in the myriad of target features needed to use them, and Joel Spolsky's matrix of pain was never more apt.
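
For the target-feature side of this, one common pattern is to check for a feature at run time before choosing a code path. This is a minimal sketch only: `has_avx512f` and `pick_backend` are hypothetical helper names, and only the base `avx512f` feature is checked, whereas real code would usually need to check each AVX512 subset it relies on:

```rust
/// Runtime check for the base AVX512 feature on x86 targets.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn has_avx512f() -> bool {
    is_x86_feature_detected!("avx512f")
}

/// On every other architecture the feature cannot be present.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn has_avx512f() -> bool {
    false
}

/// Names the code path a caller would dispatch to (illustrative only).
fn pick_backend() -> &'static str {
    if has_avx512f() {
        "avx512f"
    } else {
        "scalar"
    }
}

fn main() {
    println!("selected backend: {}", pick_backend());
}
```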

@minybot

minybot commented Feb 20, 2021

> Alternatively maybe we could have some sort of attribute on the extern declaration to indicate that an integer argument should be transmuted into an i1 vector before being passed in.

I read these slides about how LLVM uses i1 vectors for masks:
https://llvm.org/devmtg/2017-03//assets/slides/avx512_mask_registers_code_generation_challenges_in_llvm.pdf

@minybot

minybot commented Feb 21, 2021

> Making use of Intel's excruciatingly badly named AVX512 intrinsics is mind-bendingly hard unless you're a chess grandmaster who also gets everything he needs from Intel's incredibly terse documentation. Having to drop down and see what assembler is actually generated to understand what they do almost makes the point: they're not really fit for use. Add in the myriad of target features needed to use them, and Joel Spolsky's matrix of pain was never more apt.

I used AVX512 intrinsics with the Intel compiler, and it was very difficult. Using AVX512 intrinsics from Rust makes programming easier and faster because of Rust's strict checking, and I could port my code to intrinsics very quickly. If your algorithm fits SIMD, it can speed up 2-10x. I also expect most inexpensive Intel CPUs to support AVX512 in the future.

@raphaelcohn

@minybot Oh, there's no doubt Rust is the better choice for modelling problems (particularly for domain modelling); my colleagues and I have been using it for nearly 5 years now, and we wouldn't even have attempted the large and complex problems we use it for if we'd had to use C or C++.

Rust's great strength for me is not that it's easier (or quicker) to code something, but that it's possible to express intent, convey meaning and hide detail until it's needed without paying any performance penalty. It's a language that lets us write code to be maintained, changed and improved for years to come.

@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023