Skip to content

Commit

Permalink
intrinsics.fmuladdf{16,32,64,128}: expose llvm.fmuladd.* semantics
Browse files Browse the repository at this point in the history
Add intrinsics `fmuladd{f16,f32,f64,f128}`. This computes `(a * b) +
c`, to be fused if the code generator determines that (i) the target
instruction set has support for a fused operation, and (ii) that the
fused operation is more efficient than the equivalent, separate pair
of `mul` and `add` instructions.

https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic

MIRI support is included for f32 and f64.

The codegen_cranelift uses the `fma` function from libc, which is a
correct implementation, but without the desired performance semantic. I
think this requires an update to cranelift to expose a suitable
instruction in its IR.

I have not tested with codegen_gcc, but it should behave the same
way (using `fma` from libc).
  • Loading branch information
jedbrown committed Oct 11, 2024
1 parent 2ca5753 commit 058e02c
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions src/intrinsic/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,9 @@ fn get_simple_intrinsic<'gcc, 'tcx>(
sym::log2f64 => "log2",
sym::fmaf32 => "fmaf",
sym::fmaf64 => "fma",
// FIXME: calling `fma` from libc without FMA target feature uses expensive sofware emulation
sym::fmuladdf32 => "fmaf", // TODO: use gcc intrinsic analogous to llvm.fmuladd.f32
sym::fmuladdf64 => "fma", // TODO: use gcc intrinsic analogous to llvm.fmuladd.f64
sym::fabsf32 => "fabsf",
sym::fabsf64 => "fabs",
sym::minnumf32 => "fminf",
Expand Down

0 comments on commit 058e02c

Please sign in to comment.