-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use sqrt intrinsic for floating point sqrt #5686
Comments
I'm running tests on a patch that uses intrinsics for |
@dbaupp: Nice! It would be awesome to have all of the useful LLVM intrinsics for integers/floats exposed. |
Yay! 5x faster: use core::float::sqrt;
fn main() {
let mut sum: float = 0.0;
for core::uint::range(0, 100000000) |i| {
sum += sqrt(i as float);
}
io::println(fmt!("%f", sum));
}
|
@dbaupp mentioned on reddit that using some of these clobbers the stack canary. I hadn't considered that before and we need to think about how these are implemented. How many of the intrinsics that we are already using are just trotting all over the stack red zone? Perhaps LLVM's intrinsics don't emit the proper stack growth prologues. |
(The very brief reddit thread, for future reference.) Do I continue working on this? (i.e. open a pull request, since I've got the code basically organised.) The (most) dangerous intrinsics seem to be |
@dbaupp I'd like to know what assembly they generate. Do you have a test case that demonstrates all of them? |
Achieves up to 5x speed up! However, the intrinsics seem to do bad things to the stack, especially sin, cos and exp (rust-lang#5686 has discussion). Also, add f{32,64,loat}::powi, and reorganise the delegation code so that functions have the #[inline(always)] annotation, and reduce the repetition of delegate!(..).
@huonw The necessary LLVM changes have been made. However, I didn't realize that intrinsics can clobber the stack. I guess we need to write wrappers with the |
@pcwalton Does #[abi = "rust-intrinsic"]
pub extern "rust-intrinsic" {
#[fixed_stack_segment]
pub fn sinf32(x: f32) -> f32;
} |
@huonw Extern functions automatically require fixed stack segments, if the |
…lton This implements the fixed_stack_segment for items with the rust-intrinsic abi, and then uses it to make f32 and f64 use intrinsics where appropriate, but without overflowing stacks and killing canaries (cf. #5686 and #5697). Hopefully. @pcwalton, the fixed_stack_segment implementation involved mirroring its implementation in `base.rs` in `trans_closure`, but without adding the `set_no_inline` (reasoning: that would defeat the purpose of intrinsics), which is possibly incorrect. I'm a little hazy about how the underlying structure works, so I've annotated the 4 that have caused problems so far, but there's no guarantee that the other intrinsics are entirely well-behaved. Anyway, it has good results (the following are just summing the result of each function for 1 up to 100 million): ``` $ ./intrinsics-perf.sh f32 func new old speedup sin 0.80 2.75 3.44 cos 0.80 2.76 3.45 sqrt 0.56 2.73 4.88 ln 1.01 2.94 2.91 log10 0.97 2.90 2.99 log2 1.01 2.95 2.92 exp 0.90 2.85 3.17 exp2 0.92 2.87 3.12 pow 6.95 8.57 1.23 geometric mean: 2.97 $ ./intrinsics-perf.sh f64 func new old speedup sin 12.08 14.06 1.16 cos 12.04 13.67 1.14 sqrt 0.49 2.73 5.57 ln 4.11 5.59 1.36 log10 5.09 6.54 1.28 log2 2.78 5.10 1.83 exp 2.00 3.97 1.99 exp2 1.71 3.71 2.17 pow 5.90 7.51 1.27 geometric mean: 1.72 ``` So about 3x faster on average for f32, and 1.7x for f64. This isn't exactly apples to apples though, since this patch also adds #[inline(always)] to all the function definitions too, which possibly gives a speedup. (fwiw, GitHub is showing 93c0888 after d9c54f8 (since I cherry-picked the latter from #5697), but git's order is the other way.)
Thanks @huonw! |
Reorder sections of release documentation r? @phansch [Rendered](https://github.com/flip1995/rust-clippy/blob/release_doc/doc/release.md) changelog: none
Replace all remaining occurrences of submodule with subtree r? @phansch I should have included this in rust-lang#5686, but forgot about it, so here we go again. changelog: none
sqrt is still using libc
The text was updated successfully, but these errors were encountered: