add fastmath for min and max #104
Codecov Report
Base: 88.57% // Head: 91.90% // Increases project coverage by +3.33%
Additional details and impacted files
@@ Coverage Diff @@
## master #104 +/- ##
==========================================
+ Coverage 88.57% 91.90% +3.33%
==========================================
Files 5 5
Lines 525 531 +6
==========================================
+ Hits 465 488 +23
+ Misses 60 43 -17
This is a somewhat "unreliable" implementation: It implements |
Yes, I was lazy :P. The intrinsic is minnum: SIMD.jl/src/LLVM_intrinsics.jl Line 241 in 705384c
We currently call that as a |
Don't let the better be the enemy of the good: If it works and if it improves the code, then thank you for your work. You could add a respective comment that what you suggest above is the way to go for the future, and that there is no particular reason to implement fastmath |
I think this approach should be better:
julia> mymin(a, b) = @fastmath min(a, b)
mymin (generic function with 1 method)
julia> v = SIMD.Vec(1.0,2.0,3.0,4.0)
<4 x Float64>[1.0, 2.0, 3.0, 4.0]
julia> @code_llvm mymin(v,v)
; @ REPL[2]:1 within `mymin`
%3 = getelementptr inbounds [1 x <4 x double>], [1 x <4 x double>]* %1, i64 0, i64 0
%4 = getelementptr inbounds [1 x <4 x double>], [1 x <4 x double>]* %2, i64 0, i64 0
%5 = load <4 x double>, <4 x double>* %3, align 16
%6 = load <4 x double>, <4 x double>* %4, align 16
%res.i = call fast <4 x double> @llvm.minnum.v4f64(<4 x double> %5, <4 x double> %6)
%7 = getelementptr inbounds [1 x <4 x double>], [1 x <4 x double>]* %0, i64 0, i64 0
store <4 x double> %res.i, <4 x double>* %7, align 32
ret void
} |
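For readers following along, here is a minimal sketch of the dispatch mechanism the session above relies on: `@fastmath min(a, b)` is rewritten to `Base.FastMath.min_fast(a, b)`, so a vector type can opt in by adding a method for `min_fast`. The `ToyVec` type below is a hypothetical stand-in and not SIMD.jl's `Vec`; it works on a plain tuple instead of calling the `llvm.minnum` intrinsic with the `fast` flag, so it only illustrates the dispatch, not the generated code.

```julia
# Hypothetical stand-in for a SIMD vector type (NOT SIMD.jl's Vec).
struct ToyVec{N}
    data::NTuple{N,Float64}
end

# Ordinary elementwise minimum, used when @fastmath is not in effect.
Base.min(a::ToyVec{N}, b::ToyVec{N}) where {N} =
    ToyVec(map(min, a.data, b.data))

# Fast-math variant. @fastmath rewrites `min` to Base.FastMath.min_fast,
# so adding a method here is what makes `@fastmath min(a, b)` pick it up.
Base.FastMath.min_fast(a::ToyVec{N}, b::ToyVec{N}) where {N} =
    ToyVec(map(Base.FastMath.min_fast, a.data, b.data))

# Same helper as in the REPL session above.
mymin(a, b) = @fastmath min(a, b)

a = ToyVec((1.0, 5.0))
b = ToyVec((2.0, 3.0))
result = mymin(a, b)
```

Without the `min_fast` method, `@fastmath min(a, b)` should fall back to the ordinary `min` method, which is presumably why the fast-math flag was missing from the generated code before this change.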
I can confirm that this PR works 👍 Thanks for the quick turnaround! The assembly is now exactly what I'd hope it to be. |
Should fix #103
@Seelengrab, can you test this?