I recently submitted a PR to the PyTorch repo for a vectorized single-precision tanh implementation. It is a vectorized version of the Cephes math library's single-precision tanhf function. In the PyTorch setting, the implementation appeared to be faster than Sleef_tanhf8_u10 (I have posted some benchmark numbers in the PR here).
Are there any Sleef benchmarks that I can run to compare the implementations? If it turns out to be faster, would you be open to a PR? Thanks!
Hello @vedanuj,
Thank you for considering a contribution. I'm open to a PR if your implementation is good enough, but I cannot yet confirm that it is good enough to adopt. Please check the following points.
Is it an alternative to Sleef_tanhf8_u10? If so, please make sure that its error is less than 1 ULP. It seems that you checked the correctness of your subroutine with a utility included in PyTorch, and that the check took less than 1 second? That is not enough to establish that the maximum error is below the specified bound. Please use tester2, which is included in the libm-tester directory. Of course, you can also use your own utility to check the maximum error.
Do you also have a double-precision implementation?
You also need to write the code using helper functions, like the other functions in SLEEF.
I am now trying to implement 3.5-ULP versions of hyperbolic functions.
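To make the 1-ULP requirement above concrete, here is a minimal sketch of how a maximum-ULP-error scan over single-precision inputs could look. It is not tester2 and does not reproduce its method. The helper names (`to_f32`, `ulp32`, `max_ulp_error`) are hypothetical, and it assumes that double-precision `math.tanh` is accurate enough to serve as the reference, whereas a serious check would compare against a higher-precision reference. The candidate shown is simply `math.tanh` rounded to single precision, standing in for the implementation under test.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def ulp32(y: float) -> float:
    """Spacing of binary32 values at the magnitude of y."""
    if y == 0.0:
        return 2.0 ** -149  # smallest positive subnormal
    _, e = math.frexp(abs(y))  # abs(y) = m * 2**e with 0.5 <= m < 1
    return 2.0 ** (max(e - 1, -126) - 23)


def max_ulp_error(candidate, reference, start_bits, stop_bits, step=1):
    """Scan binary32 inputs by bit pattern; return (worst ULP error, input)."""
    worst, worst_x = 0.0, 0.0
    for bits in range(start_bits, stop_bits, step):
        x = bits_to_f32(bits)
        exact = reference(x)  # assumed-accurate double-precision reference
        err = abs(candidate(x) - exact) / ulp32(exact)
        if err > worst:
            worst, worst_x = err, x
    return worst, worst_x


def rounded_tanh(x: float) -> float:
    """Stand-in candidate: reference tanh rounded to binary32."""
    return to_f32(math.tanh(x))


if __name__ == "__main__":
    # 0x3F000000 is the bit pattern of 0.5f; scan the next 100000 floats.
    worst, at = max_ulp_error(rounded_tanh, math.tanh,
                              0x3F000000, 0x3F000000 + 100_000)
    print(f"max error {worst:.4f} ULP at x = {at!r}")
```

Scanning by bit pattern visits every representable input in a range, which is why a check that finishes in under a second is unlikely to have covered enough of the roughly four billion single-precision values to bound the maximum error.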
We (OpenJDK) are facing similar issues in openjdk/jdk#18605 when using the vector functions in SLEEF. We use the inline headers generated by SLEEF with the -DSLEEF_BUILD_INLINE_HEADERS=ON flag.
For the performance regression data, please check these tests: Float128Vector.TANH, Float64Vector.TANH, and Double128Vector.TANH. The other tests look fine.
BTW, we previously made fixes in #537 and #536, but they did not resolve the TANH issue.