Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up Ryu.pow5 #46764

Merged
merged 1 commit into from
Oct 6, 2022
Merged

speed up Ryu.pow5 #46764

merged 1 commit into from
Oct 6, 2022

Conversation

oscardssmith
Copy link
Member

The current algorithm does up to p divisions, while this algorithm does only 1. I've assigned @quinnj to review since I'm not 100% sure that there isn't a very good reason that this function isn't written like this.

@oscardssmith oscardssmith requested a review from quinnj September 14, 2022 16:28
@Seelengrab
Copy link
Contributor

I'm not 100% sure that there isn't a very good reason that this function isn't written like this.

My guess would be overflow protection for sufficiently large p? Does wraparound prevent any shenanigans that could happen due to that?

Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this was a straight port of the code in this repo, so there are probably ways to simplify/clean up (I think @simonbyrne did some of that initially when reviewing).

@oscardssmith oscardssmith added the performance Must go faster label Sep 15, 2022
@oscardssmith
Copy link
Member Author

Do you know if this needs an overflow check?

@quinnj
Copy link
Member

quinnj commented Sep 15, 2022

I don't off the top of my head

@KristofferC
Copy link
Member

Maybe run a PkgEval and merge if it looks ok?

@oscardssmith
Copy link
Member Author

It feels kind of silly to use pkgeval for something this small, but I guess we might as well.

@KristofferC
Copy link
Member

Or just merge, tests pass after all and I guess we will find it in release PkgEval if it is too bad

@quinnj quinnj merged commit ce04b75 into master Oct 6, 2022
@quinnj quinnj deleted the oscardssmith-faster-ryu-pow5 branch October 6, 2022 06:22
@quinnj
Copy link
Member

quinnj commented Oct 6, 2022

#yolo

@StefanKarpinski
Copy link
Member

Felt I should look at this since printing numbers incorrectly would be bad. This use of pow5 seems safe:

if q <= qinvbound(T)
if ((v % UInt32) - 5 * div(v, 5)) == 0
b_allzero = pow5(v, q)
elseif mf_iseven
a_allzero = pow5(u, q)
else
c -= pow5(w, q)
end
end

My reasoning is that 5^p is correct for p ≤ 27 and here we have that q ≤ qinvbound(T) defined here:

julia/base/ryu/utils.jl

Lines 17 to 19 in f71b839

qinvbound(::Type{Float16}) = 4
qinvbound(::Type{Float32}) = 9
qinvbound(::Type{Float64}) = 21

You can see that q can be at most 21, so it's safe. The other use of pow5 is less clear:

julia/base/ryu/exp.jl

Lines 152 to 159 in f71b839

rexp = precision - e
requiredTwos = -e2 - rexp
trailingZeros = requiredTwos <= 0 ||
(requiredTwos < 60 && pow2(m2, requiredTwos))
if rexp < 0
requiredFives = -rexp
trailingZeros = trailingZeros & pow5(m2, requiredFives)
end

Does anyone know what the range of possible values for precision and e are here? If e can be more than 27 greater than precision then we might have a problem with this change.

@StefanKarpinski
Copy link
Member

Bump: @oscardssmith, @quinnj do you know about the range of values that e can take here?

@oscardssmith
Copy link
Member Author

ah it appears that e can possibly be large if someone writes a decimal with a ton of digits.

@KristofferC
Copy link
Member

Do you have an example?

@LilithHafner
Copy link
Member

julia> using Printf

julia> @printf "%.8g" 4.645833859177319e63 # master
4.6458338e+63

julia> @printf "%.8g" 4.645833859177319e63 # 1.8
4.6458339e+63

There are two ways that pow5 can be wrong,

  1. returning false when m2 is divisible by big(5)^requiredFives
  2. returning true when m2 is not divisible by big(5)^requiredFives
    In the first case, for pow5 to be wrong, big(5)^requiredFives must be greater than typemax(Int), but m2 is always less than 1<<53, and a small number is not divisible by a large number. Thus m2 will never be divisible by the true big(5)^requiredFives when there is overflow and case 1 is okay.

For the second case, we need to find a requiredFives such that pow5 may wrongly return true. That is, find a requiredFives such that there exists m2 where m2 % 5^requiredFives == 0 For this to be the case, either m2 must be 0 (handled much earlier as a special case) or abs(5^requiredFives) <= m2 < 1<<53. Searching exhaustively from the cases that have overflow, findfirst(requiredFives -> abs(5^requiredFives) < 1<<53, 28:10^5)+27 == 55. Note that 5^55 < 0. If, on the other hand, we used unsinged(5)^requiredFives, then we'd have findfirst(requiredFives -> unsigned(5)^requiredFives < 1<<53, 28:10^5)+27 == 1048 and IIUC requiredFives is at most log10(floatmax(Float64)) == 308.25471555991675.

So using unsigned(5) should fix this and I benchmark it as comparable to signed exponentiation.

LilithHafner added a commit that referenced this pull request Nov 26, 2022
KristofferC pushed a commit that referenced this pull request Nov 28, 2022
Fixup for #46764

(cherry picked from commit 02aa0b0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants