Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement #185

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ParadaCarleton
Copy link

Only sort the tail values of the array, rather than the whole thing.

Only sort the tail values of the array, rather than the whole thing.
@avehtari
Copy link
Collaborator

Is this keeping the largest value smaller than tail values in a correct position?

cutoff <- ord$x[min(tail_ids) - 1] # largest value smaller than tail values

@ParadaCarleton
Copy link
Author

Hopefully someday my brain will stop making off-by-one errors, but today is not that day.

@avehtari
Copy link
Collaborator

Maybe I need to rephrase. After your change, how do you determine the largest value that is smaller than the tail values? If you don't sort them, I think you should at least find the largest value among the non-tail values, e.g. with max() function.

@ParadaCarleton
Copy link
Author

Maybe I need to rephrase. After your change, how do you determine the largest value that is smaller than the tail values? If you don't sort them, I think you should at least find the largest value among the non-tail values, e.g. with max() function.

Doesn't the code determine that by sorting the n+1 largest values into position, where n is the tail length? I might be misunderstanding what R's partial sort does.

@avehtari
Copy link
Collaborator

Ah, I was too tired when checking this and got confused. I'm also confused as off-by-one commit didn't make any change to the code, only change is in json. So how did you fix that off-by-one error?

@ParadaCarleton
Copy link
Author

Ah, I was too tired when checking this and got confused. I'm also confused as off-by-one commit didn't make any change to the code, only change is in json. So how did you fix that off-by-one error?

Whoops, pushed the wrong change -- this should fix the off-by-one error. (I was accidentally only sorting the tail, rather than sorting from the last element not in the tail.)

@jgabry
Copy link
Member

jgabry commented Sep 20, 2021

Thanks @ParadaCarleton and sorry for not getting to this sooner! I just triggered all the continuous integration tests to run again so let's see if they pass.

And yeah, @avehtari I think the latest commit from Carlos fixes the off by one issue, although if you want to double check that would be great. The tests should also catch it since I think we have regression tests that detect any changes to the loo results (and presumably an off by one error would lead to some change in the results).

@jgabry
Copy link
Member

jgabry commented Sep 20, 2021

Actually some tests seem to be failing, so we'll need to look into that: https://github.com/stan-dev/loo/pull/185/checks?check_run_id=3656664234#step:11:150

@sethaxen
Copy link

Are there benchmarks showing that this is indeed a performance improvement in R? I tried something similar for PSIS.jl and found that in Julia at least it was always slower. e.g.

julia> using BenchmarkTools

julia> tail_length(reff, S) = min(cld(S, 5), ceil(Int, 3 * sqrt(S / reff)));

julia> function foo(logw)
           S = length(logw)
           M = tail_length(1, S)
           return sort(logw)[M:S]
       end;

julia> function bar(logw)
           S = length(logw)
           M = tail_length(1, S)
           return partialsort(logw, M:S)
       end;

julia> logw = randn(1000);

julia> foo(logw) == bar(logw)
true

julia> @btime $foo($logw);
  18.822 μs (2 allocations: 15.19 KiB)

julia> @btime $bar($logw);
  21.504 μs (1 allocation: 7.94 KiB)

@ParadaCarleton
Copy link
Author

ParadaCarleton commented Dec 11, 2021

Are there benchmarks showing that this is indeed a performance improvement in R? I tried something similar for PSIS.jl and found that in Julia at least it was always slower. e.g.

julia> using BenchmarkTools

julia> tail_length(reff, S) = min(cld(S, 5), ceil(Int, 3 * sqrt(S / reff)));

julia> function foo(logw)
           S = length(logw)
           M = tail_length(1, S)
           return sort(logw)[M:S]
       end;

julia> function bar(logw)
           S = length(logw)
           M = tail_length(1, S)
           return partialsort(logw, M:S)
       end;

julia> logw = randn(1000);

julia> foo(logw) == bar(logw)
true

julia> @btime $foo($logw);
  18.822 μs (2 allocations: 15.19 KiB)

julia> @btime $bar($logw);
  21.504 μs (1 allocation: 7.94 KiB)

I benchmarked this and got a major speedup in Julia when I switched to only sorting partially; I assume the difference is to do with Julia's sorting algorithms, which have some large problems with cache misses. Feel free to discuss this with me on Slack.

@ParadaCarleton
Copy link
Author

Are there benchmarks showing that this is indeed a performance improvement in R? I tried something similar for PSIS.jl and found that in Julia at least it was always slower. e.g.

julia> using BenchmarkTools

julia> tail_length(reff, S) = min(cld(S, 5), ceil(Int, 3 * sqrt(S / reff)));

julia> function foo(logw)
           S = length(logw)
           M = tail_length(1, S)
           return sort(logw)[M:S]
       end;

julia> function bar(logw)
           S = length(logw)
           M = tail_length(1, S)
           return partialsort(logw, M:S)
       end;

julia> logw = randn(1000);

julia> foo(logw) == bar(logw)
true

julia> @btime $foo($logw);
  18.822 μs (2 allocations: 15.19 KiB)

julia> @btime $bar($logw);
  21.504 μs (1 allocation: 7.94 KiB)

You're sorting from the tail length to the end of the array (accidentally sorting everything but the tail). The result is you're asking for almost the whole array to be sorted (~900 instead of ~100 elements), so the constant overhead of partialsort exceeds partialsort's better asymptotic performance. The cost is O(n+k log(k)) to sort k elements from a list of n -- because k=O(sqrt(n)), the cost is linear in the number of samples instead of O(n*log(n)). (The performance is still better for small samples, though, because k is at most 20% of the sample.)

julia> @benchmark partialsort($x, (1000-len):1000)
BenchmarkTools.Trial: 
  memory estimate:  8.03 KiB
  allocs estimate:  4
  --------------
  minimum time:     2.609 μs (0.00% GC)
  median time:      2.741 μs (0.00% GC)
  mean time:        3.107 μs (6.50% GC)
  maximum time:     177.498 μs (89.91% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark sort($x)
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     12.518 μs (0.00% GC)
  median time:      13.624 μs (0.00% GC)
  mean time:        14.270 μs (0.00% GC)
  maximum time:     213.956 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

@sethaxen
Copy link

You're sorting from the tail length to the end of the array (accidentally sorting everything but the tail).

Good catch! Yes, I'm seeing the same performance improvement you're seeing now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants