Revert optimisations to findall(f, ::AbstractArray{Bool})
#49831
base: master
Conversation
In commit 4c4c94f, `findall(f, A::AbstractArray{Bool})` was optimised using a technique where `A` is traversed twice: once to count the number of `true` elements, and once to fill in the resulting vector. However, this can cause problems for arbitrary functions `f`: for slow `f`, the approach is ~2x slower, and for impure `f`, calling `f` twice per element can trigger side effects and strange issues (see issue JuliaLang#46425). With this commit, the optimised version is only dispatched to when `f` is `!` or `identity`.
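To make the tradeoff concrete, here is a minimal sketch contrasting the two strategies. This is not Base's actual code: the names `findall_twopass` and `findall_push` are made up for illustration, and both are simplified to vectors so indices are plain `Int`s.

```julia
# Two-pass strategy: traverse A twice, calling f twice per element.
# Fast for cheap, pure predicates; wrong for predicates with side effects.
function findall_twopass(f, A::AbstractVector{Bool})
    n = count(f, A)                     # first traversal: count matches
    out = Vector{Int}(undef, n)         # preallocate the exact result size
    j = 1
    for i in eachindex(A)
        if f(A[i])                      # second traversal: f is called again
            out[j] = i
            j += 1
        end
    end
    return out
end

# Push-based strategy: one traversal, f called exactly once per element.
function findall_push(f, A::AbstractVector{Bool})
    out = Int[]
    for i in eachindex(A)
        f(A[i]) && push!(out, i)
    end
    return out
end
```

With an impure predicate the two strategies diverge: a predicate that bumps a counter on each call sees `2n` calls under the two-pass version but only `n` under the push-based one, which is the kind of surprise reported in JuliaLang#46425.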
Which approach is faster is, I think, very conditional, and it leaves further optimization on the table.

Also, the case of an impure predicate is probably a good candidate to add a unit test for.
base/array.jl (Outdated)

```diff
 end

-findall(f::Function, A::AbstractArray{Bool}) = _findall(f, A)
+findall(f::Union{typeof(!), typeof(identity)}, A::AbstractArray{Bool}) = _findall(f, A)
```
@aviatesk this seems like another case where ideally we should be dispatching on the `Core.Compiler.infer_effects` result for `f(A)` to pick the best choice of algorithm.
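As a hedged sketch of that idea: `Base.infer_effects` and `Core.Compiler.is_foldable` are compiler internals, not stable public API, and the helper names below are hypothetical, but they illustrate gating the preallocating optimization on inferred effects.

```julia
# Ask the compiler whether f is "pure enough" to be called more than once.
# is_foldable checks the inferred consistency/effect-freedom/termination bits.
pure_enough(f, T) = Core.Compiler.is_foldable(Base.infer_effects(f, (T,)))

# Pick the algorithm based on the inferred effects of f(::Bool).
function findall_by_effects(f, A::AbstractVector{Bool})
    if pure_enough(f, Bool)
        # Two-pass, preallocating version: safe because f has no side effects.
        n = count(f, A)
        out = Vector{Int}(undef, n)
        j = 1
        for i in eachindex(A)
            f(A[i]) && (out[j] = i; j += 1)
        end
        return out
    else
        # Conservative push-based version: f is called once per element.
        out = Int[]
        for i in eachindex(A)
            f(A[i]) && push!(out, i)
        end
        return out
    end
end
```

Because the check happens on inferred effects rather than on the predicate's type, `!` and `identity` are not special-cased; any predicate the compiler can prove effect-free gets the fast path.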
I marked this for triage, and the main takeaways are that everyone feels the existing optimization is pretty icky and would love to see some benchmarks on whether it's ever worth doing, especially since our vector-growing code has gotten a fair amount of work in the past couple of years and might be good enough now to make this no longer worthwhile. We could also add a …
I do think it's compelling that we should not evaluate predicates with side effects multiple times; we should restrict the optimization to predicates without side effects. Now that we can infer effects, it seems better to just avoid multiple evaluation in cases where `getindex` or the predicate have side effects. TBD what effect set is allowable. Another thing that came up on the triage call is that it could be even more efficient to take a random sample of values in the array, evaluate those, use that to estimate a `sizehint`, and then do the pushing implementation.
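A rough sketch of that sampling idea (the function name, keyword, and sample size are all made up for illustration). Note that the sampled elements see `f` and `getindex` more than once, so this too is only appropriate for effect-free predicates:

```julia
using Random

function findall_sampled(f, A::AbstractVector{Bool}; nsamples::Int = 64)
    out = Int[]
    n = length(A)
    if n > 0
        # Estimate the match density from a random sample, then reserve
        # roughly the right capacity up front to avoid repeated regrowth.
        idxs = rand(1:n, min(nsamples, n))
        density = count(i -> f(A[i]), idxs) / length(idxs)
        sizehint!(out, ceil(Int, density * n))
    end
    for i in eachindex(A)               # single full pass, pushing matches
        f(A[i]) && push!(out, i)
    end
    return out
end
```

The result is always exact; only the preallocated capacity depends on the sample, so a bad estimate costs performance, not correctness.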
So my overall proposal is to add the `sizehint` keyword argument, which …

If the user wants to provide a size hint when `getindex` or the predicate aren't pure enough, then they can pass it manually, and we could expose the estimator utility function.
In theory, a random-sample-based preallocation could be optimal, but I have yet to see that approach outperform the two-pass approach which preallocates. I propose adding the guarantee "this won't call `f`/`getindex` more than once per element", gating the preallocation optimization on inferred effects, and keeping all those optimizations internal for now. I have a hard time picturing a use case where it is appropriate for a user to dive deep enough into `findall` optimizations that it is warranted to provide a …
It is also probably worth keeping in mind that high-performance use cases will likely want to use …
TL;DR: We should undo all my "optimizations" and revert back to #37177.

Implementations

Based on triage's comments and a discussion with @LilithHafner, I've now done some benchmarking on various implementations. It's a little tricky because the speed depends on several factors, including …
I've tested 4 different implementations:
Note that if we take away the "ickiness" of the existing optimization, then the existing implementations are either 1. or 4., depending on dispatch.

Benchmarks
Benchmarks (Click me!)

Push timings by case and input length:

| Case | length = 0 | length = 32 | length = 1024 | length = 32224 | length = 1000000 |
|------|------------|-------------|----------------|------------------|--------------------|
| 1    | 0.171      | 0.416       | 1.253          | 1.541            | 1.571              |
| 2    | 1.124      | 1.699       | 2.436          | 1.853            | 1.81               |
| 3    | 0.426      | 0.567       | 1.393          | 7.043            | 6.217              |
| 4    | 0.385      | 1.478       | 1.292          | 2.051            | 3.17               |
| 5    | 0.406      | 0.514       | 1.038          | 2.422            | 2.252              |
| 6    | 0.427      | 0.474       | 1.302          | 1.854            | 3.328              |

Conclusion

From my benchmarks, the conclusion is that 3. is the best overall solution, i.e. we simply revert everything back to #37177. We will regress in the sense that #42187 will re-appear, but this doesn't matter IMO. It's fine that …

Q&A

What about an implementation just using …
`findall(f, ::AbstractArray{Bool})` force-pushed from cae8886 to e29947c (Compare)
Note that this re-introduces the performance regression from #42187 by reverting some of the optimisation in #42202
Closes issue #46425