Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make findall faster for AbstractArrays #37177

Merged
merged 1 commit into from
Dec 18, 2020
Merged

Make findall faster for AbstractArrays #37177

merged 1 commit into from
Dec 18, 2020

Conversation

nalimilan
Copy link
Member

The findall fallback is quite slow when predicate is a small function compared with generating a logical index using broadcast and calling findall on it to compute integer indices. The gain is most visible when predicate is true for a large proportion of entries, but it's there even when all of them are false.
The drawback of this approach is that it requires allocating a vector of length(a)/8 bytes whatever the number of returned indices.

Some benchmarks using > and ==, for which the impact of using broacast should be the most visible thanks to SIMD. Note the timings difference when changing the proportion of true entries.

using BenchmarkTools

findall2(testf::Function, A::AbstractArray) = findall([testf(x)::Bool for x in A])
findall3(testf::Function, A::AbstractArray) = findall(testf.(A))

x = rand(10_000_000);

# Current state
julia> @btime findall(>(0.5), x);
  148.785 ms (24 allocations: 65.00 MiB)

julia> @btime findall(>(0.8), x);
  55.894 ms (22 allocations: 17.00 MiB)

julia> @btime findall(>(0.95), x);
  21.648 ms (20 allocations: 5.00 MiB)

julia> @btime findall(==(0.5), x);
  18.071 ms (1 allocation: 80 bytes)

# Using a comprehension
julia> @btime findall2(>(0.5), x);
  87.761 ms (5 allocations: 47.68 MiB)

julia> @btime findall2(>(0.8), x);
  44.411 ms (5 allocations: 24.78 MiB)

julia> @btime findall2(>(0.95), x);
  28.501 ms (5 allocations: 13.34 MiB)

julia> @btime findall2(==(0.5), x);
  23.160 ms (4 allocations: 9.54 MiB)

# Using broadcast (this PR)
julia> @btime findall3(>(0.5), x);
  13.709 ms (8 allocations: 39.34 MiB)

julia> @btime findall3(>(0.8), x);
  10.507 ms (8 allocations: 16.44 MiB)

julia> @btime findall3(>(0.95), x);
  9.702 ms (8 allocations: 5.00 MiB)

julia> @btime findall3(==(0.5), x);
  7.945 ms (7 allocations: 1.20 MiB)

The `findall` fallback is quite slow when predicate is a small function
compared with generating a logical index using `broadcast` and calling `findall`
on it to compute integer indices. The gain is most visible when predicate is true
for a large proportion of entries, but it's there even when all of them are false.
The drawback of this approach is that it requires allocating a vector of `length(a)/8`
bytes whatever the number of returned indices.
@@ -2404,7 +2408,8 @@ function findall(pred::Fix2{typeof(in),<:Union{Array{<:Real},Real}}, x::Array{<:
end
# issorted fails for some element types so the method above has to be restricted
# to element with isless/< defined.
findall(pred::Fix2{typeof(in)}, x::Union{AbstractArray, Tuple}) = _findin(x, pred.x)
findall(pred::Fix2{typeof(in)}, x::AbstractArray) = _findin(x, pred.x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's required to avoid an ambiguity, since ::Fix2 is more specific than ::Function, but ::AbstractArray is more specific than ::Union{AbstractArray, Tuple}.

@musm
Copy link
Contributor

musm commented Dec 15, 2020

This looks good to me.
There's only one scenario where the only benefit of the current method is fewer allocations at the expense of a 2x slow-down in speed. That sounds like a good tradeoff to me.

@mbauman @timholy want to also sign off / review ?

@mbauman
Copy link
Member

mbauman commented Dec 15, 2020

This is rather surprising to me — I suppose it's entirely due to the ability to sum logical arrays and @inbounds over a pre-determined size array. I appreciate that this is better performance in many cases, but I dislike how this is harder to opt-out of than it is to opt-into. I suppose, though, that — by definition — you're doing vectorized allocat-y Julia by using findall... and if this were truly chasing peak allocation-free performance, you'd do the iterative find loop.

So with that rationale, I'll give this a ✅. I'll give it a bit more time, but barring further comments let's merge it tomorrow.

@musm musm added arrays [a, r, r, a, y, s] search & find The find* family of functions performance Must go faster labels Dec 16, 2020
@mbauman mbauman merged commit d20ca48 into master Dec 18, 2020
@mbauman mbauman deleted the nl/findall branch December 18, 2020 12:28
@nalimilan
Copy link
Member Author

Thanks!

@garrison
Copy link
Member

This change broke UniqueVectors.jl. My current plan is to issue a new release to fix it, but I am wondering: could there be other fallout throughout the package ecosystem? Seems somewhat unlikely, but perhaps worth considering/checking.

@nalimilan
Copy link
Member Author

Usually before releasing a new version all package tests are run against it to detect any problems in advance, so I guess we'll find out at that point.

@mbauman
Copy link
Member

mbauman commented Apr 21, 2021

Would we avoid the ambiguity if it's findall(testf::Function, A::Union{AbstractArray, Tuple})? Broadcasting for tuples is likely faster too.

While we try to avoid ambiguity changes like this, it's really hard to avoid and is easy to fix.

@garrison
Copy link
Member

Would we avoid the ambiguity if it's findall(testf::Function, A::Union{AbstractArray, Tuple})?

Do you mean if we were to replace the signature findall(testf::Function, A::AbstractArray) with findall(testf::Function, A::Union{AbstractArray, Tuple})? The method signature in UniqueVectors that is now ambiguous (following the change in this pull request) is findall(p::Base.Fix2{typeof(in),<:AbstractUniqueVector}, a::Union{Tuple, AbstractArray}). I believe that even if the change you are suggesting is made, the UniqueVector method would still be ambiguous with findall(pred::Fix2{typeof(in)}, x::AbstractArray) and findall(pred::Fix2{typeof(in)}, x::Tuple).

An alternative approach (and one which you might actually be implying) is to replace the three signatures of findall introduced in this PR with a single signature: the one you mentioned, findall(testf::Function, A::Union{AbstractArray, Tuple}). Then this method could e.g. call an inner _findall method, and the dispatching could be performed one level deeper. I think this would result in unambiguous methods, but haven't considered it carefully or tested it.

While we try to avoid ambiguity changes like this, it's really hard to avoid and is easy to fix.

Do you mean to fix it by adjusting julia before the release or to fix it in packages? I am on board either way. If there is no other fallout in the package ecosystem, then the above solution is almost certainly overkill -- the UniqueVectors package is probably the right place to fix it.

@mbauman
Copy link
Member

mbauman commented Apr 21, 2021

Yes, I meant the latter suggestion on both counts. Sorry for the brevity!

ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
The `findall` fallback is quite slow when predicate is a small function
compared with generating a logical index using `broadcast` and calling `findall`
on it to compute integer indices. The gain is most visible when predicate is true
for a large proportion of entries, but it's there even when all of them are false.
The drawback of this approach is that it requires allocating a vector of `length(a)/8`
bytes whatever the number of returned indices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] performance Must go faster search & find The find* family of functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants