Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sum(i for i in Vector{Int}()) does not return 0 #27766

Closed
cossio opened this issue Jun 25, 2018 · 28 comments
Closed

sum(i for i in Vector{Int}()) does not return 0 #27766

cossio opened this issue Jun 25, 2018 · 28 comments

Comments

@cossio
Copy link
Contributor

cossio commented Jun 25, 2018

sum(i for i in Vector{Int}()) raises an error instead of returning 0. The sum of an empty list is zero (mathematically). Moreover in this case the return type (Int) is also clear. Therefore I don't see why this should not return 0 (Int). Of course the same applies to any number type.

See also: https://discourse.julialang.org/t/sum-i-for-i-in-vector-int-raises-error-instead-of-returning-0/1705/8

@stevengj
Copy link
Member

Related to #18695 and #18873 (generators don't have an eltype).

The sum of an empty list is the additive identity ("zero"), but the value of the additive identity may not be 0 in general. For example, if sum an empty list of 2×2 integer matrices, the result should be a 2×2 matrix of 0. But I agree that in this case, in principle, the correct type of zero could be inferred.

@cossio
Copy link
Contributor Author

cossio commented Jun 25, 2018

I guess it should return "zero" (of type T, if that is a matrix or whatever) whenever the type T can be inferred.

@martinholters
Copy link
Member

"For empty collections, it may throw an error or return zero(T), with T being the element type of the collection, depending on the state of the world." I don't think I like that kind of semantic. It's too likely to lure you into believing it works today just because it worked yesterday.

@StefanKarpinski
Copy link
Member

Since sum(Int[]) === 0 it would not be crazy for this to do the same.

@stevengj
Copy link
Member

stevengj commented Jun 25, 2018

The root problem here is

julia> eltype(i for i in Vector{Int}())
Any

Inference works in this particular case:

julia> g = (i for i in Vector{Int}())
Base.Generator{Array{Int64,1},getfield(Main, Symbol("##7#8"))}(getfield(Main, Symbol("##7#8"))(), Int64[])

julia> Base.return_types(g.f, Tuple{eltype(g.iter)})
1-element Array{Any,1}:
 Int64

but it may be tricky to use this information without making sum(::Generator) type-unstable in general.

@nalimilan
Copy link
Member

Indeed. See also discussion about element type of generators at https://discourse.julialang.org/t/can-eltype-deduce-the-element-type-of-a-generator/6429.

@StefanKarpinski
Copy link
Member

We do generally make a best effort for empty collections based on inferred return types though.

@martinholters
Copy link
Member

The problem here is that we'd need zero(Any) for the case where inference doesn't succeed in finding a tight bound.

@StefanKarpinski
Copy link
Member

Sure but why can't the error only happen in that case?

@martinholters
Copy link
Member

Because that would probably make people rely on inference.

Imaginary bug report:
"I have bar(n) = sum(foo(x) for 1:n), and bar(0) used to work, but since I updated my dependency Foo to v0.34, it throws an error. Interestingly, the error goes away if I comment out using Bar in my code"
Reply:
"That's by design. Your function foo is type-unstable and calls bork with a parameter that is inferred as Any. Now, bork returns an Int, which can be inferred and all is fine. But Foo (since v0.34) and Baralso define methods of bork. These, too, return an Int, but inference doesn't know which method is called and there are too many now to infer them all, so the return type is widened to Any."

That does not sound too attractive to me. (Also, figuring all that out probably cost someone a good amount of time.)

Making the element type of empty arrays depend on inference already was a measure of last resort. Deciding whether to throw an error or not depending on inference seems way too risky to me. Although I admit it will work most of the time, encouraging people to write code that works only most of the time is questionable.

@StefanKarpinski
Copy link
Member

That is true in every case were we use return_type to determine what to do in the empty collection case. Is the argument that this is worse because it's an error whereas choosing an element type doesn't necessarily cause an error—although it might if your code was previously inserting a value where a change to inference causes a tighter type to be inferred.

@martinholters
Copy link
Member

Yes. Chances are that the element type of an empty collection does not matter at all. Furthermore, the element type for the empty case may only be wider than in the non-empty case, so inserting an element should usually not be trap. An example where you may get a sudden error would be sum(foo.(x)) for empty x (or sum(collect(foo(i) for i in Int[])), if you try to workaround the issue discussed here).

So for the collect/broadcast reliance on return_type in the empty case, there is a high chance of a change in the inference to Any going unnoticed/only causing a performance penalty. For sum, that chance would be 0.

IMO, a much saner approach would be to let sum take a v0 (or init, ref #27711) parameter one can pass in situations like this: sum(i for i in Vector{Int}(), init=0).

@cossio
Copy link
Contributor Author

cossio commented Jun 26, 2018

Maybe sum can take a type parameter that does two things:

  1. ensures that all elements encountered are of this type (otherwise throws an error)
  2. returns zero of this type if the iterator is empty

@pkofod
Copy link
Contributor

pkofod commented Jun 26, 2018

I guess it should return "zero" (of type T, if that is a matrix or whatever) whenever the type T can be inferred.

Even if the type can be inferred, you may not be able to create an appropriate zero since array types don't carry their sizes with them. What is

out = sum(i for i in Vector{Matrix{Float64}())

? I mean, what is size(out)?

@StefanKarpinski
Copy link
Member

Not being able to conjure a zero of some type is a different issue than not knowing what type of zero one wants. That can be a failure of zero(T) instead of a failure of sum.

@benninkrs
Copy link

To follow up on the lines of thought hinted at by @martinholters and @StefanKarpinski: Perhaps the real problem is the lack of a generic representation of additive identity, which is a well-defined concept. Would it be crazy to have a singleton type to represent this concept in cases where the type is abstract? In relevant operations it could just promote to the appropriate concrete type.

@StefanKarpinski
Copy link
Member

We already have that: Boolfalse is a "strong zero" and true is a "strong one" in the sense that false * x == zero(x) and true * x == x. We could potentially return false in these cases and it wouldn't be too bad since convert(T, false) is defined to give you zero(T) in general.

@cossio
Copy link
Contributor Author

cossio commented Jun 27, 2018

The Bool has additional semantics not carried by zero (you can use the Bool in an if, but not the zero). Therefore returning false does not seem consistent.

@martinholters
Copy link
Member

We'd need a "strong zero" in the additive sense, and false + x == x is deprecated for arrays x and probably fails for unitful x.

@cossio
Copy link
Contributor Author

cossio commented Jun 27, 2018

The idea of a strong zero seems analogous of I, the UniformScaling representing an identity matrix of any size.

@cossio
Copy link
Contributor Author

cossio commented Dec 17, 2018

It seems inconsistent that prod(()) == 1 is fine but sum(()) is an error. Either it returns zero (following prod), or prod(()) should also be an error, IMO.

@cortner
Copy link

cortner commented Jul 21, 2019

This is something that I often trip over as well. +1 for resolving it.

I would be quite comfortable putting much of the burden on the user by prescribing and init keyword argument as for reduce. I.e. if I anticipate that a collection might be empty, then I will call

sum(i for i in Vector{Int}(); init=0)

@StefanKarpinski
Copy link
Member

That's a nice idea, @cortner 👍

@tkf
Copy link
Member

tkf commented Jul 22, 2019

Would it be crazy to have a singleton type to represent this concept in cases where the type is abstract?

FYI I created https://github.com/tkf/InitialValues.jl to implement this approach for general foldl and reduce. I find it handy if you want to handle "zero" for generic operations.

But, in this case, I'm +1 for sum(...; init) as suggested by @martinholters and @cortner. In general, I think it makes sense to add init for any function that is a thin wrapper of foldl or reduce.

@tkf
Copy link
Member

tkf commented Jun 8, 2020

#36188 adds init to sum etc.

@tkf
Copy link
Member

tkf commented Jun 11, 2020

#36188 is merged so we can use init in sum now. Can this issue be closed?

@cossio
Copy link
Contributor Author

cossio commented Jun 11, 2020

Thanks @tkf! Will it make it into 1.5?

@cossio cossio closed this as completed Jun 11, 2020
@tkf
Copy link
Member

tkf commented Jun 11, 2020

We already had 1.5 feature freeze like a few weeks ago, unfortunately. So, it'll be in 1.6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants