Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array(...) (and constructors for all collection types) should always make copies #23107

Closed
KristofferC opened this issue Aug 3, 2017 · 13 comments
Assignees
Labels
arrays [a, r, r, a, y, s] breaking This change will break code collections Data structures holding multiple items, e.g. sets
Milestone

Comments

@KristofferC
Copy link
Member

KristofferC commented Aug 3, 2017

The behavior between Dict and an Array is inconsistent when it comes to when copying is made:

julia> a = [1,2,3];

julia> b = typeof(a)(a);

julia> b[1] = 3
2

julia> a
3-element Array{Int64,1}:
 3
 2
 3

julia> A = Dict(1=>"a", 2 =>"b", 3=>"c");

julia> B = typeof(A)(A);

julia> B[1] => "c"
"b"=>"c"

julia> A
Dict{Int64,String} with 3 entries:
  2 => "b"
  3 => "c"
  1 => "a" # <---- not "c"

In my opinion, Dict(::Dict) should not make a copy if it can just return the dict.

Reported originally at: https://discourse.julialang.org/t/why-is-d-dict-dict-d/5197

@KristofferC KristofferC added the breaking This change will break code label Aug 3, 2017
@KristofferC KristofferC added this to the 1.0 milestone Aug 3, 2017
@JeffBezanson
Copy link
Member

This might be a convert vs. construct difference --- convert should not make a copy if the value is already of the right type, but arguably a constructor should always make a new object (the word "constructor" seems to imply it anyway). I'd vote for the other change, and make Array copying.

@JeffBezanson JeffBezanson added the collections Data structures holding multiple items, e.g. sets label Aug 3, 2017
@martinholters
Copy link
Member

What about the convert fulfills the construct contract (or however it phrased)?

-(::Type{T})(arg) where {T} = convert(T, arg)::T
+(::Type{T})(arg::T) where {T} = copy(arg)
+(::Type{T})(arg) where {T} = convert(T, arg)::T

???

@garborg
Copy link
Contributor

garborg commented Aug 4, 2017

Here's a survey of what happens calling mutable structs' own constructors on themselves :

julia> tas = [
           (Array, [1,2,3]),
           (BitArray, 2),
           (SharedArray, [1,2,3]),
           (Dict, 2 => "a"),
           (ObjectIdDict, 2 => "a"),
           (Set, "a"),
           (Task, () -> rand(9)),
           (Channel, 2),
           (WeakRef, "a"),
           (Ref, "a"),
           (Regex, "^a"),
           (IOStream, "a"),
           (IOBuffer, "a")
       ];

julia> for (t, a) in tas
           print(t, "\n     ")
           x = t(a)
           y = t(x)
           println(x === y)
       end
Array
     true
BitArray
     true
SharedArray
     true
Dict
     false
ObjectIdDict
     false
Set
     false
Task
     false
Channel
     true
WeakRef
     false
Ref
     true
Regex
     true
IOStream
     true
Base.AbstractIOBuffer{Array{UInt8,1}}
     true

@StefanKarpinski
Copy link
Member

Yikes, that's a bit haphazard.

@andyferris
Copy link
Member

Very occasionally, the overlap between conversion and construction is annoying. While the default method for construction calling convert may be to blame here, there is still the issue that convert itself has no guarantees about aliasing.

My point being that given that generic code may be dependent on aliasing behaviour, generally it would be great if each function had documented aliasing guarantees. I'm not sure the current set of functions is built around this idea (for constructors, convert, and many others too)...

Thinking aloud, perhaps a trait or something could describe the aliasing relationship between two types through convert so at least generic code could introspect and deal with this (at compile time)?

@JeffBezanson While, yes, constructors may tend to allocate new objects, many are immutable wrappers (I'm thinking RowVector as an example here) which are designed around the idea of not performing copies; OTOH it is aliasing behaviour (often of elements or fields) which is important for programmers to reason about, not just whether the outermost object is semantically a "new" instance (of the outermost type)...

@martinholters
Copy link
Member

...or constructors are only allowed not to copy their input if it is wrapped in a Move object. 😁 Next up: attaching && to types...

@JeffBezanson
Copy link
Member

@martinholters see #15120... The fact that the fallback constructor doesn't call copy is just one way to look at the problem; convert might not even be defined, so we're basically just guessing at what might work to construct an instance of a type.

Thanks for the analysis @garborg . I wouldn't necessarily apply this issue to non-collections like IOStream though. And Task(::Task) should perhaps be an error.

Whatever else is decided here, it's very clear to me that convert cannot be required to make copies. convert(Any, x) should be a no-op, and therefore so should conversion to any type that already includes x.

I agree that some types like SubArray and RowVector act as "views" of data. They will have to have non-copying constructors; one size doesn't fit all. For an unknown type T, it's probably impossible and inadvisable to reason about whether T(x) aliases x. But I think we can very reasonably be concerned about this case:

function dostuff(pairs)
    d = Dict(pairs)
    d[newkey] = 0
    ...
end

Here I'm explicitly constructing a specific type of mutable collection. My function expects an iterator of pairs. It seems to work. But one day somebody passes an argument that's already a Dict, and the function clobbers their data. Hopefully we can agree that would be a bad situation.

@JeffBezanson JeffBezanson changed the title Dict(::Dict) should probably not make a copy Array(...) (and constructors for all collection types) should always make copies Aug 10, 2017
@JeffBezanson JeffBezanson self-assigned this Aug 10, 2017
@JeffBezanson
Copy link
Member

I should point out that Array has kind of always been a special case here, since we don't have an Array(iterator) constructor like we do for Set and Dict. collect is basically this Array constructor.

@JeffBezanson
Copy link
Member

This would also be fixed by #15120, since Array(array) would give a method error instead of calling convert.

@tkelman
Copy link
Contributor

tkelman commented Aug 10, 2017

we don't have an Array(iterator) constructor like we do for Set and Dict. collect is basically this Array constructor

little old, but xref #16633, #16029

@andyferris
Copy link
Member

To me, it would seem logical if collect became more generic and Array(iterator) did what you'd expect... For v1.0 with inferable keyword arguments (which is going to be kick-ass for API design), could we use something like Array(size = (3,4)) and Vector(length = 3)?

@JeffBezanson
Copy link
Member

We could do that, but I'm more worried about the idea of deprecating Array{T}((3,4)). Would we actually go ahead with that?

@andyferris
Copy link
Member

In the extreme case we would have to since a tuple of ints is a valid iterable that people might want to collect in an array.

I'm not loving that aspect but the keyword arg version is at least very easy to read (though much less concise!). Would be an extremely annoying deprecation if we did this so not sure if it's truly worth the effort or not...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] breaking This change will break code collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

No branches or pull requests

8 participants