Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behaviour of map and filter for sparse matrices (vectors) #6918

Closed
jumutc opened this issue May 22, 2014 · 39 comments
Closed

Strange behaviour of map and filter for sparse matrices (vectors) #6918

jumutc opened this issue May 22, 2014 · 39 comments
Labels
bug Indicates an unexpected problem or unintended behavior sparse Sparse arrays
Milestone

Comments

@jumutc
Copy link

jumutc commented May 22, 2014

Please consider the following:

julia> a = sprand(1000000,1,.001)
1000000x1 sparse matrix with 1032 Float64 entries:
    [2595   ,       1]  =  0.263955
    [3511   ,       1]  =  0.46256
    [4244   ,       1]  =  0.793747
    [7711   ,       1]  =  0.481846
    [7898   ,       1]  =  0.910035
    [8008   ,       1]  =  0.210312
    [11887  ,       1]  =  0.997936
    
    [991837 ,       1]  =  0.975212
    [993002 ,       1]  =  0.806656
    [995393 ,       1]  =  0.0327244
    [996098 ,       1]  =  0.780851
    [996254 ,       1]  =  0.295865
    [998174 ,       1]  =  0.56871
    [998274 ,       1]  =  0.343838
    [999891 ,       1]  =  0.238091

julia> filter(b -> b<2,a)
1000000x1 sparse matrix with 0 Float64 entries:

julia> filter(b -> b<0.2,a)
ERROR: BoundsError()
 in getindex at sparse/sparsematrix.jl:717

julia> map(b -> if b<0.2 0 else b end,a)
ERROR: InexactError()
 in setindex! at sparse/sparsematrix.jl:1038
 in setindex! at sparse/sparsematrix.jl:1032
 in map_to! at abstractarray.jl:1258
 in map at abstractarray.jl:1267

julia> map(b -> if b<2 0 else b end,a)
1000000x1 sparse matrix with 0 Int64 entries:

The same reproduces perfectly for matrices as well!

@jiahao
Copy link
Member

jiahao commented May 22, 2014

(Just a technical note: sprand always produces a matrix. We don't support sparse vectors yet, but we do support N×1 sparse matrices.)

@jumutc
Copy link
Author

jumutc commented May 22, 2014

No problem... but it looks like a bug. It would nice be have general map and filter functions which apply specifically to non-zero entries too.

@jiahao
Copy link
Member

jiahao commented May 22, 2014

Well, the map behavior is at least explainable. b -> if b<0.2 0 else b end produces either an Int or a Float64. (In Julia, 0 is not identical to 0.0.) If the first element is below the threshold, a sparse matrix of Int64s will be created, and assigning a Float64 into it will produce the InexactError. There is no problem if you changed this to b -> if b<0.2 0.0 else b end, or even b -> b<0.2 ? 0.0 : b:

julia> map(b -> if b<0.2 0.0 else b end,a)
1000000x1 sparse matrix with 779 Float64 entries:
    [69     ,       1]  =  0.816295
    [859    ,       1]  =  0.620671
    [7695   ,       1]  =  0.65508
    [8110   ,       1]  =  0.402167
    [8112   ,       1]  =  0.461852
    [9201   ,       1]  =  0.432456
    [9705   ,       1]  =  0.421811
    

@jumutc
Copy link
Author

jumutc commented May 22, 2014

But why we are not trying to perform convert(Type{Tv},v) and fit the value in this case? Maybe it's not a good idea and it will slow down the performance.

@jiahao
Copy link
Member

jiahao commented May 22, 2014

But why we are not trying to perform convert(Type{Tv},v) and fit the value in this case?

That doesn't work in general. Imagine if you had b -> b<0.2 ? b : "error" as the function to map over.

@jumutc
Copy link
Author

jumutc commented May 22, 2014

Maybe at least error should be more explanatory and we need to check the mapping function for inconsistencies first!

@jiahao
Copy link
Member

jiahao commented May 22, 2014

You may be interested in the discussion in #6548.

@jumutc
Copy link
Author

jumutc commented May 22, 2014

Sounds attractive. Only filter seems to be "broken" right now!

@jiahao
Copy link
Member

jiahao commented May 22, 2014

Here is a minimal example:

julia> A=sparse([1.0]);  filter(x->x==0, A)
ERROR: BoundsError()
 in getindex at sparse/sparsematrix.jl:718
 in getindex at abstractarray.jl:368
 in filter at array.jl:1216

julia> A=sparse([1.0]);  filter(x->x==1, A)
1x1 sparse matrix with 1 Float64 entries:
    [1, 1]  =  1.0

For some reason, A[1,0] is being accessed in the first case but not the second.

@jiahao
Copy link
Member

jiahao commented May 22, 2014

The first case involves the call A[I] where I is an invalid SparseMatrixCSC{Bool,Int64}; it has colptr = [1], rowptr = [1, 1] and nzval = []. Have to step away now, but would be great if someone else could look at it. @ViralBShah?

@JeffBezanson
Copy link
Member

Bump.

Sparse lacks indexing with a single index, except for linear indexing with an Int. Here we need A[I::AbstractArray{Bool}]. This is probably due to the lack of sparse vectors. Maybe we should return Nx1 sparse matrices until then?

@ViralBShah
Copy link
Member

Cc @tanmaykm too.

@ViralBShah ViralBShah added this to the 0.3 milestone May 29, 2014
@ViralBShah
Copy link
Member

We should try fix this for 0.3

@ViralBShah
Copy link
Member

I think that the same discussion as we had on map should probably apply here. @jumutc I suspect that one wants to only apply filter on the nonzeros as well, just like map.

Thanks for kicking the tires on all this. These have always been open questions, and these discussions are getting some answers and clarity on behaviour.

@jumutc
Copy link
Author

jumutc commented May 29, 2014

My consideration is due to very slow executions of map and filter in tight loops. This discussion is also linked to https://github.com/JuliaLang/julia/issues/7010. By the way these functions become even slower when GC comes into picture. In this case I have to rely on in place mapping and filtering (map! and filter!) or re-instantiate SparseMatrixCSC objects from the scratch!

@ViralBShah
Copy link
Member

For the sparse indexing stuff, I have opened #7023

@StefanKarpinski
Copy link
Member

I really think the only thing we can do here for map(f,s) is to check that f(0) == 0 and throw an error otherwise. If you want produce a dense array, you can do map!(f,full(s)).

@mlubin
Copy link
Member

mlubin commented Jun 5, 2014

That seems reasonable.

@lindahua
Copy link
Contributor

lindahua commented Jun 5, 2014

+1 for @StefanKarpinski's solution

@jiahao
Copy link
Member

jiahao commented Jun 5, 2014

+1

@jumutc
Copy link
Author

jumutc commented Jun 5, 2014

Wise solution, +1

@johnmyleswhite
Copy link
Member

Doesn't Stefan's solution assume that your function is pure? While this is statistically reasonable, it seems potentially problematic to me.

@jiahao
Copy link
Member

jiahao commented Jun 5, 2014

Fair point, but I'm not sure calling map with anything other than a pure function is a good idea.

@johnmyleswhite
Copy link
Member

I agree that it's weird, but I'm not convinced having map work on sparse matrices makes sense either. For me, the only functions that should return sparse results are those functions that can be proven to preserve sparsity a priori. Since map can take in an arbitrary functions and those functions might depend upon run-time information, map will not work for some inputs. And when it fails, it will fail silently.

Imagine that you have a function like foo(x) = rand() > 0.5 ? x + 0 : x + 1. This function will give randomly wrong results under the proposed definition of map.

@JeffBezanson
Copy link
Member

Interesting point; a function like that is kind of reasonable to use with map. I'd be ok with not supporting map for sparse.

@ViralBShah
Copy link
Member

Yes, I prefer not supporting map for sparse, and letting users directly work with the nonzeros.

@mlubin
Copy link
Member

mlubin commented Jun 5, 2014

That's fine with me. We should provide a useful error message though and not just remove the method.

@StefanKarpinski
Copy link
Member

To play devil's advocate here, if you use a random function like that to map over a sparse array, then you randomly get an error or a result, which is kind of what you were asking for, no? The common case here is clearly applying a pure function that takes zero to zero, in which case my proposal does exactly what you want.

@johnmyleswhite
Copy link
Member

My take is that we usually want to ensure that we don't fail silently in the worst case, even if this makes the common case a little more difficult.

@johnmyleswhite
Copy link
Member

Also, worth noting that this proposal would make map behave differently for dense arrays and sparse arrays. For dense ones, things would always work. For sparse ones, things would be broken and you wouldn't know.

@nalimilan
Copy link
Member

Applying a function with a random behavior using map doesn't seem like a very plausible use-case, and the fact that in some cases you will get an error is enough to make it a relatively limited risk. Are there other cases where the proposed behavior would be dangerous? Having map work for sparse matrices like it does for dense arrays, while preserving the array structure and performance, sounds quite appealing.

@johnmyleswhite
Copy link
Member

I can definitely see why you might want to allow this behavior for convenience, but the proposal at hand is to make map work in a meaningfully different way for sparse arrays than it does for dense arrays. The proposal is effectively partial memoization in which you compute the value of f(0) once and then assume that this value will always be correct, so you don't even bother evaluating f ever again. That's not the semantics that map implements for dense arrays, although it might be the semantics we'd want to insist users should expect.

@ViralBShah
Copy link
Member

The underlying sparse indexing issue causing the failure reported here is now fixed by #7047. For a discussion on map and sparse, we could perhaps have a newer issue, or reopen this one with a changed title.

@StefanKarpinski
Copy link
Member

@johnmyleswhite, that depends on what you consider the semantics of map to be. Does map(f,a) evaluate the function on every implied value of a or on every stored value of a? If the meaning is the later, then my proposed semantics are identical to the dense semantics, where the default value of the sparse matrix is considered to be "stored" once (even though it's implicit in most sparse matrices).

@johnmyleswhite
Copy link
Member

Replying to @StefanKarpinski here although the response might belong in #7157. I totally agree that you could define the meaning of map in terms of stored values, but I think that's subtle enough that some people are going to get confused.

@carlobaldassi
Copy link
Member

I have

@carlobaldassi
Copy link
Member

I have to point out that I did not consider side effects when I implemented map for BitArrays and thus that the code uses memoization for obvious performance reasons. We may need to change that unless the consensus is for assuming pure functions.

@mlubin
Copy link
Member

mlubin commented Jun 7, 2014

The concept of stored values shouldn't leak through to high-level functions like these.

@nalimilan
Copy link
Member

@mlubin It doesn't have to, as the difference is invisible as long as the passed function is pure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

10 participants