Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend diff() to arbitrary dimensions #15414

Closed
juliohm opened this issue Mar 9, 2016 · 17 comments · Fixed by #29827
Closed

Extend diff() to arbitrary dimensions #15414

juliohm opened this issue Mar 9, 2016 · 17 comments · Fixed by #29827
Labels
arrays [a, r, r, a, y, s] help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@juliohm
Copy link
Contributor

juliohm commented Mar 9, 2016

I recently saw a post by Tim discussing the new CartesianIndex type. Does it help to write a more general diff() that works with N-dimensional arrays? Currently it only supports matrices (i.e. 2D arrays).

@juliohm
Copy link
Contributor Author

juliohm commented Mar 13, 2016

At least have it working in 3 dimensions would be great.

@timholy
Copy link
Member

timholy commented Mar 13, 2016

Sorry I didn't see this. Yes, absolutely. You could almost copy the final example verbatim and then tweak 1 or 2 lines.

@juliohm
Copy link
Contributor Author

juliohm commented Mar 13, 2016

Hi Tim! Could you solve this issue? I bookmarked your post and will read it again, it is really useful for the kinds of algorithms I been writing recently.

@tkelman
Copy link
Contributor

tkelman commented Mar 13, 2016

Tim's supposed to be on vacation :)

@juliohm
Copy link
Contributor Author

juliohm commented Mar 13, 2016 via email

@timholy
Copy link
Member

timholy commented Mar 16, 2016

Thanks, I am enjoying my vacation.

I may well get around to implementing this at some point, but the teacher in me can't resist pointing out that by asking someone else to do this for you, you're throwing away a fantastic learning opportunity. Contributing code to julia or its packages---while it might seem like extra work---is basically how you "buy" a ticket to getting free tutoring in how to write efficient code. If you really want to learn how to use the material in that blog post independently, you couldn't ask for a better way to learn it.

@juliohm
Copy link
Contributor Author

juliohm commented Mar 16, 2016 via email

@kshyatt kshyatt added the arrays [a, r, r, a, y, s] label Jul 28, 2016
@juliohm
Copy link
Contributor Author

juliohm commented Sep 29, 2016

Didn't have the time to look into it, anyone feel free to fix this issue!

@simonster simonster added the help wanted Indicates that a maintainer wants help on an issue or pull request label Sep 29, 2016
@sg1101
Copy link

sg1101 commented Feb 24, 2017

Hi @juliohm. I would like to try and solve this issue. I am new to open source and had never contributed till now. Can you please guide how to proceed. Thank You :)

@StefanKarpinski
Copy link
Member

Get an implementation working and make a PR after reading through CONTRIBUTING.

@juliohm
Copy link
Contributor Author

juliohm commented Apr 22, 2018

@sg1101 have you had the chance to work on this?

@sg1101
Copy link

sg1101 commented Apr 22, 2018 via email

@felixrehren
Copy link
Contributor

felixrehren commented Apr 22, 2018

@juliohm I gave it a first pass a little while ago, see
https://gist.github.com/felixrehren/21b300061ccf9bf3fa5396889c799b48
The file contains code (accum and decum) to calculate multidimensional summed area tables accum(+,A) and recover, from a summed area table, the original array: decum(+,accum(+,A)) == A. It works for any dimension and also works for multiplication * or probably any symmetric binary operation.

It has problems -- seems to spend a lot of time inferring, probably not best practice on writing array algorithms, and I wasn't sure about the basic design. (In particular, implementing summed area tables requires an inverse of +, and to keep it general I wrote a function inverse(::typeof(+),x) = -x. I like this, but am not sure if it is considered Julian) Also, I wasn't sure about changes due to 0.7 coming up

@juliohm
Copy link
Contributor Author

juliohm commented Oct 25, 2018

I am more experienced now in multidimensional code in Julia. Below is a first implementation that works in Julia v1.0:

function diffn(A::AbstractArray{T,N}; dim::Integer) where {T,N}
  ax = axes(A)
  sz = size(A)
  result = Array{T,N}(undef, ntuple(j -> j == dim ? sz[dim]-1 : sz[j], N))
  for i in 1:sz[dim]-1
    prev = CartesianIndices(ntuple(j -> j == dim ? i   : ax[j], N))
    next = CartesianIndices(ntuple(j -> j == dim ? i+1 : ax[j], N))
    result[prev] = A[next] - A[prev]
  end

  result
end

Do you have immediate suggestions before I open a pull request?

@timholy
Copy link
Member

timholy commented Oct 25, 2018

Awesome! Look forward to it. A few pro-tips:

  • First, @time the version you have now using a decently-sized array, say 1000x1000. Note the amount of time and number of allocations
  • Those ntuples are a good idea, but unfortunately j == dim ? i : ax[j] is not inferrable.
  • Note that you're copying chunks. Consider what types of operations allocate temporaries, and see if you can eliminate those (hint: broadcasting may be your friend).
  • When you develop new versions, check how you're doing with @time and @code_warntype. Hopefully you will find it rewarding to see your progress.

@juliohm
Copy link
Contributor Author

juliohm commented Oct 25, 2018

Thank you @timholy, I tried broadcasting with result[prev] .= A[next] .- A[prev] and the allocations went down a bit. How to solve the inference issue with the tuples? Also, I am having difficulty to understand the output of @code_warntype. Help is appreciated.

@timholy
Copy link
Member

timholy commented Oct 25, 2018

There are several different paths you could go here. One approach is outlined in https://julialang.org/blog/2016/02/iteration, in the section titled "Filtering along a specified dimension (exploiting multiple indexes)". Another would be to ask whether the combination of broadcasting and range specification lets you perform the entire operation "in one go," kind of how you're handling all but the "filtered" dimension now but even including the filtered dimension. See if that's enough to go on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] help wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants