
Minimizing and Testing Memory Usage in Low-Memory RK Methods #640

Closed
ChrisRackauckas opened this issue Feb 3, 2019 · 21 comments

@ChrisRackauckas
Member

ChrisRackauckas commented Feb 3, 2019

@ranocha first tasked us with fixing memory usage in #178. We now have all of the pieces in place to do it correctly (see #178 (comment)). What remains is some calck optimization on the low-memory methods, plus testing to make sure it all works out.

How to do it is already described, so let's talk about testing. Memory tracking makes sense in the asymptotic regime where u0 is large: when u0 has, say, 100,000 elements, the uType and rateType vectors completely dominate the integrator's otherwise-constant overhead. In that regime we can measure the integrator's size to see how many registers we are actually using. It's quite accurate:

using OrdinaryDiffEq
u0 = rand(1000,1000)
prob = ODEProblem((du,u,p,t)-> du .= u,u0,(0.0,1.0))
integ = init(prob,Tsit5(),save_start=false,save_end=false,
                          save_everystep = false) # this sets dense = calck = false
Base.summarysize(integ)/Base.summarysize(u0) # 13.00020737396313

and understands aliasing:

Base.summarysize([u0 for i in 1:7]) / Base.summarysize(u0) # 1.0000119999400003

So with this, our goal should be to get these methods down to exactly the amount they require. Some things to note are:

  • When we first start out in solve.jl, we copy u0. We should add a new common-interface argument copy_start = true. It should default to true for safety, since mutating u0 would make all sorts of wonky things happen in people's code, but true performance purists may want us to re-use that array, so let's make it opt-out.
  • Low-memory RK methods, when calck is false, don't need the k values; they can make the algorithm work out by aliasing those to another array instead. Note that the k values are crucial to common use, though, since saveat is widely used, so we should see whether we can eliminate the allocations even when calck = true (the saveat case).
  • The low-memory RK papers state how many registers each method should use, so we had better match that integer exactly. If we don't, we should dig in and find out why.
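The summarysize trick above could be turned into a register-count regression test along these lines (a sketch: the registers helper is a made-up name, CarpenterKennedy2N54 is used as an illustrative low-storage method, and the comparison is a placeholder rather than a measured bound):

```julia
using OrdinaryDiffEq

u0 = rand(1000, 1000)
prob = ODEProblem((du, u, p, t) -> du .= u, u0, (0.0, 1.0))

# "Registers" ≈ integrator size relative to one state array,
# valid only in the asymptotic regime where u0 dominates.
registers(alg) = Base.summarysize(init(prob, alg,
    save_start = false, save_end = false,
    save_everystep = false)) / Base.summarysize(u0)

# A low-storage method should come in well under a full-storage one;
# the exact expected integer would come from the method's paper.
@assert registers(CarpenterKennedy2N54()) < registers(Tsit5())
```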
@ranocha
Member

ranocha commented Feb 4, 2019

Most low-storage methods make some assumption about which forms of assignment can be carried out with a limited number of arrays, cf. Ketcheson, David I. "Runge–Kutta methods with minimum storage implementations." Journal of Computational Physics 229.5 (2010): 1763-1773:

  • 2N methods (Williamson): Assignments of the form u2 = a*u2 + dt*f(u1) (or u2 = a*u2 + b*dt*f(u1)) can be made using only two arrays.
  • 2R methods (van der Houwen): Assignments of the form u1 = f(u1) can be made using only one array.
  • 2S methods (Ketcheson): Assignments of the form u1 = a*u1 + b*u2 + dt*f(u1) can be made using only two arrays.

As far as I know, the current setup of OrdinaryDiffEq.jl does not allow any of these assumptions, since function evaluations are of the form u2 = f(u1) (possibly in-place, of course). Thus, we need one additional array compared to the memory requirements advertised in research articles.

For the user, it might be relatively easy to write the RHS f such that the 2N assumption is fulfilled. The other ones are more complicated in general.
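For concreteness, here is how a 2N (Williamson) step looks with plain arrays, assuming the user supplies the RHS in the accumulating form the 2N assumption requires. The coefficients below are made up for illustration, not a real tableau:

```julia
# Williamson 2N low-storage RK: two registers S1 (solution) and S2 (accumulator).
# Requires f! of the accumulating form S2 = a*S2 + dt*f(S1), which is exactly
# the assumption that plain u2 = f(u1) evaluations break.
function williamson_2N_step!(S1, S2, f!, dt, A, B)
    for i in eachindex(A)
        f!(S2, S1, A[i], dt)     # S2 = A[i]*S2 + dt*f(S1), fused by the user
        @. S1 = S1 + B[i] * S2   # update the solution register in place
    end
    return S1
end

# Example with f(u) = u and made-up two-stage coefficients:
f!(S2, S1, a, dt) = (@. S2 = a * S2 + dt * S1)
S1 = ones(4); S2 = zeros(4)
williamson_2N_step!(S1, S2, f!, 0.1, [0.0, -0.5], [0.5, 1.0])
```

Only the two registers S1 and S2 are ever touched; the method never materializes a separate f(S1) array.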

@ChrisRackauckas
Member Author

We might want to support alternative AbstractODEProblems for these then.

@ranocha
Member

ranocha commented Feb 4, 2019

If we really want to minimise the memory usage this seems to be the way to go. This can also speed up the solution of the new specialised problems, e.g. if a linear ODE is to be solved using mul! for sparse arrays (cf. JuliaLang/julia#26117) or something similar if JuliaLang/julia#29634 is merged. An API might be similar to the five argument syntax of mul!.

@ChrisRackauckas
Member Author

We may just want to add these as possibilities to ODEFunction. The whole point is to allow other forms of the function for optimizations there. The documentation will need some work for it though, and then the methods will need to handle the existence/non-existence of such optimized form.

@ranocha
Member

ranocha commented Feb 7, 2019

I'm working on this and will soon push a PR.

Another possibility to reduce the memory would be to set uprev === u for some methods. In the current setup, this does not seem to be possible, since uprev is set in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/ccd2287a2845a093638875762b74d4fce0bf14eb/src/solve.jl#L213
@ChrisRackauckas: Could it be safe if the algorithm/cache decided whether uprev is a new array or not?

@ChrisRackauckas
Member Author

@ChrisRackauckas: Could it be safe if the algorithm/cache decided whether uprev is a new array or not?

uprev===u breaks interpolation, so that is only safe when calck==false. Think of calck as meaning that we know the user only wants the solving to work and no interpolation.

http://docs.juliadiffeq.org/latest/basics/common_solver_opts.html#Miscellaneous-1

It's set by:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/solve.jl#L27-L29

So note that if you have a saveat value without a corresponding tstop, you will be using interpolations. But for PDEs of that size, you probably can't save any intermediate values anyway? If you can, you can drop the dt for a step.

@ranocha
Member

ranocha commented Feb 7, 2019

Yes, I would not save intermediate values for big PDEs (or would control that explicitly). There are some algorithms that use uprev and some that don't need it; we could reduce the size if the algorithm could tell us that. Maybe something like alias_uprev(alg), which defaults to false and can be set to true when this is safe? Or move the initialisation of uprev to the cache?
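A sketch of such a trait (every name here is hypothetical; SomeLowStorageAlg stands in for a real algorithm type):

```julia
# Hypothetical opt-in trait: does this algorithm allow uprev === u?
alias_uprev(alg) = false               # safe default: always copy
struct SomeLowStorageAlg end           # stand-in for e.g. a 2N method
alias_uprev(::SomeLowStorageAlg) = true

# The copy in solve.jl would then become conditional on the trait
# and on calck (aliasing breaks interpolation, so calck must be false):
init_uprev(u, alg, calck) = (alias_uprev(alg) && !calck) ? u : copy(u)

u = rand(10)
init_uprev(u, SomeLowStorageAlg(), false) === u   # aliased, no extra array
```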

@ChrisRackauckas
Member Author

ChrisRackauckas commented Feb 7, 2019

Or move the initialisation of uprev to the cache?

We really should do that. It's just waiting to be done, look at the proximity:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/solve.jl#L213-L220

Well actually, why not just universally do uprev = calck ? recursivecopy(u) : u?

@ranocha
Member

ranocha commented Feb 7, 2019

We could do that. But there are some methods that use uprev explicitly, e.g. https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/ccd2287a2845a093638875762b74d4fce0bf14eb/src/perform_step/low_storage_rk_perform_step.jl#L118
We would have to adapt them, but that's okay for me.

@ChrisRackauckas
Member Author

Yeah, thinking about it more, most RK methods need uprev in the b step at the end, so that won't work. uprev is also needed when adaptive, since it's required for step rejections. It's really only the low-storage methods that can drop it. So yeah, it would make sense to move it into the cache so the algorithms can take control of how that's done.

@ranocha
Member

ranocha commented Feb 7, 2019

I would leave that task open for another PR and just submit some first improvements and checks soon.

@ChrisRackauckas
Member Author

Yup that sounds good.

@ranocha
Member

ranocha commented Feb 8, 2019

If we move the initialisation of uprev to the cache, we should do the same with uprev2, shouldn't we? In that case, we can either forward the keyword argument allow_extrapolation from
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/7dd04a406c9097f14542317684d26d2ddf0e193f/src/solve.jl#L58
to the cache or let the cache decide what to do. Which possibility do you prefer, @ChrisRackauckas?

@ranocha
Member

ranocha commented Feb 8, 2019

Thinking more about it, I don't really like the idea of moving uprev and uprev2 to the caches. Since there are lots of methods that do not need at least one of them, the structure of the caches would not reflect their needs.

@ranocha
Member

ranocha commented Feb 9, 2019

Short summary (since I don't have time to work on that part right now): After all the changes linked above, the remaining change is to add new forms of ODEFunctions:

We may just want to add these as possibilities to ODEFunction. The whole point is to allow other forms of the function for optimizations there. The documentation will need some work for it though, and then the methods will need to handle the existence/non-existence of such optimized form.

@kanav99
Contributor

kanav99 commented Mar 5, 2019

I'd like to work on this issue if it's not done yet.
As I understand it, adding a rewrite_u option to ODEProblem would do the job. For example, if the function is f(u, p, t) = A*u where A is an upper triangular matrix, then u is rewritable. If rewrite_u is set, we could alias du with u to carry out function evaluations of the form u1 = f(u1). All three forms could be incorporated through this option in a similar way.
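For the triangular case, the in-place evaluation itself already exists in the stdlib: lmul! overwrites the vector with the product, so u1 = f(u1) needs no second register (the rewrite_u flag is still hypothetical; only the LinearAlgebra call is standard):

```julia
using LinearAlgebra

A = UpperTriangular(rand(4, 4))
u = rand(4)

expected = A * u   # out-of-place reference, for comparison only
lmul!(A, u)        # in-place: u is overwritten with A*u, no du register
u ≈ expected       # the triangular structure lets the sweep reuse u
```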

@ranocha
Member

ranocha commented Mar 5, 2019

There is some current discussion on the matrix multiplication API. Hopefully, some decision on the API of inplace multiplication of the form we need for 2N methods will be made soon, cf.
JuliaLang/LinearAlgebra.jl#473.

@ChrisRackauckas
Member Author

As I understand it, adding a rewrite_u option to ODEProblem would do the job. For example, if the function is f(u, p, t) = A*u where A is an upper triangular matrix, then u is rewritable. If rewrite_u is set, we could alias du with u to carry out function evaluations of the form u1 = f(u1). All three forms could be incorporated through this option in a similar way.

I think this needs to be in-method. If a specific method can make good reuse of u, it should, and then it can likely delete a cache. Of course, not all can, and it can only be done when units aren't used.

I don't think the in-place matrix multiplication API is necessary to create the right ODEFunction infrastructure, but for a user to actually define the function without allocations it'll be needed.

@ranocha
Member

ranocha commented Mar 5, 2019

but for a user to actually define the function without allocations it'll be needed.

That's what I meant. We might want to use an API similar to the final decision for the matrix multiplication API. If that's mul!(C, A, B, α=1, β=0) for C = α*A*B + β*C, we can use f!(du, u, p, t, α, β) for du = α*f(u, p, t) + β*du, needed for 2N methods.
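A sketch of that signature for a linear ODE, assuming a five-argument mul!(C, A, B, α, β) computing C = α*A*B + β*C is available (the parameter field p.A is an assumption made for the example):

```julia
using LinearAlgebra, SparseArrays

# RHS of the linear ODE u' = A*u in accumulating form:
# du = α*f(u, p, t) + β*du, mirroring mul!(C, A, B, α, β).
f!(du, u, p, t, α, β) = mul!(du, p.A, u, α, β)

# A 2N method could then fuse S2 = a*S2 + dt*f(S1) into one call:
p = (A = sprand(100, 100, 0.05),)
S1, S2 = rand(100), zeros(100)
f!(S2, S1, p, 0.0, 0.1, 0.5)   # S2 = 0.1*A*S1 + 0.5*S2, no temporaries
```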

@kanav99
Contributor

kanav99 commented Mar 5, 2019

Yes, it would depend on both the problem and the algorithm. There may be functions where s = f(s) is not possible using a single register, so the user should specify that this transformation is possible for a given function.
If it is possible, then the method may exploit this fact (in fact, it may be possible to use this extra register in any method). I will make a PR that implements this to show an example.

@ChrisRackauckas
Member Author

Oh I see where you're going with this. Yeah, this would be a neat solution without requiring additional function interfaces. Great idea!
