PencilFFTs on GPUs? #3
Hi Ali, thanks again for your interest! I would be happy to assist with making PencilFFTs work with Oceananigans, so let me know if you have any questions or suggestions.
The CPU version should be relatively easy to implement. Just note that, for now, PencilFFTs doesn't support in-place FFTs (which I noticed you're using for the Poisson solver), so in principle you would need separate storage for inputs and outputs. In this case I would suggest using real-valued inputs and real-to-complex (r2c) transforms. As an alternative, I'm planning to add support for in-place c2c and r2r transforms in the near future.
I'm also very interested in supporting GPUs. It should be simple to add an interface for CuArrays, but I'm not sure if the data transposition functions will work without modification. If they don't, I think a first step would be to make things work on a single GPU (avoiding all the MPI data transposition machinery), which can already be useful by itself. For the
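As a rough illustration of the out-of-place r2c approach suggested in this comment, the sketch below allocates separate input and output arrays and transforms between them. It assumes a recent PencilFFTs release in which a plan is built from a `Pencil` (the API available when the comment was written may have differed), and the grid size is an arbitrary choice.

```julia
using MPI
using PencilFFTs
using LinearAlgebra: mul!, ldiv!
using Random: randn!

MPI.Init()
comm = MPI.COMM_WORLD

# Global grid size (arbitrary, for illustration only).
dims = (16, 16, 16)

# Let PencilArrays choose the decomposition over the available processes.
pen = Pencil(dims, comm)

# Real-to-complex plan: the input is real, the output is complex.
plan = PencilFFTPlan(pen, Transforms.RFFT())

# Separate storage for input and output, as needed for out-of-place transforms.
u = allocate_input(plan)    # real-valued, physical space
uh = allocate_output(plan)  # complex-valued, spectral space

randn!(u)
u0 = copy(u)  # kept only to check the round trip

mul!(uh, plan, u)   # forward transform: u -> uh
ldiv!(u, plan, uh)  # normalised backward transform: uh -> u
```

Run it under MPI in the usual way (for example with `mpiexec`); with a single process it simply performs a serial transform.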
I have to work on some other stuff over the next week or so, but hoping to dig into PencilFFTs.jl next week!
Ah that's unfortunate but definitely not a barrier for now. I think I'll try to get something working first (ignoring performance), but that's a good point: I'll make sure to allocate an input and an output, thanks for the heads up.
Ah yeah that's a good point. I'll try to have a closer look at @leios has done lots of work on multi-GPU transposes and might know how things go on a GPU.
Yeah that's essentially what we do following Makhoul (1980) so we can do without padding if we permute indices, but it's just specifically for
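For readers unfamiliar with the index permutation attributed to Makhoul (1980), here is a minimal 1D sketch. It assumes the transform in question is the unnormalised type-II DCT (FFTW's `REDFT10`). The function name `dct2_via_fft` is hypothetical and this is not the Oceananigans implementation.

```julia
using FFTW

# DCT-II of a real vector using a single same-length FFT and no padding.
# Convention: C[k] = 2 * sum_n x[n] * cos(pi * (2n + 1) * k / (2N)), with 0-based n, k.
function dct2_via_fft(x::AbstractVector{<:Real})
    N = length(x)
    # Permute: even-indexed samples (0-based) first, then odd-indexed samples reversed.
    v = vcat(x[1:2:end], reverse(x[2:2:end]))
    V = fft(v)
    k = 0:N-1
    # Apply the quarter-sample phase shift and keep twice the real part.
    return 2 .* real.(cis.(-π .* k ./ (2N)) .* V)
end

x = rand(8)
y = dct2_via_fft(x)
y_ref = FFTW.r2r(x, FFTW.REDFT10)  # FFTW's own DCT-II, for comparison
```

Since the only FFT involved is a standard one, the same idea can in principle be used with any FFT backend that lacks a native DCT.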
Actually, forget what I said. I just started working on in-place transforms, and it seems like it's going to be easy to implement. They'll probably be ready by the time you start working on this.
It would be great if you guys can help with the transposes on the GPU!
Right, in that case I agree with you and I'd say it's not worth supporting that kind of transform.
That's awesome! I wonder how much that will improve the benchmark vs. P3DFFT (or if the benchmark is supposed to allocate).
I'm guessing it won't change much, since the allocated buffers are persistent (they're a field of Specifically for the P3DFFT comparisons, there's actually another problem, which is that P3DFFT v2 (i.e. the Fortran version) only does real-to-complex (r2c) transforms. For now I'm not planning on supporting in-place r2c transforms (as opposed to c2c or r2r), since they are much more complicated because both the size and the type of the data change from input to output. FFTW.jl itself doesn't support in-place r2c for the same reasons (even though there's an open PR to do this...).
Sorry for going silent for over a year. I finally started adding MPI support to Oceananigans.jl, starting with just the CPU, and PencilFFTs.jl worked great! For GPU support, it seems that PencilArrays.jl might readily support Then Would it make sense to first add GPU/CuArrays transpose tests to PencilArrays.jl?
It would be great if we could add support for GPU arrays! Yes, I think the first step would be to make sure that PencilArrays wrapping From the PencilFFTs side, I think there's not much to do to support GPU arrays, other than choosing the right FFT implementation based on the array types. Some parts of the plan creation code may need to be adapted for the kind of array as well.
Is CuArray supported now?
Hi, support for CuArray is not completely done but it shouldn't be too much work. I'll try to look at that next week.
Glad to know that! I'll give it a try on multi-GPU nodes.
Dear @jipolanco, I have another question on benchmarks.
Using BenchmarkTools with MPI is a bit tricky since processes need to be synchronised. But it's possible. You can look at this thread. In there I proposed a solution that used to work for me if I remember correctly.
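To make the synchronisation issue concrete, here is a minimal manual timing sketch that uses MPI.jl directly instead of BenchmarkTools. It is not the solution from the linked thread. The helper `timed_mpi` and the dummy `work` function are hypothetical names introduced only for illustration.

```julia
using MPI

# Time `f` across all ranks of `comm`. Every sample starts at a barrier so that
# all ranks begin together, and the slowest rank defines the time of that sample.
function timed_mpi(f, comm; nsamples = 10)
    f()  # warm-up call, so that compilation time is not measured
    tmin = Inf
    for _ in 1:nsamples
        MPI.Barrier(comm)
        t0 = MPI.Wtime()
        f()
        t = MPI.Wtime() - t0
        t = MPI.Allreduce(t, MPI.MAX, comm)  # slowest rank for this sample
        tmin = min(tmin, t)
    end
    return tmin
end

MPI.Init()
comm = MPI.COMM_WORLD

# Stand-in workload; in practice this would be the distributed transform.
work() = sum(abs2, rand(10^6))

t = timed_mpi(work, comm)
if MPI.Comm_rank(comm) == 0
    println("Best sample (max over ranks): ", t, " s")
end
```

Taking the maximum over ranks and then the minimum over samples is one common convention; other statistics may be more appropriate depending on what is being compared.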
Hi @jipolanco this package looks really great, thank you for working on it! Documentation is great for such a new package. It's what I've been looking for to add distributed parallelism to Oceananigans.jl.
We run on both CPUs and GPUs, so I was wondering if you knew whether PencilFFTs.jl would easily generalize to CuArrays? From skimming through the source code I feel like not much has to change, as MPI functions should dispatch on the array type, but maybe the FFT plans would have to be done a little differently? I think cuFFT has a pretty similar interface to FFTW so it shouldn't be a big change, but cuFFT doesn't do `REDFT` and `RODFT`, so some plans would not be supported, I guess.
I will try to get a parallel version with PencilFFTs.jl working on CPUs first though.
More than happy to help with adding GPU support.