Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RAII issue by introducing wrapper classes for backend plans #208

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

yasahi-hpc
Copy link
Collaborator

@yasahi-hpc yasahi-hpc commented Dec 3, 2024

Resolves #197

In this PR, we try to resolve RAII issues by introducing a thin wrapper for backend fft plans.
Following modifications are made,

  • Introducing a KokkosFFT_FFTW_Types.hpp file including the FFTW plan wrapper class and common code bases related fftw
  • Introducing a wrapper class for automatic deallocation. Wrappers are added for FFTW, cufft, hipfft, and rocfft (Scoped***Plan)
    For CUDA, we have
struct ScopedCufftPlanType {
   cufftHandle m_plan;
   // Should we raise an error here, if it fails?
   ScopedCufftPlanType() { cufftCreate(&m_plan); }
   
   // Should we abort here, if it fails?
   ~ScopedCufftPlanType() { cufftDestroy(m_plan); }

   cufftHandle &plan() { return m_plan; }
};
  • Info and Buffer that are required only for rocfft are moved inside rocfft helper class
  • Add missing cleanup_threads() in the destructor of FFTW helper class
  • Remove destroy_plan_and_info since it is automatically done in destructors of helper classes

Questions

  1. We are creating plans inside a constructor for cufft and hipfft. Should we raise assertions in constructor if it fails?
  2. What should we do if destroy functions in destructor fails. Should we abort?

@yasahi-hpc yasahi-hpc self-assigned this Dec 3, 2024
@yasahi-hpc yasahi-hpc added bug Something isn't working enhancement New feature or request cleanup labels Dec 3, 2024
@yasahi-hpc yasahi-hpc changed the title Fix raii issue Fix RAII issue by introducing wrapper classes for backend plans Dec 3, 2024
@yasahi-hpc yasahi-hpc marked this pull request as draft December 3, 2024 15:04
@yasahi-hpc yasahi-hpc marked this pull request as ready for review December 4, 2024 19:30
@tpadioleau
Copy link
Member

I would go a step further and make these RAII classes unique-owners. That implies to disallow shallow copy either by simply deleting copy constructor and assignment or make them allocate and copy a new plan if backends offer such a copy function. We can keep the move semantics but we have to keep track if the current object must call cufftDestroy in the destructor. If we delete both the copy and move semantics that gives:

class ScopedCufftPlanType {
 private:
  cufftHandle m_plan;

 public:
  ScopedCufftPlanType() {
    cufftResult cufft_rt = cufftCreate(&m_plan);
    KOKKOSFFT_THROW_IF(cufft_rt != CUFFT_SUCCESS, "cufftCreate failed");
  }

  ScopedCufftPlanType(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType(ScopedCufftPlanType&& rhs) = delete;

  ~ScopedCufftPlanType() noexcept {
    cufftResult cufft_rt = cufftDestroy(m_plan);
    if (cufft_rt != CUFFT_SUCCESS) Kokkos::abort("cufftDestroy failed");
  }

  ScopedCufftPlanType& operator=(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType& operator=(ScopedCufftPlanType&& rhs) = delete;

  cufftHandle plan() const noexcept { return m_plan; }
};

The ScopedCufftPlanType::plan can return the backend handle by copy for CUDA, I don't know for the other backends.

I notice that we could also improve the error messages, I see that cufftCreate can return different error types https://docs.nvidia.com/cuda/cufft/index.html?highlight=cufftMakePlan1d#c.cufftCreate.

@yasahi-hpc
Copy link
Collaborator Author

I would go a step further and make these RAII classes unique-owners. That implies to disallow shallow copy either by simply deleting copy constructor and assignment or make them allocate and copy a new plan if backends offer such a copy function. We can keep the move semantics but we have to keep track if the current object must call cufftDestroy in the destructor.

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

If we delete both the copy and move semantics that gives:

class ScopedCufftPlanType {
 private:
  cufftHandle m_plan;

 public:
  ScopedCufftPlanType() {
    cufftResult cufft_rt = cufftCreate(&m_plan);
    KOKKOSFFT_THROW_IF(cufft_rt != CUFFT_SUCCESS, "cufftCreate failed");
  }

  ScopedCufftPlanType(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType(ScopedCufftPlanType&& rhs) = delete;

  ~ScopedCufftPlanType() noexcept {
    cufftResult cufft_rt = cufftDestroy(m_plan);
    if (cufft_rt != CUFFT_SUCCESS) Kokkos::abort("cufftDestroy failed");
  }

  ScopedCufftPlanType& operator=(ScopedCufftPlanType const& rhs) = delete;

  ScopedCufftPlanType& operator=(ScopedCufftPlanType&& rhs) = delete;

  cufftHandle plan() const noexcept { return m_plan; }
};

The ScopedCufftPlanType::plan can return the backend handle by copy for CUDA, I don't know for the other backends.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I notice that we could also improve the error messages, I see that cufftCreate can return different error types https://docs.nvidia.com/cuda/cufft/index.html?highlight=cufftMakePlan1d#c.cufftCreate.

That is true. We can improve the error messages

@tpadioleau
Copy link
Member

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

Ok for me. I don't think there should be any problem with implementing the move semantics later. It requires us to only pass these objects by reference.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I mean if we do

ScopedCufftPlanType plan1;
ScopedCufftPlanType plan2;
plan2 = plan1;

This compiles fine and plan2 will get the same cufftHandle than plan1. The plan2 will destroy correctly the handle but then plan1 will likely fail to do so.

@yasahi-hpc
Copy link
Collaborator Author

I would prefer to disallow both copy and move, because it may be unsafe for some backends.

Ok for me. I don't think there should be any problem with implementing the move semantics later. It requires us to only pass these objects by reference.

Does it return a shallow copy of the cufftHandle? I could not find the details of copy constructor of cufftHandle. That should work for oneMKL backend, but I am not sure about the behavior of other backends

I mean if we do

ScopedCufftPlanType plan1;
ScopedCufftPlanType plan2;
plan2 = plan1;

This compiles fine and plan2 will get the same cufftHandle than plan1. The plan2 will destroy correctly the handle but then plan1 will likely fail to do so.

That is right. This does not work in the current implementation. Yes, for safety, it is better to disallow copy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cleanup enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RAII issue
2 participants