Compile-time definitions to detect used SYCL targets #5562

Open
al42and opened this issue Feb 12, 2022 · 9 comments
Labels: confirmed, enhancement (New feature or request)

Comments

@al42and
Contributor

al42and commented Feb 12, 2022

Is your feature request related to a problem? Please describe

One or more targets can be passed to -fsycl-targets. For the program being compiled, it can be beneficial to know at compile time which targets were used. In my use case, different flavors of a kernel are used for different architectures (NVIDIA, Intel). If a certain architecture is not among the targets, one can skip compiling the corresponding flavor.

An additional benefit is the ability to filter out outright incompatible devices early (sycl::is_compatible is more robust, but does not appear to be working at the moment: #5561).

Describe the solution you would like

  • Have a compile-time macro for each target marking whether kernels are expected to be compiled for this target.

Describe alternatives you have considered

  • Add a constexpr aspect/flag to sycl::backend.
@al42and added the enhancement (New feature or request) label on Feb 12, 2022
@romanovvlad
Contributor

@al42and
Could you please provide an example of the code which uses such a compile-time macro?

@romanovvlad self-assigned this on Mar 2, 2022
@romanovvlad added the Need info (Some clarifications are needed from the reporter) label on Mar 2, 2022
@al42and
Contributor Author

al42and commented Mar 6, 2022

Here is a specific discussion we had with @anton-v-gorshkov about our use case: https://gitlab.com/gromacs/gromacs/-/merge_requests/2248/diffs#note_743987749

The code there might be overly convoluted for historical reasons. But, in essence, something like this is attempted:

template<int subGroupSize>
void submitKernel(sycl::queue q, sycl::global_ptr<float> data, int size) {
    // Submit the kernel which uses sub-group functionality
    // Kernel is complex and takes a long time to compile
}

void doStuff(sycl::device dev, sycl::queue q, sycl::global_ptr<float> data, int size) {
    switch(getVendor(dev)) {
        case Vendor::Nvidia:
            #if HAVE_NVIDIA
            return submitKernel<32>(q, data, size);
            #else
            assert(false); // Don't instantiate the template for 32, don't waste time compiling it.
            break; // Without this, a build with NDEBUG would fall through to the next case.
            #endif
        case Vendor::Intel:
            #if HAVE_INTEL
            return submitKernel<16>(q, data, size);
            #else
            assert(false);  // Don't instantiate the template for 16, don't waste time compiling it.
            break;
            #endif
    }
}
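
For readers who want to try the pattern without a SYCL toolchain, here is a minimal, self-contained sketch of the same dispatch. It rests on several assumptions: HAVE_NVIDIA and HAVE_INTEL are hypothetical macros that the build system would have to pass (the compiler does not define them, which is the point of this request), and Vendor, submitKernel and the -1 return value are stand-ins invented for illustration.

```cpp
// Hypothetical macros: assumed to come from the build system
// (e.g. -DHAVE_NVIDIA=1). The defaults below are arbitrary.
#ifndef HAVE_NVIDIA
#define HAVE_NVIDIA 0
#endif
#ifndef HAVE_INTEL
#define HAVE_INTEL 1
#endif

enum class Vendor { Nvidia, Intel };

// Stand-in for the expensive kernel submission. It returns the sub-group
// size it was instantiated with, so the dispatch result is observable.
template <int subGroupSize>
int submitKernel() {
    return subGroupSize;
}

// Returns the sub-group size used, or -1 if that vendor's flavor was
// excluded from the build.
int doStuff(Vendor vendor) {
    switch (vendor) {
        case Vendor::Nvidia:
#if HAVE_NVIDIA
            return submitKernel<32>();
#else
            break; // submitKernel<32> is never named, so never instantiated
#endif
        case Vendor::Intel:
#if HAVE_INTEL
            return submitKernel<16>();
#else
            break; // submitKernel<16> is never named, so never instantiated
#endif
    }
    return -1;
}
```

With the default values above, doStuff(Vendor::Intel) goes through the 16-wide flavor, while doStuff(Vendor::Nvidia) reports that no matching flavor was built.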

EDIT: Subgroup size is the most obvious example. There might be other differences, e.g. whether to manually prefetch some values.

EDIT2: As a workaround for faster compilation here, one can do an early return in the kernel (e.g., if (defined(__NVPTX__) && subGroupSize != 32)). But that does not help with other issues, like filtering out incompatible devices early.
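
The early-return workaround from EDIT2 could be sketched roughly as follows. This is an illustration under assumptions, not the GROMACS code: kernelBody, flavorIsNeeded and compilingForNvptx are hypothetical names, and the arithmetic stands in for the real kernel body. Only __NVPTX__ is real; it is predefined when compiling for an NVPTX target.

```cpp
// __NVPTX__ is predefined only when compiling for an NVPTX target,
// so on the host (or any other target) this evaluates to false.
#if defined(__NVPTX__)
constexpr bool compilingForNvptx = true;
#else
constexpr bool compilingForNvptx = false;
#endif

// On NVPTX only the 32-wide flavor is useful; elsewhere keep every flavor.
template <int subGroupSize>
constexpr bool flavorIsNeeded() {
    return !compilingForNvptx || subGroupSize == 32;
}

// Hypothetical kernel body. Because the condition depends on the template
// parameter, the branch that is not taken is not instantiated.
template <int subGroupSize>
int kernelBody(int x) {
    if constexpr (!flavorIsNeeded<subGroupSize>()) {
        return 0; // early return: the expensive part is discarded
    } else {
        return x * subGroupSize; // stand-in for the expensive part
    }
}
```

As noted above, this only shortens device compilation. The host code still cannot tell which targets were requested.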

@romanovvlad added the confirmed label and removed the Need info (Some clarifications are needed from the reporter) label on Mar 23, 2022
@romanovvlad removed their assignment on Mar 23, 2022
@elizabethandrews
Contributor

@al42and just to make sure I understand the requirement: you want these macros set during host compilation? For device compilation we have existing macros like __NVPTX__, etc.

@al42and
Contributor Author

al42and commented Apr 1, 2022

> @al42and just to make sure I understand the requirement: you want these macros set during host compilation? For device compilation we have existing macros like __NVPTX__, etc.

Yes, I specifically want to check in the host code which offload architectures are enabled.

@elizabethandrews
Contributor

elizabethandrews commented Jun 7, 2022

@al42and can the type trait any_device_has<aspect> be used for this purpose? It is defined in the SYCL 2020 spec as follows:

The implementation also provides two traits that the application can use to query aspects at compilation time. The trait any_device_has inherits from std::true_type if the compilation environment supports any device which has the specified aspect, and it inherits from std::false_type if no device has the aspect. The trait all_devices_have inherits from std::true_type if all devices supported by the compilation environment have the specified aspect, and it inherits from std::false_type if any device does not have the aspect.

We are considering adding an extended aspect for each device type. For example, we might define the aspects aspect::ext_oneapi_intel_gpu and aspect::ext_oneapi_nvidia_gpu. The application could then look like this:

void doStuff(sycl::device dev, sycl::queue q, sycl::global_ptr<float> data, int size) {
    switch(getVendor(dev)) {
        case Vendor::Nvidia:
          if constexpr (sycl::any_device_has_v<sycl::aspect::ext_oneapi_nvidia_gpu>) {
            return submitKernel<32>(q, data, size);
          } else {
            assert(false); // Don't instantiate the template for 32, don't waste time compiling it.
          }
          break; // Avoid falling through to the next case when assert is compiled out.
        case Vendor::Intel:
          if constexpr (sycl::any_device_has_v<sycl::aspect::ext_oneapi_intel_gpu>) {
            return submitKernel<16>(q, data, size);
          } else {
            assert(false);  // Don't instantiate the template for 16, don't waste time compiling it.
          }
          break;
    }
}
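
The trait-based variant can be tried without a SYCL implementation using a mock. Everything in the mock namespace below is a hypothetical stand-in: the two aspects were only proposed in this thread, and the specialization simply pretends that the program was built for an Intel GPU target alone. submitIfBuilt, Vendor and the -1 return value are likewise invented for illustration.

```cpp
#include <type_traits>

namespace mock {
// Hypothetical aspects, named after the proposal above. Not part of SYCL.
enum class aspect { ext_oneapi_intel_gpu, ext_oneapi_nvidia_gpu };

// Primary template: no device in the compilation environment has the aspect.
template <aspect A>
struct any_device_has : std::false_type {};

// Pretend the program was compiled for an Intel GPU target only.
template <>
struct any_device_has<aspect::ext_oneapi_intel_gpu> : std::true_type {};

template <aspect A>
inline constexpr bool any_device_has_v = any_device_has<A>::value;
} // namespace mock

enum class Vendor { Nvidia, Intel };

// Stand-in for the expensive kernel submission.
template <int subGroupSize>
int submitKernel() {
    return subGroupSize;
}

// A template helper makes the if constexpr condition value-dependent,
// so the branch that is not taken is not instantiated.
template <mock::aspect A, int subGroupSize>
int submitIfBuilt() {
    if constexpr (mock::any_device_has_v<A>) {
        return submitKernel<subGroupSize>();
    } else {
        return -1; // this flavor was not built
    }
}

int doStuff(Vendor vendor) {
    switch (vendor) {
        case Vendor::Nvidia:
            return submitIfBuilt<mock::aspect::ext_oneapi_nvidia_gpu, 32>();
        case Vendor::Intel:
            return submitIfBuilt<mock::aspect::ext_oneapi_intel_gpu, 16>();
    }
    return -1;
}
```

The helper template is a deliberate choice in this sketch: outside a template, a discarded if constexpr branch is still fully checked by the compiler, so routing the call through a template is the conservative way to keep the unused flavor from being instantiated.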

@elizabethandrews
Contributor

elizabethandrews commented Jun 8, 2022

After discussions with the team, the consensus is that we will be implementing the macros as an extension.

@al42and
Contributor Author

al42and commented Jun 8, 2022

@elizabethandrews, the solution with sycl::any_device_has is indeed more elegant and sycl-esque than macros. As far as I can tell, it solves my problem perfectly.

That said, macros are ok too.

@elizabethandrews
Contributor

I believe macros help avoid all compile-time overhead and offer more flexibility in some cases, so there is some sentiment to support them as well. Users can then choose whatever best suits their application.

@al42and
Contributor Author

al42and commented Mar 20, 2024

Hi!

Any progress on this?

I see that the oneMKL project also has to manually parse the compiler flags in CMake to get the list of targets, and this is, to be honest, not a pretty solution.
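
To make the cost of that workaround concrete, here is a rough sketch of the kind of flag parsing a project ends up writing. It is shown in C++ only to match the rest of the thread; it is not oneMKL's code (which does this in CMake), and parseSyclTargets is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Extracts the comma-separated target list from a single flag such as
// "-fsycl-targets=nvptx64-nvidia-cuda,spir64". Returns an empty list if
// the argument does not start with "-fsycl-targets=".
std::vector<std::string> parseSyclTargets(const std::string& flag) {
    const std::string prefix = "-fsycl-targets=";
    std::vector<std::string> targets;
    if (flag.compare(0, prefix.size(), prefix) != 0) {
        return targets;
    }
    std::size_t start = prefix.size();
    while (start <= flag.size()) {
        std::size_t comma = flag.find(',', start);
        if (comma == std::string::npos) {
            comma = flag.size();
        }
        if (comma > start) { // skip empty entries
            targets.push_back(flag.substr(start, comma - start));
        }
        start = comma + 1;
    }
    return targets;
}
```

A build script would then still have to map each target string to a per-vendor definition by hand, which is the step a compiler-provided macro or trait would make unnecessary.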
