Compile-time definitions to detect used SYCL targets #5562

al42and · 2022-02-12T13:49:53Z

Is your feature request related to a problem? Please describe

One or more targets can be passed to -fsycl-targets. For the program being compiled, it can be beneficial to know at compile time which targets were used. In my use case, different flavors of a kernel are used for different architectures (NVIDIA, Intel). If a certain architecture is not among the targets, one can skip compiling the corresponding flavor.

An additional benefit is the ability to filter out outright incompatible devices early (sycl::is_compatible is more robust, but does not appear to be working at the moment: #5561).

Describe the solution you would like

Have a compile-time macro for each target marking whether kernels are expected to be compiled for this target.

Describe alternatives you have considered

Add a constexpr aspect/flag to sycl::backend.


romanovvlad · 2022-03-02T10:26:17Z

@al42and
Could you please provide an example of the code which uses such a compile-time macro?

al42and · 2022-03-06T13:18:00Z

Here is a specific discussion we had with @anton-v-gorshkov about our use case: https://gitlab.com/gromacs/gromacs/-/merge_requests/2248/diffs#note_743987749

The code there might be overly convoluted for historical reasons. But, in essence, something like this is attempted:

template<int subGroupSize>
void submitKernel(sycl::queue q, sycl::global_ptr<float> data, int size) {
    // Submit the kernel which uses sub-group functionality
    // Kernel is complex and takes a long time to compile
}

void doStuff(sycl::device dev, sycl::queue q, sycl::global_ptr<float> data, int size) {
    switch(getVendor(dev)) {
        case Vendor::Nvidia:
            #if HAVE_NVIDIA
            return submitKernel<32>(q, data, size);
            #else
            assert(false); // Don't instantiate the template for 32, don't waste time compiling it.
            break; // Don't fall through to the Intel case when assert is compiled out (NDEBUG).
            #endif
        case Vendor::Intel:
            #if HAVE_INTEL
            return submitKernel<16>(q, data, size);
            #else
            assert(false);  // Don't instantiate the template for 16, don't waste time compiling it.
            #endif
    }
}
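A host-only sketch of the same pattern, which compiles without a SYCL toolchain, may make the intent clearer. HAVE_NVIDIA and HAVE_INTEL stand in for the requested per-target macros and are set by hand here to mimic an Intel-only build; Vendor, submitKernel and doStuff are simplified placeholders, not real SYCL or GROMACS APIs. The static_assert is only there to show that a guarded-out flavor is never instantiated.

```cpp
#include <stdexcept>

// Assumption: with the requested feature the compiler would define these from
// -fsycl-targets. Here they are set by hand to mimic an Intel-only build.
#ifndef HAVE_NVIDIA
#define HAVE_NVIDIA 0
#endif
#ifndef HAVE_INTEL
#define HAVE_INTEL 1
#endif

enum class Vendor { Nvidia, Intel };

// Stand-in for the expensive kernel template; returns the sub-group size it was built for.
template <int subGroupSize>
int submitKernel() {
    // Breaks the build if a flavor for a disabled target is ever instantiated.
    static_assert((subGroupSize == 32 && HAVE_NVIDIA) || (subGroupSize == 16 && HAVE_INTEL),
                  "kernel flavor instantiated for a target that is not enabled");
    return subGroupSize;
}

int doStuff(Vendor vendor) {
    switch (vendor) {
        case Vendor::Nvidia:
#if HAVE_NVIDIA
            return submitKernel<32>();
#else
            break; // Flavor not compiled in; report the error below.
#endif
        case Vendor::Intel:
#if HAVE_INTEL
            return submitKernel<16>();
#else
            break; // Flavor not compiled in; report the error below.
#endif
    }
    throw std::runtime_error("no kernel flavor was compiled for this device");
}
```

With the values above, doStuff(Vendor::Intel) runs the 16-wide flavor, while doStuff(Vendor::Nvidia) throws instead of silently falling through to another vendor's kernel.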

EDIT: Subgroup size is the most obvious example. There might be other differences, e.g. whether to manually prefetch some values.

EDIT2: As a workaround for faster compilation here, one can do an early return in the kernel (e.g., if (defined(__NVPTX__) && subGroupSize != 32)). But that does not help with other issues, like filtering out incompatible devices early.
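The EDIT2 workaround could be sketched roughly as below. defined(...) is only valid inside preprocessor directives, so the check is split into an #if that sets a constant and an if constexpr that uses it. kernelBody is a hypothetical stand-in for the device code, and the arithmetic is a placeholder for the expensive part; on a host compiler __NVPTX__ is not defined, so every flavor keeps its full body.

```cpp
// Hypothetical stand-in for the body of a kernel templated on sub-group size.
template <int subGroupSize>
int kernelBody(int value) {
#if defined(__NVPTX__)
    // NVIDIA device pass: only the 32-wide flavor is ever launched there.
    constexpr bool skipBody = (subGroupSize != 32);
#else
    // Host pass or other device targets: keep every flavor intact.
    constexpr bool skipBody = false;
#endif
    if constexpr (skipBody) {
        // Trivial body: nothing expensive is compiled for this flavor.
        return 0;
    } else {
        // Placeholder for the expensive part of the kernel.
        return value * subGroupSize;
    }
}
```

The expensive code sits in the else branch on purpose: inside a template, if constexpr discards the branch that is not taken, whereas code placed after a plain early return would still be compiled.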

elizabethandrews · 2022-03-31T22:01:12Z

@al42and just to make sure I understand the requirement: do you want these macros set during host compilation? For device compilation we have existing macros like __NVPTX__, etc.

al42and · 2022-04-01T11:36:52Z

> @al42and just to make sure I understand requirement - you want these macros set during host compilation? For device compilation we have existing macros like __NVPTX__, etc

Yes, I specifically want to check in the host code which offload architectures are enabled.

elizabethandrews · 2022-06-07T20:08:04Z

@al42and can the type trait any_device_has<aspect> be used for this purpose? It is defined in the SYCL 2020 spec as follows:

The implementation also provides two traits that the application can use to query aspects at compilation time. The trait any_device_has inherits from std::true_type if the compilation environment supports any device which has the specified aspect, and it inherits from std::false_type if no device has the aspect. The trait all_devices_have inherits from std::true_type if all devices supported by the compilation environment have the specified aspect, and it inherits from std::false_type if any device does not have the aspect.

We are considering adding an extended aspect for each device type. For example, we might define aspects "aspect::ext_oneapi_intel_gpu" and "aspect::ext_oneapi_nvidia_gpu". The application code could then look like this:

void doStuff(sycl::device dev, sycl::queue q, sycl::global_ptr<float> data, int size) {
    switch(getVendor(dev)) {
        case Vendor::Nvidia:
          if constexpr (sycl::any_device_has_v<sycl::aspect::ext_oneapi_nvidia_gpu>) {
            return submitKernel<32>(q, data, size);
          } else {
            assert(false); // Don't instantiate the template for 32, don't waste time compiling it.
          }
          break; // Don't fall through to the Intel case when assert is compiled out (NDEBUG).
        case Vendor::Intel:
          if constexpr (sycl::any_device_has_v<sycl::aspect::ext_oneapi_intel_gpu>) {
            return submitKernel<16>(q, data, size);
          } else {
            assert(false);  // Don't instantiate the template for 16, don't waste time compiling it.
          }
    }
}
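A host-only mock of this trait-based dispatch, compilable without a SYCL toolchain, might look like the sketch below. Aspect, any_device_has and any_device_has_v here are local stand-ins that mimic the shape of the SYCL 2020 traits; the two ext_oneapi_* enumerators are the proposed names from this comment, not an existing API, and the explicit specialization pretends the build targets Intel GPUs only.

```cpp
#include <type_traits>

enum class Vendor { Nvidia, Intel };

// Hypothetical stand-ins for the proposed extended aspects.
enum class Aspect { ext_oneapi_nvidia_gpu, ext_oneapi_intel_gpu };

// Mock of the any_device_has trait: false unless specialized below.
template <Aspect A>
struct any_device_has : std::false_type {};

// Assumption for this sketch: the build targets Intel GPUs only.
template <>
struct any_device_has<Aspect::ext_oneapi_intel_gpu> : std::true_type {};

template <Aspect A>
inline constexpr bool any_device_has_v = any_device_has<A>::value;

// Stand-in for the expensive kernel template; returns its sub-group size.
template <int subGroupSize>
int submitKernel() {
    return subGroupSize;
}

// Returns the sub-group size used, or -1 if no matching flavor is available.
int doStuff(Vendor vendor) {
    switch (vendor) {
        case Vendor::Nvidia:
            if constexpr (any_device_has_v<Aspect::ext_oneapi_nvidia_gpu>) {
                return submitKernel<32>();
            } else {
                return -1;
            }
        case Vendor::Intel:
            if constexpr (any_device_has_v<Aspect::ext_oneapi_intel_gpu>) {
                return submitKernel<16>();
            } else {
                return -1;
            }
    }
    return -1;
}
```

One difference from the #if version: because doStuff is not a template, the branch not taken is still parsed and type-checked. How much compile time this saves in a real SYCL build is not verified here.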

elizabethandrews · 2022-06-08T14:52:19Z

After discussions with the team, the consensus is that we will be implementing the macros as an extension.

al42and · 2022-06-08T21:27:57Z

@elizabethandrews, the solution with sycl::any_device_has is indeed more elegant and sycl-esque than macros. As far as I can tell, it solves my problem perfectly.

That said, macros are ok too.

elizabethandrews · 2022-06-09T01:35:14Z

I believe macros help avoid all compile-time overhead and offer more flexibility in some cases, so there is some sentiment to support them as well. Users can then choose whatever best suits their application.

al42and · 2024-03-20T15:00:50Z

Hi!

Any progress on this?

I see that the oneMKL project also has to manually parse the compiler flags in CMake to get the list of targets, and this is, to be honest, not a pretty solution.

al42and added the enhancement New feature or request label Feb 12, 2022

romanovvlad self-assigned this Mar 2, 2022

romanovvlad added the Need info Some clarifications are needed from the reporter label Mar 2, 2022

romanovvlad added confirmed and removed Need info Some clarifications are needed from the reporter labels Mar 23, 2022

romanovvlad removed their assignment Mar 23, 2022

al42and mentioned this issue May 23, 2022

[SYCL][CUDA] Return invalid subgroup size warning #6183

Merged

elizabethandrews assigned mdtoguchi Jun 8, 2022
