Compile-time definitions to detect used SYCL targets #5562
Comments
@al42and |
Here is a specific discussion we had with @anton-v-gorshkov about our use case: https://gitlab.com/gromacs/gromacs/-/merge_requests/2248/diffs#note_743987749
The code there might be overly convoluted for historical reasons. But, in essence, something like this is attempted:

    template<int subGroupSize>
    void submitKernel(sycl::queue q, sycl::global_ptr<float> data, int size) {
        // Submit the kernel which uses sub-group functionality
        // Kernel is complex and takes a long time to compile
    }

    void doStuff(sycl::device dev, sycl::queue q, sycl::global_ptr<float> data, int size) {
        switch(getVendor(dev)) {
            case Vendor::Nvidia:
    #if HAVE_NVIDIA
                return submitKernel<32>(q, data, size);
    #else
                assert(false); // Don't instantiate the template for 32, don't waste time compiling it.
    #endif
            case Vendor::Intel:
    #if HAVE_INTEL
                return submitKernel<16>(q, data, size);
    #else
                assert(false); // Don't instantiate the template for 16, don't waste time compiling it.
    #endif
        }
    }

EDIT: Subgroup size is the most obvious example. There might be other differences, e.g. whether to manually prefetch some values.
EDIT2: As a workaround for faster compilation here, one can do an early return in the kernel (e.g., |
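For anyone who wants to try the pattern above without a SYCL toolchain, here is a minimal host-only sketch. It assumes the build system defines `HAVE_NVIDIA` and `HAVE_INTEL` (the defaults below are picked arbitrarily for the demo), and `submitKernel` is a stand-in that only returns the sub-group size it was instantiated for. One deliberate difference from the snippet above: the disabled branches throw instead of calling `assert(false)`, so a build with `NDEBUG` cannot fall through into the next case.

```cpp
#include <stdexcept>

// Assumed to come from the build system; the defaults here are demo values only.
#ifndef HAVE_NVIDIA
#define HAVE_NVIDIA 0
#endif
#ifndef HAVE_INTEL
#define HAVE_INTEL 1
#endif

enum class Vendor { Nvidia, Intel };

// Stand-in for the expensive kernel template: reports the sub-group size
// it was instantiated with instead of submitting any work.
template <int subGroupSize>
int submitKernel() {
    return subGroupSize;
}

// Dispatches on the vendor; flavors for disabled targets are never instantiated
// because the preprocessor removes the call before the compiler sees it.
int doStuff(Vendor vendor) {
    switch (vendor) {
    case Vendor::Nvidia:
#if HAVE_NVIDIA
        return submitKernel<32>();
#else
        throw std::runtime_error("NVIDIA target was not enabled in this build");
#endif
    case Vendor::Intel:
#if HAVE_INTEL
        return submitKernel<16>();
#else
        throw std::runtime_error("Intel target was not enabled in this build");
#endif
    }
    throw std::logic_error("unknown vendor");
}
```

With the demo defaults, `doStuff(Vendor::Intel)` selects the 16-wide flavor and `doStuff(Vendor::Nvidia)` throws, since `submitKernel<32>` is never compiled.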
@al42and just to make sure I understand the requirement: you want these macros set during host compilation? For device compilation we have existing macros like __NVPTX__, etc. |
Yes, I specifically want to check in the host code which offload architectures are enabled. |
@al42and can the type trait
We are considering adding an extended aspect for each device type. For example, we might define aspects "aspect::ext_oneapi_intel_gpu" and "aspect::ext_oneapi_nvidia_gpu". Application can then be -
|
After discussions with the team, the consensus is that we will be implementing the macros as an extension |
@elizabethandrews, the solution with That said, macros are ok too. |
I believe macros help avoid all compile-time overhead and offer more flexibility in some cases, so there is some sentiment to support them as well. Users can then choose whatever best suits their application. |
Hi! Any progress on this? I see that oneMKL project also has to manually parse the compiler flags in CMake to get the list of targets, and this is, to be honest, not a pretty solution. |
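To illustrate what "manually parsing the compiler flags" amounts to, here is a rough sketch of the same idea in C++ rather than CMake. It is not the oneMKL implementation; `parseSyclTargets` is a hypothetical helper, and it assumes the targets are given in the `-fsycl-targets=<t1>,<t2>` form inside a space-separated flag string.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: extracts the comma-separated target list from a
// flag string containing "-fsycl-targets=<t1>,<t2>,...".
// Returns an empty vector when the flag is not present.
std::vector<std::string> parseSyclTargets(const std::string& flags) {
    const std::string key = "-fsycl-targets=";
    std::vector<std::string> targets;

    const std::size_t keyPos = flags.find(key);
    if (keyPos == std::string::npos) {
        return targets;
    }

    // The flag value runs until the next space or the end of the string.
    const std::size_t valueBegin = keyPos + key.size();
    const std::size_t valueEnd = flags.find(' ', valueBegin);
    const std::string value = flags.substr(
        valueBegin,
        valueEnd == std::string::npos ? std::string::npos : valueEnd - valueBegin);

    // Split the value on commas, skipping empty entries.
    std::size_t begin = 0;
    while (begin <= value.size()) {
        const std::size_t comma = value.find(',', begin);
        const std::string token = value.substr(
            begin, comma == std::string::npos ? std::string::npos : comma - begin);
        if (!token.empty()) {
            targets.push_back(token);
        }
        if (comma == std::string::npos) {
            break;
        }
        begin = comma + 1;
    }
    return targets;
}
```

Every project has to repeat this kind of string handling (and keep it in sync with the compiler's flag syntax) as long as the compiler does not expose the target list itself.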
Is your feature request related to a problem? Please describe
One or more targets can be passed to -fsycl-targets. For the program being compiled, it can be beneficial to know at compile time which targets were used. In my use case, different flavors of a kernel are used for different architectures (NVIDIA, Intel). If a certain architecture is not among the targets, one can skip compiling the corresponding flavor.
An additional benefit is the ability to filter out outright-incompatible devices early (sycl::is_compatible is more robust, but does not appear to be working at the moment: #5561).
Describe the solution you would like
Describe alternatives you have considered
constexpr aspect/flag to sycl::backend.
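A rough sketch of how a constexpr flag could serve the same purpose as the macros, assuming C++17. Nothing here is an existing SYCL or DPC++ API: `EnabledTargets` and its members are hypothetical stand-ins for whatever the compiler might provide, and `submitKernel` is a host-only placeholder for the real kernel template.

```cpp
#include <stdexcept>

// Hypothetical compile-time flags; a real implementation would get these
// from the compiler rather than hard-coding them.
struct EnabledTargets {
    static constexpr bool nvidia = false;
    static constexpr bool intel = true;
};

enum class Vendor { Nvidia, Intel };

// Placeholder for the expensive kernel template.
template <int subGroupSize>
int submitKernel() {
    return subGroupSize;
}

// Templated on the flag holder so that the `if constexpr` conditions are
// dependent and the discarded branches are not instantiated.
template <typename Targets = EnabledTargets>
int doStuffConstexpr(Vendor vendor) {
    if (vendor == Vendor::Nvidia) {
        if constexpr (Targets::nvidia) {
            return submitKernel<32>();
        }
    } else if (vendor == Vendor::Intel) {
        if constexpr (Targets::intel) {
            return submitKernel<16>();
        }
    }
    throw std::runtime_error("requested target was not enabled in this build");
}
```

Compared with the macro version, this keeps the dispatch in ordinary C++, at the cost of requiring the dispatching function to be a template for the discarded branches to be skipped.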