Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][CUDA] Return invalid subgroup size warning #6183

Merged
merged 6 commits into from
Jun 4, 2022

Conversation

JackAKirk
Copy link
Contributor

@JackAKirk JackAKirk commented May 23, 2022

This is a solution to #6103 for the CUDA case only. HIP AMD case still needs to be considered as discussed here: #6103 (comment).

CUDA only currently supports one subgroup (warp) size : 32 for all devices.
This PR introduces a solution to #6103 appropriate for backends which only support a single subgroup size: if the optional kernel attribute reqd_sub_group_size() is used with the supported subgroup size then it will compile and behave as the programmer intends. If reqd_sub_group_size() is used with another incompatible subgroup size a warning is returned when compiling, such as:

reqd-sub-group-size-cuda.cpp:12:73: warning: attribute argument 8 is invalid and will be ignored; CUDA requires sub_group size 32 [-Wcuda-compat]
    h.single_task<class invalid_kernel>([=] [[sycl::reqd_sub_group_size(8)]] {});
                                                                                                                ^

Signed-off-by: JackAKirk [email protected]

@premanandrao
Copy link
Contributor

Could you add a test case for this please?

@al42and
Copy link
Contributor

al42and commented May 23, 2022

I would like to point out an issue with this approach. Sometimes, we might want to compile a single binary for multiple targets (with different sub-group sizes) and choose the proper type in the runtime. This PR breaks such a workflow because it will throw a compilation error whenever we have a non-32-wide kernel and an NVPTX target, even if we don't want to use them together.

A toy example: https://gist.github.com/al42and/7e580e2202bcd28425c473cb04c8fb02. Compilation string in the first line. Works fine with the current sycl branch, does not compile with this PR.

EDIT: the problem could be avoided by #5562, BTW :)

@JackAKirk
Copy link
Contributor Author

I would like to point out an issue with this approach. Sometimes, we might want to compile a single binary for multiple targets (with different sub-group sizes) and choose the proper type in the runtime. This PR breaks such a workflow because it will throw a compilation error whenever we have a non-32-wide kernel and an NVPTX target, even if we don't want to use them together.

A toy example: https://gist.github.com/al42and/7e580e2202bcd28425c473cb04c8fb02. Compilation string in the first line. Works fine with the current sycl branch, does not compile with this PR.

Thanks for pointing this out: I think you are right and this use case makes this PR not a good approach.

@JackAKirk JackAKirk closed this May 23, 2022
@al42and
Copy link
Contributor

al42and commented May 23, 2022

this use case makes this PR not a good approach.

I think it might be helpful to have a compile-time diagnostic, just make it a warning instead of an error? Compiling for multiple architectures might be niche (at least now), so having a warning could be helpful for many users, even if a few has to silence it.

It would also be nice if the warning was only triggered when NVPTX is the only backend, but I suspect checking that can be non-trivial with the compilation flow used.

@JackAKirk
Copy link
Contributor Author

this use case makes this PR not a good approach.

I think it might be helpful to have a compile-time diagnostic, just make it a warning instead of an error? Compiling for multiple architectures might be niche (at least now), so having a warning could be helpful for many users, even if a few has to silence it.

It would also be nice if the warning was only triggered when NVPTX is the only backend, but I suspect checking that can be non-trivial with the compilation flow used.

Thanks. I think this is a good suggestion. I will look into it.

@JackAKirk JackAKirk reopened this May 23, 2022
@JackAKirk JackAKirk marked this pull request as draft May 23, 2022 13:46
@JackAKirk JackAKirk closed this May 23, 2022
@zjin-lcf
Copy link
Contributor

I feel warning message is helpful.
The result of executing a HIP program fails on a MI100 GPU when the size of a wavefront is 64. The HIP program expects a wavefront of size 32.
Will the attribute "reqd_work_group_size(32)" make the HIP program succeed ? I am not clear about the answer.

@JackAKirk
Copy link
Contributor Author

It would also be nice if the warning was only triggered when NVPTX is the only backend, but I suspect checking that can be non-trivial with the compilation flow used.

Yes I'm not sure of the best way to check this. I think that the warning could also be useful when NVPTX is not the only backend: not all applications that compile for multiple architectures will correctly account for the warning, and if they have accounted for the warning then they will know they can safely ignore it.

I think that I will apply your first suggestion and just switch this error to a warning. I will also add a test at this point: I guess that adding a warning is probably not going to be contentious.

@JackAKirk
Copy link
Contributor Author

Will the attribute "reqd_work_group_size(32)" make the HIP program succeed ? I am not clear about the answer.

Do you mean "reqd_sub_group_size(32)"?
Not at the moment at least: The HIP AMD backend will require a proper implementation of reqd_sub_group_size(val) as discussed here: #6103 (comment)

@zjin-lcf
Copy link
Contributor

It was my typo. Thank you for your answer.

@JackAKirk JackAKirk reopened this May 24, 2022
Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk marked this pull request as ready for review May 24, 2022 16:05
@JackAKirk JackAKirk closed this May 24, 2022
@JackAKirk JackAKirk reopened this May 24, 2022
Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk closed this May 24, 2022
@JackAKirk JackAKirk reopened this May 24, 2022
@JackAKirk
Copy link
Contributor Author

Could you add a test case for this please?

Yep done.

Signed-off-by: JackAKirk <[email protected]>
Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@elizabethandrews
Copy link
Contributor

Please update PR description. It still says error is generated.

@JackAKirk JackAKirk changed the title [SYCL][CUDA] Return invalid subgroup size error [SYCL][CUDA] Return invalid subgroup size warning May 26, 2022
@JackAKirk
Copy link
Contributor Author

Please update PR description. It still says error is generated.

I've updated the description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants