On making task configuration and task args available to the user without executing #769
Comments
@oleksandr-pavlyk maybe you have insight on this?
As discussed IRL, this is related to the problem described in:
Based on my understanding, we need to make sure that …
If I understand correctly, the relevant paragraph is:
If all it takes to avoid issues is ensuring a …, I have a doubt about the meaning of the text that follows:
If I understand correctly, taking this function as an example:
Is there a way to confirm whether or not numba_dpex/dpctl use an in-order queue in this case? Is there no option to use an out-of-order queue with explicit dependencies on the tasks themselves?
@ogrisel SYCL queues have an attribute that indicates whether or not they are in-order: https://intelpython.github.io/dpctl/latest/docfiles/dpctl/SyclQueue.html#dpctl.SyclQueue.is_in_order
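For reference, a minimal way to check this from Python (a small sketch assuming dpctl's SyclQueue constructor accepts the property="in_order" keyword, as documented in recent dpctl releases):

```python
import dpctl

# A default SyclQueue is out-of-order unless the "in_order" property is requested.
q = dpctl.SyclQueue()
print(q.is_in_order)  # False for a default queue

# Explicitly request an in-order queue.
q_in_order = dpctl.SyclQueue(property="in_order")
print(q_in_order.is_in_order)  # True
```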
Hearing about the refactoring in #816, I wondered if you also plan to work toward this? If I understand correctly, this feature would unlock the equivalent of CUDA streams in numba.cuda, and there are various issues that refer to it (e.g. #147).
@fcharras @ogrisel @oleksandr-pavlyk The bulk of the internal refactoring that was planned for the kernel API is now in master. PR #1049 is going to remove support for NumPy arrays as kernel arguments. Once #1049 is merged, we are free to support returning an event from the kernel submission call. There are certain design questions that should be addressed before that:
Thank you for the work, @diptorupd. Regarding the design questions, here are my thoughts. Those features are interesting for several reasons:
The cost of those steps is low but not negligible, so it's good if it can be removed once a kernel has been specialized. To avoid mistakes with the cache and input types, I think it's fair to limit the API to specialized kernels. Once a kernel has been specialized, there's no reason to probe the cache to call the kernel object later on. Couldn't it be something as simple as:
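(The snippet that originally followed this sentence is not preserved in this extract. Purely as an illustration of the idea, here is a hypothetical sketch: the kernel definition uses numba_dpex constructs from the releases discussed in this thread, but the specialize and submit_async calls shown in the comments are imagined names for the proposed behaviour, not existing numba_dpex API.)

```python
import numba_dpex as dpex

# A plain numba_dpex kernel, written with the get_global_id style API
# that was current at the time of this thread.
@dpex.kernel
def scale(a, out):
    i = dpex.get_global_id(0)
    out[i] = 2 * a[i]

# Hypothetical usage of the proposed API (these methods do not exist):
#
#   specialized = scale.specialize(sig)                     # compile once for `sig`
#   event = specialized.submit_async[global_size](a, out)   # returns a SyclEvent immediately
#   event.wait()                                            # synchronize only when the result is needed
```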
Besides that, I think that addressing …
For …
For …
@fcharras @ogrisel Asynchronous kernel submission is a feature we have recently added to … We will migrate the feature to our …
It seems the underlying dpctl API supports dispatching a SyclKernel asynchronously, and numba_dpex.kernel is a few lines of code away from making available to the user tasks that can be fed to SyclQueues and mapped to SyclEvents that embed a DAG of execution dependencies.
This small test script shows how it works; the async_iter function here is five times faster than its sync counterpart.
Would you say that this is a correct take and that it could be interesting to use? Would there be a case for exposing corresponding public methods?
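(The small test script referred to above is not included in this extract. As a stand-in, here is a minimal sketch of the underlying dpctl mechanism being described. It assumes an OpenCL device and dpctl's low-level kernel API, namely dpctl.program.create_program_from_source, SyclProgram.get_sycl_kernel, and SyclQueue.submit(kernel, args, gRange, lRange, dEvents) returning a SyclEvent; the exact names and signatures, e.g. submit vs. submit_async in newer releases, may vary between dpctl versions.)

```python
import dpctl
import dpctl.program as dppr
import dpctl.tensor as dpt

# Out-of-order queue: ordering is expressed through event dependencies below.
q = dpctl.SyclQueue()

# Build a trivial OpenCL kernel (requires an OpenCL-backend device).
src = """
__kernel void add_one(__global float *x) {
    size_t i = get_global_id(0);
    x[i] += 1.0f;
}
"""
prog = dppr.create_program_from_source(q, src)
krn = prog.get_sycl_kernel("add_one")

x = dpt.zeros(1024, dtype="float32", sycl_queue=q)

# Each submission returns a SyclEvent immediately instead of blocking.
e1 = q.submit(krn, [x.usm_data], [x.size])
# The second submission declares an explicit dependency on the first,
# forming a small DAG rather than relying on an in-order queue.
e2 = q.submit(krn, [x.usm_data], [x.size], None, [e1])

# Synchronize only when the result is actually needed.
e2.wait()
print(dpt.asnumpy(x)[:4])  # expected: [2. 2. 2. 2.]
```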