apply_ufunc support for chunks on input_core_dims #1995
Have you tried the new …
I think it would be nice if we could have a way to allow chunking along …
For two inputs, don't we use dask.array.tensordot?
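For context, `tensordot` contracts a pair of axes, and dask's version handles a chunked contraction axis by summing per-chunk products — exactly the multiply-then-sum pattern discussed in this thread. A minimal NumPy illustration of the contraction itself (example values are mine, not from the thread):

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(3.0)

# contracting the last axis of `a` against the only axis of `b`
# is the chunkable multiply-then-sum operation discussed here
out = np.tensordot(a, b, axes=([1], [0]))
assert np.allclose(out, np.einsum('ij,j->i', a, b))  # -> [5., 14.]
```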
If …
One way to allow chunking across …

I'm reluctant to add …

For this specific problem, I think you could solve it with …
@shoyer could you make an example? That was my first thought, but I could not figure out how to make apply_ufunc do it.
OK, thinking a little more about it, this would not work with …
Try:

```python
import dask.array
import numpy as np

def mulsum_chunk(a, b):
    return np.einsum('...i,...i', a, b)[..., np.newaxis]

def mulsum(a, b):
    # needs broadcasting/rechunking for a, b
    mapped = dask.array.map_blocks(
        mulsum_chunk, a, b, dtype=float,
        chunks=a.chunks[:-1] + (tuple(1 for _ in a.chunks[-1]),))
    return dask.array.sum(mapped, axis=-1)
```
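The idea behind this sketch — reduce each chunk to a size-1 partial along the core dimension, then sum the partials — can be checked with plain NumPy by emulating the chunking by hand (this verification snippet is mine, not from the thread):

```python
import numpy as np

def mulsum_chunk(a, b):
    # per-chunk partial result: reduce the last axis, keep a size-1 axis
    return np.einsum('...i,...i', a, b)[..., np.newaxis]

rng = np.random.default_rng(0)
a = rng.random((4, 10))
b = rng.random((4, 10))

# emulate chunking the last axis into blocks of sizes 3, 3, 4
partials = [mulsum_chunk(a[:, s], b[:, s])
            for s in (slice(0, 3), slice(3, 6), slice(6, 10))]

# concatenating the size-1 partial axes and summing over them
# reproduces the unchunked result
result = np.concatenate(partials, axis=-1).sum(axis=-1)
expected = np.einsum('...i,...i', a, b)
assert np.allclose(result, expected)
```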
[EDIT] drastically simplified chunking algorithm

@shoyer, close, but your version doesn't work in case of broadcasting.

```python
import xarray
import numpy
import dask.array

coefficients = xarray.DataArray(
    dask.array.random.random((106, 99), chunks=(25, 25)),
    dims=['formula', 'time'])
components = xarray.DataArray(
    dask.array.random.random((106, 512 * 1024), chunks=(25, 65536)),
    dims=['formula', 'scenario'])

def mulsum(a, b, dim):
    return xarray.apply_ufunc(
        _mulsum_xarray_kernel, a, b,
        input_core_dims=[[dim], [dim]],
        dask='allowed', output_dtypes=[float])

def _mulsum_xarray_kernel(a, b):
    if isinstance(a, dask.array.Array) and isinstance(b, dask.array.Array):
        chunks = dask.array.core.broadcast_chunks(a.chunks, b.chunks)
        chunks = chunks[:-1] + (tuple(1 for _ in chunks[-1]),)
        mapped = dask.array.map_blocks(
            _mulsum_dask_kernel, a, b,
            dtype=float, chunks=chunks)
        return dask.array.sum(mapped, axis=-1)
    else:
        return _mulsum_dask_kernel(a, b)

def _mulsum_dask_kernel(a, b):
    a = numpy.ascontiguousarray(a)
    b = numpy.ascontiguousarray(b)
    res = numpy.einsum('...i,...i', a, b, optimize='optimal')
    return res[..., numpy.newaxis]

mulsum(coefficients, components, dim='formula')
```

Proposal 2

Modify apply_ufunc:
My initial example would become:

```python
def mulsum_kernel(a, b):
    return numpy.einsum('...i,...i', a, b)[..., numpy.newaxis]

c = xarray.apply_ufunc(
    mulsum_kernel, a, b,
    dask='parallelized',
    input_core_dims=[['x'], ['x']],
    output_dtypes=[float],
    output_core_dims=[['__partial']],
    output_chunks={'__partial': [1 for _ in a.chunks[a.dims.index('x')]]}
).sum('__partial')
```

Although I'm not sure this approach would be unambiguous when there's more than one core dim…
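One way to see the ambiguity: with two chunked core dims, each block produces one partial, so the partials naturally form a grid rather than a single `__partial` axis. A NumPy sketch (example values and chunk splits are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((6, 4))

# split both core dims (6 -> 3+3 and 4 -> 2+2): each of the four
# blocks yields one partial, so the partials form a 2x2 grid --
# there is no single obvious '__partial' dimension to map them to
partials = np.array([[a[i:i + 3, j:j + 2].sum() for j in (0, 2)]
                     for i in (0, 3)])
assert partials.shape == (2, 2)
assert np.isclose(partials.sum(), a.sum())
```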
My main concern is ensuring that someone does not inadvertently apply a function not designed for multiple chunks to dask arrays. For example, suppose the function being applied is …

Some loud flag that makes it very obvious what's going on seems like a good idea, e.g. …

Then we also need some sort of guarantee that chunked core dimensions aren't entirely removed, or else xarray/dask won't know how to stack them back up. I guess we could check to make sure that at least as many output core dimensions appear as appear in input core dimensions?
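The danger described above is easy to demonstrate with a non-chunk-separable reduction such as the median: applying it per chunk and combining the results silently gives the wrong answer (illustrative example, not from the thread):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0])

# median computed per chunk, then combined, is NOT the global median
per_chunk = [np.median(c) for c in (x[:3], x[3:])]  # [2.0, 100.0]
combined = float(np.median(per_chunk))

assert combined == 51.0
assert float(np.median(x)) == 3.5
assert combined != float(np.median(x))  # silently wrong, no error raised
```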
@shoyer, you don't really need a parameter …
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the stale label.
Has this not been solved by the `allow_rechunk` argument? @crusaderky, isn't this effectively what you were trying to achieve?

```python
import xarray as xr

def mulsum(a, b):
    acc = 0
    for i in range(a.size):
        acc += a[i] * b[i]
    return acc

a = xr.DataArray(data=[1, 2, 3], dims=['x']).chunk({"x": 1})
b = xr.DataArray(data=[4, 5, 6], dims=['x']).chunk({"x": 1})

c = xr.apply_ufunc(
    mulsum, a, b,
    input_core_dims=[['x'], ['x']],
    dask='parallelized', output_dtypes=[float],
    dask_gufunc_kwargs={'allow_rechunk': True})

print(c.compute())
```

returns …
I think this has only been possible since the implementation of … If this is actually doing what I think it's doing, then we should document this possibility!
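Note what `allow_rechunk` actually does: it permits dask to merge the chunks of a core dimension into a single chunk before applying the kernel, so the function sees the whole axis at once (at the cost of materializing it in memory). A small sketch of that rechunking step in plain dask (example values are mine):

```python
import dask.array as da

x = da.arange(6, chunks=2)   # three chunks of size 2 along the axis
y = x.rechunk({0: -1})       # -1 means: one chunk spanning the whole axis

# this is the merge that allow_rechunk authorizes on core dims,
# which is why the chunk-unaware kernel above still gets correct input
assert x.chunks == ((2, 2, 2),)
assert y.chunks == ((6,),)
```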
I am trying to optimize the following function:
where a and b are xarray.DataArray objects, both with dimension x and both with a dask backend.
I successfully obtained a 5.5x speedup with the following:
The problem is that this introduces a (quite problematic, in my case) constraint: a and b can't be chunked on dimension x. That constraint is theoretically avoidable as long as the kernel function doesn't need interaction between x[i] and x[j] (so, for example, it can't work for an interpolator, which would need to rely on dask ghosting).
Proposal

Add a parameter to apply_ufunc, `reduce_func=None`. reduce_func is a function which takes as input two parameters a, b that are the output of func; apply_ufunc will invoke it whenever there's chunking on an input_core_dim. E.g. my use case above would simply become:
So if I have 2 chunks in a and b on dimension x, apply_ufunc will internally do
Note that reduce_func will be invoked only in the presence of dask='parallelized' and when there's chunking on one or more of the input_core_dims. If reduce_func is left as None, apply_ufunc will keep crashing as it does now.
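The proposed semantics can be sketched in plain NumPy, emulating the chunk splitting by hand (the names `func` and `reduce_func` follow the proposal; everything else is illustrative):

```python
from functools import reduce

import numpy as np

def func(a, b):
    # the per-chunk kernel from the use case above
    return np.einsum('...i,...i', a, b)

def reduce_func(x, y):
    # combines two partial outputs of `func`
    return x + y

rng = np.random.default_rng(0)
a = rng.random(8)
b = rng.random(8)

# with two chunks on core dim x, apply_ufunc would internally do:
#   reduce_func(func(a_chunk0, b_chunk0), func(a_chunk1, b_chunk1))
partials = [func(a[:4], b[:4]), func(a[4:], b[4:])]
result = reduce(reduce_func, partials)
assert np.isclose(result, np.einsum('i,i', a, b))
```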