Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntoPyArray for Array is broken when the first element is not at the start of the allocation #182

Closed
jturner314 opened this issue May 9, 2021 · 1 comment · Fixed by #194

Comments

@jturner314
Copy link
Contributor

jturner314 commented May 9, 2021

The address of the first element of an ndarray::Array is not necessarily the same as the start of the array's allocation. The IntoPyArray implementation for ArrayBase<OwnedRepr<A>, D> does not account for this, so the resulting NumPy array can contain the wrong portion of the data or even memory outside of the original allocation.

Here's an example with a negative stride which results in the NumPy array containing data out-of-bounds of the allocation:

use ndarray::prelude::*;
use numpy::IntoPyArray;

fn main() {
    let mut arr = Array2::<f64>::from_shape_fn([3, 4], |(i, j)| (i * 10 + j) as f64);
    arr.invert_axis(Axis(0));
    let copy = arr.clone();
    pyo3::Python::with_gil(|py| {
        let py_arr = arr.into_pyarray(py);
        assert_eq!(py_arr.readonly().as_array(), copy);
    })
}

The assertion fails with the following message:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `[[0.0, 1.0, 2.0, 3.0],
 [0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000691145546380947, 0.0, 0.0, 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000056],
 [0.0, 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000024, -15651843559535952000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0, 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000691145507655054]], shape=[3, 4], strides=[-4, 1], layout=c (0x4), const ndim=2`,
 right: `[[20.0, 21.0, 22.0, 23.0],
 [10.0, 11.0, 12.0, 13.0],
 [0.0, 1.0, 2.0, 3.0]], shape=[3, 4], strides=[-4, 1], layout=c (0x4), const ndim=2`', examples/into_pyarray.rs:10:9

Here's an example with all positive strides where the NumPy array contains an incorrect portion of the original data:

use ndarray::prelude::*;
use numpy::IntoPyArray;

fn main() {
    let mut arr = Array2::<f64>::from_shape_fn([3, 4], |(i, j)| (i * 10 + j) as f64);
    arr.slice_collapse(s![1.., ..]);
    let copy = arr.clone();
    pyo3::Python::with_gil(|py| {
        let py_arr = arr.into_pyarray(py);
        assert_eq!(py_arr.readonly().as_array(), copy);
    })
}

The assertion fails with the following message:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `[[0.0, 1.0, 2.0, 3.0],
 [10.0, 11.0, 12.0, 13.0]], shape=[2, 4], strides=[4, 1], layout=Cc (0x5), const ndim=2`,
 right: `[[10.0, 11.0, 12.0, 13.0],
 [20.0, 21.0, 22.0, 23.0]], shape=[2, 4], strides=[4, 1], layout=Cc (0x5), const ndim=2`', examples/into_pyarray.rs:10:9

(I ran these examples with the latest version of the main branch.)

Currently, the best way to get a boxed slice of an array's data and a pointer to the first element is like this:

    let orig_ptr = arr.as_ptr();
    let offset_is_zero = arr.is_empty() || std::mem::size_of::<A>() == 0;
    let v = arr.into_raw_vec();
    let offset = if offset_is_zero {
        0
    } else {
        unsafe { orig_ptr.offset_from(v.as_ptr()) as usize }
    };
    let mut boxed = v.into_boxed_slice();
    let ptr = unsafe { boxed.as_mut_ptr().add(offset) };
    // From here, `boxed` is the allocation, and `ptr` is a pointer to the first element of the n-D array.

Note that:

  • We need to check for empty arrays and zero-sized elements because the array's pointer is somewhat arbitrary in those cases.
  • The offset to the first element is computed before converting into a boxed slice, because .into_boxed_slice() may reallocate.

I've created rust-ndarray/ndarray#994 to provide a better way to get the correct offset.

@bluss
Copy link

bluss commented May 12, 2021

It seems like we could do better than just providing a tricky way to do this.

What's the interface that PyO3 requires here? What would it do if the array was non-contiguous? I can see that it's nice if reallocation and copying could be avoided, but perhaps it would be useful to have an ndarray method that could convert the array into a contiguous one with non-negative strides (and otherwise unchanged, if possible)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants