RFC: Add dynamic borrow checking for dereferencing NumPy arrays. #274
src/borrow.rs
Outdated
    D: Dimension,
{
    pub(crate) fn try_new(array: &'a PyArray<T, D>) -> Option<Self> {
        let address = array as *const PyArray<T, D> as usize;
As written in #258, it might actually make sense to follow the "base object chain" here and use the address of the final base object as the key into the hash table which would be able to safely detect quite a few aliasing issues on the Python side of the fence as well.
But it would also be an over-approximation that could sometimes fail unexpectedly if it does not consider e.g. splitting a large array into multiple non-overlapping slices which can then be borrowed separately.
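A minimal sketch of the over-approximation, assuming a hypothetical `Object` arena in place of real Python objects (neither `Object` nor `root_base` is part of rust-numpy). It follows the base chain to its end and uses the final index as the key, which also shows why two disjoint slices of one array would be reported as conflicting:

```rust
/// Hypothetical stand-in for a Python object that may be a view of another object.
struct Object {
    /// Index of the base object in the arena, or `None` if this object owns its memory.
    base: Option<usize>,
}

/// Follows the base chain to its end; assumes the chain is acyclic.
fn root_base(objects: &[Object], mut index: usize) -> usize {
    while let Some(base) = objects[index].base {
        index = base;
    }
    index
}

fn main() {
    let objects = [
        Object { base: None },    // 0: owning array
        Object { base: Some(0) }, // 1: view of 0, e.g. the first half
        Object { base: Some(0) }, // 2: view of 0, e.g. the second half
        Object { base: Some(1) }, // 3: view of the view 1
        Object { base: None },    // 4: unrelated array
    ];
    // Disjoint halves share a root, so they would be reported as conflicting.
    assert_eq!(root_base(&objects, 1), root_base(&objects, 2));
    assert_eq!(root_base(&objects, 3), 0);
    assert_ne!(root_base(&objects, 3), root_base(&objects, 4));
}
```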
I would really like to find out if there are projects out there which break if the over-approximation, i.e. distinct borrows must have distinct root base objects, is taken. Of course, I would like to find out without releasing that code into the wild and waiting for the angry complaints. :-(
(There is of course another issue where this might not work properly: considering that base objects are not necessarily NumPy arrays themselves (e.g. our `PySliceContainer`), there is probably no real guarantee that distinct base objects mean non-overlapping backing memory, e.g. memory map base objects could be distinct while pointing into the same address range.)
Do you think it's practical for us to calculate the start and end addresses of the memory pointed at by the view? If we used a binary-search based structure to store all of these (start, end) tuples, we may be able to detect overlaps reasonably efficiently.
(I would hope that the cost of doing these aliasing checks will work out as much cheaper than large array operations!)
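One way to sketch such a structure, assuming a hypothetical `RangeSet` built on std's `BTreeMap` and modelling only exclusive borrows (so stored ranges never overlap each other). Because the stored ranges are disjoint, a single ordered lookup is enough to decide whether a new range conflicts:

```rust
use std::collections::BTreeMap;

/// Hypothetical registry of exclusively borrowed byte ranges, keyed by start address.
/// Invariant: the stored half-open ranges `[start, end)` never overlap.
#[derive(Default)]
struct RangeSet {
    ranges: BTreeMap<usize, usize>, // start -> end (exclusive)
}

impl RangeSet {
    /// Returns true if `[start, end)` intersects any stored range.
    fn overlaps(&self, start: usize, end: usize) -> bool {
        // Since stored ranges are disjoint, only the one with the greatest
        // start below `end` can possibly intersect the queried range.
        self.ranges
            .range(..end)
            .next_back()
            .map_or(false, |(_, &stored_end)| stored_end > start)
    }

    /// Records the range if it is non-empty and conflicts with nothing.
    fn try_insert(&mut self, start: usize, end: usize) -> bool {
        if start >= end || self.overlaps(start, end) {
            return false;
        }
        self.ranges.insert(start, end);
        true
    }

    /// Releases the range that begins at `start`, if any.
    fn remove(&mut self, start: usize) {
        self.ranges.remove(&start);
    }
}

fn main() {
    let mut set = RangeSet::default();
    assert!(set.try_insert(0, 64));
    assert!(set.try_insert(64, 128)); // adjacent, not overlapping
    assert!(!set.try_insert(32, 96)); // intersects both stored ranges
    set.remove(0);
    assert!(set.try_insert(0, 32));
}
```

Lookups and insertions are logarithmic in the number of active borrows, independent of the array sizes.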
Is this sufficient though? Isn't it possible that those ranges overlap even for non-overlapping array views due to non-unit strides? For example, take a large array with three RGB layers which gets sliced into three separate views that all have basically the same start-to-end range but never overlap.
Ah, right. I would hope that non-unit strides are the exception (so we might be able to optimise for the common case). They may turn out to be very common though, e.g. 1-D slices of multidimensional arrays.
I fear that the general case may be an instance of satisfiability and therefore NP-hard: For two arrays $A_1$ and $A_2$ with the same base object, with data pointers $p_1$ and $p_2$, dimensionalities $D_1$ and $D_2$, dimensions $d_{1,i}$ and $d_{2,i}$, and strides $s_{1,i}$ and $s_{2,i}$, we need to determine whether the equation
$$p_1 + s_{1,1} k_1 + \dots + s_{1,D_1} k_{D_1} = p_2 + s_{2,1} l_1 + \dots + s_{2,D_2} l_{D_2}$$
has integer solutions in the domain
$$k_i \in \{0, \dots, d_{1,i} - 1\} \text{ for } i \in \{1, \dots, D_1\}$$
$$l_i \in \{0, \dots, d_{2,i} - 1\} \text{ for } i \in \{1, \dots, D_2\}$$
Not sure whether there is any structure in this problem which makes this substantially easier...
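To make the problem concrete, here is a brute-force sketch that enumerates both sides of the equation and intersects them. The `View` type and its fields are hypothetical, and offsets are counted in elements relative to the shared base rather than in bytes. This is only an illustration that costs time proportional to the number of elements, not a proposal for the actual check. It also reproduces the RGB case: the channel extents intersect while no element is shared.

```rust
use std::collections::HashSet;

/// Hypothetical description of a view into a shared base buffer:
/// offset of the first element plus per-axis shape and stride, all in elements.
struct View {
    offset: isize,
    shape: Vec<usize>,
    strides: Vec<isize>,
}

impl View {
    /// Enumerates every element offset reachable through the view.
    fn offsets(&self) -> Vec<isize> {
        let mut offsets = vec![self.offset];
        for (&dim, &stride) in self.shape.iter().zip(&self.strides) {
            let mut next = Vec::with_capacity(offsets.len() * dim);
            for &base in &offsets {
                for k in 0..dim {
                    next.push(base + stride * k as isize);
                }
            }
            offsets = next;
        }
        offsets
    }

    /// Smallest and largest offset touched (inclusive), or `None` for empty views.
    fn bounds(&self) -> Option<(isize, isize)> {
        let offsets = self.offsets();
        Some((*offsets.iter().min()?, *offsets.iter().max()?))
    }
}

/// Returns true if the two views share at least one element.
fn overlaps(a: &View, b: &View) -> bool {
    let seen: HashSet<isize> = a.offsets().into_iter().collect();
    b.offsets().iter().any(|offset| seen.contains(offset))
}

fn main() {
    // A 2x2 RGB image in C order: element strides are (6, 3, 1).
    let channel = |c: isize| View { offset: c, shape: vec![2, 2], strides: vec![6, 3] };
    let (red, green) = (channel(0), channel(1));
    // The extents of the two channels intersect...
    assert_eq!(red.bounds(), Some((0, 9)));
    assert_eq!(green.bounds(), Some((1, 10)));
    // ...but no element is shared between them.
    assert!(!overlaps(&red, &green));
    // The first pixel row (all channels) does share elements with the red channel.
    let row = View { offset: 0, shape: vec![2, 3], strides: vec![3, 1] };
    assert!(overlaps(&red, &row));
}
```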
One thing that might make this more palatable is that I think that we basically are already in this position, at least w.r.t. thread safety: Python code can send references to arrays to other threads and there is nothing we can do to prevent it, i.e. we already assume that the Python code is written correctly. (Actually, touching the |
Thanks for this. To help get my head around this and confirm that we agree: what's the expectation on well-behaved Python code? I think it's that Python code should never write to an array while Rust holds a borrow to it? Are there other constraints too?
I think this is basically it, but it is complicated by NumPy views being arrays themselves, e.g. a function implemented in Rust taking Also note, that just having aliasing views lying around is not an issue AFAIU due the Rust compiler never seeing those and hence being unable to make any incorrect assumptions about aliasing for them. The above problem could theoretically even be solved by us by following NumPy slices through to their base objects and including their offsets/dimensions in the dynamic borrow checking - i.e. #274 (comment) - and if we are able to implement this with reasonable efficiency we would really be down to "do not write/read arrays which Rust code has currently borrowed/mutably borrowed". |
Also note that this would imply that we need to remove all accessors providing |
If the accessors are |
Yes, that's true. I just made a quick survey and besides One change that would still be required is that safe cloning methods like |
(Looking at the scheduling, I think that it is not sensible to target this to version 0.16 even if we decide to go for it. I rather think that having a quick rust-numpy release following PyO3 0.16 containing the accumulated maintenance work is preferable. Personally, I would also like to wait for @kngwyu's perspective before putting more work into this branch.)
Thinking about this some more, I think that going for |
542b587
to
67f8600
Compare
src/borrow.rs
Outdated
Entry::Occupied(entry) => {
    let readers = entry.into_mut();

    let new_readers = readers.wrapping_add(1);
I don't think this can overflow (in other words, I can't imagine any bug that can cause overflow here because it requires at least u32::MAX operations), but if it happens, I think it's OK to `panic!` here.
We could use a `checked_add` and panic, but that would only complicate the code path: the `wrapping_add` together with the check for `<= 0` (i.e. either it was `-1` and there is already a writer, or it overflowed and there are already `isize::MAX` readers) handles both cases without additional control flow or panics, with the same result that we can have only up to `isize::MAX` readers. (The code is also not really my idea, this is basically std's `BorrowFlag` implementation copied over here.)
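A minimal standalone sketch of that flag logic, assuming the encoding described above (`0` for unborrowed, `n > 0` for `n` readers, `-1` for a writer). The function names are illustrative and not taken from the PR:

```rust
/// Tries to register one more reader; leaves the flag untouched on failure.
fn try_acquire_shared(flag: &mut isize) -> bool {
    let new_flag = flag.wrapping_add(1);
    // `new_flag <= 0` covers both failure cases with a single comparison:
    // a writer (-1 + 1 == 0) and overflow (isize::MAX wraps to isize::MIN).
    if new_flag <= 0 {
        return false;
    }
    *flag = new_flag;
    true
}

/// Tries to register a writer, which requires that nothing is borrowed.
fn try_acquire_exclusive(flag: &mut isize) -> bool {
    if *flag != 0 {
        return false;
    }
    *flag = -1;
    true
}

fn main() {
    let mut flag = 0;
    assert!(try_acquire_shared(&mut flag));
    assert!(try_acquire_shared(&mut flag));
    assert!(!try_acquire_exclusive(&mut flag)); // two readers are active

    let mut written = 0;
    assert!(try_acquire_exclusive(&mut written));
    assert!(!try_acquire_shared(&mut written)); // writer is active

    let mut saturated = isize::MAX;
    assert!(!try_acquire_shared(&mut saturated)); // would overflow
}
```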
@kngwyu Are you satisfied with the explanation?
Sorry for the delay (I was really overwhelmed by a deadline), and thank you very much for this patch.
I agree that changing some NumPy array flags like |
I also thought about still touching the For now, I would prefer to not touch the flags as this abuses their semantics somewhat (for example,
I think we need substantially more than a single bit to store a potentially large number of readers. This together with the brittleness of using another library's still evolving data structures seems to argue against this IMHO.
I just noticed that this is not correct: The flag itself would always be modified while holding the GIL and only afterwards would This has the nice side effect of making the dynamic borrow checking here better: Having a |
67f8600
to
51c0cd2
Compare
I understand and agree to keep things minimal here. Having a different API for manipulating flags might be better, just in case users want to explicitly do it.
51c0cd2
to
5a8b855
Compare
Considering how to proceed: I suggest that after releasing 0.16, we try to get this into shape w.r.t. documentation and tests while keeping the over-approximation (i.e. borrows work on the level of the base address), and refine that in a follow-up PR, maybe first checking the start and end of the memory ranges for overlap as @davidhewitt suggested and then fully considering the strides to allow interleaving but non-overlapping views, hopefully in time for 0.17.
8a072e4
to
b945b14
Compare
@davidhewitt @kngwyu I think this is ready for review now after adding tests and documentation. I also updated the cover letter of the PR including the plan to close #258 by this and track refining the conflict computations in a separate issue. |
Because I felt bad for deprecating the The results seem to support the decision to deprecate the module:
I think overall I'm satisfied that this is probably the best approach for now. The `unsafe` primitives continue to exist for users who have more esoteric arrays with non-unit strides etc.
One thing to think about which I don't think I've seen yet in this discussion, what's the stated interaction of this with the GIL?
E.g. if we borrow one of these arrays and then release the Python GIL, are other Python threads allowed to mutate the array before our Rust thread then continues? This may have potential further issues around aliasing.
Existing code using
While the global hash table is protected by the GIL, the borrows are not bound to it, i.e. they will stay active even if the GIL is released, cf. the test case

    #[test]
    fn borrows_span_threads() {
        Python::with_gil(|py| {
            let array = PyArray::<f64, _>::zeros(py, (1, 2, 3), false);
            let _exclusive = array.readwrite();
            let array = array.to_owned();
            py.allow_threads(move || {
                let thread = spawn(move || {
                    Python::with_gil(|py| {
                        let array = array.as_ref(py);
                        let _exclusive = array.readwrite();
                    });
                });
                assert!(thread.join().is_err());
            });
        });
    }

I will add a comment on this to the module-level documentation.
Unchecked code (Python, unsafe Rust, C, etc.) running on other threads is able to mutate those arrays. But there is nothing we can do to stop unchecked code from doing this, and it would be a data race whether our code is written in Rust or something else entirely, i.e. this is incorrect even without the additional requirements Rust places on references. I tried to explain this by

    //! We can also not prevent unchecked code from concurrently modifying an array via callbacks or using multiple threads,
    //! but that would lead to incorrect results even if the code that is interfered with is implemented in another language
    //! which does not require aliasing discipline.

in the module-level documentation. Do you think this should be extended and/or improved?
b28f255
to
fca2979
Compare
Thank you for making this push forward!
I had just one comment on the document. I'll double-check examples if I have some time.
I added at least an example benchmark for the difference between using NumPy's iterators via our wrapper and ndarray's iterators to zip together three arrays for setting one to the element-wise sum of the others.
Yeah, I think the decision is reasonable, but isn't it outside the scope of this PR?
src/borrow.rs
Outdated
@@ -0,0 +1,579 @@
//! Types to safely create references into the interior of the NumPy arrays
This document is well written and I enjoyed reading it. However, I fear this can be a bit overwhelming for beginners. So, I propose to re-organize the document in a way like:
## Summary
The position we take, and the resulting design.
## Some background/reasoning about the design
The reason why this design is useful/reasonable is considering the Rust/Python boundary.
## Current limitation
I pushed an updated version where I tried to restructure the module-level documentation along the proposed lines.
Indeed. Since it really just adds the two benchmarks, I will take the liberty of cherry-picking the commit onto |
9fb0101
to
b46e8ac
Compare
Lovely - I think we're pretty much there now. Thank you for working on this (and the follow on), I think this is an exciting step for rust-numpy!
src/borrow.rs
Outdated
//! which does not require aliasing discipline.
//!
//! Concerning multi-threading in particular: While the GIL needs to be acquired to create borrows, they are not bound to the GIL
//! and will stay active after the GIL is released, for example by calling [`allow_threads`][pyo3::Python::allow_threads].
Note that many Python API calls will eventually lead to GIL release too.
Certainly, I mainly added that example to have something to link to where the reader can follow up on GIL-related questions.
@@ -0,0 +1,586 @@
//! Types to safely create references into NumPy arrays
This document is filled with excellent detail 👍
If I could suggest one final improvement, I think that we can rearrange the order of content just so that users who want to understand what the module does get that information presented to them at the top.
I think it's as simple as adding a short paragraph after this one-liner just listing the uses (e.g. obtaining ndarray views, iteration etc.) and moving the `Examples` section above `Rationale`.
I tried to implement your suggestions, please have another look.
This PR is getting a bit overwhelming w.r.t. the amount of comments accumulated, but I think there is also the still-open question at the end of #274 (comment) to resolve.
Looks great, I think we should proceed to merge! I think the doc is great now and I'm sure it will evolve further as users ask questions and the overlap resolution is implemented.
Agreed, this is surely only the start of handling dynamic borrow checking of arrays, and external users will surely bring a more expansive perspective on this. I think that, at least, there should be no regressions for existing usage patterns involving `PyReadonlyArray` and the unsafe variant of `as_array_mut`.
Sorry, I was on a short trip. The refined and reordered document looks really nice to me.
This PR tries to follow up on the position taken by the cxx crate: C++ or Python code is unsafe and therefore needs to ensure that relevant invariants are upheld. Safe Rust code can then rely on this for e.g. aliasing discipline as long as it does not introduce any memory safety violations itself.
Hence, this PR adds dynamic borrow checking which ensures that safe Rust code will uphold aliasing discipline as long as the Python code does so as well. This does come with the cost of two accesses to a global hash table per dereference, which is surely not negligible, but also constant, i.e. it does not increase with the number of array elements. It can also be avoided by calling the unsafe/unchecked variants of the accessors when it really is a performance issue.
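The two accesses are one to register the borrow and one to release it. A rough sketch of such a table, assuming a hypothetical `BorrowFlags` type keyed by base address with `-1` marking a writer and a positive count marking readers. The real table lives behind the GIL and its API may differ:

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Hypothetical table mapping a base address to its borrow flag:
/// `n > 0` means `n` readers, `-1` means one writer, absent means unborrowed.
#[derive(Default)]
struct BorrowFlags(HashMap<usize, isize>);

impl BorrowFlags {
    fn acquire_shared(&mut self, address: usize) -> bool {
        match self.0.entry(address) {
            Entry::Occupied(entry) => {
                let readers = entry.into_mut();
                let new_readers = readers.wrapping_add(1);
                // Fails if a writer is active or the reader count would overflow.
                if new_readers <= 0 {
                    return false;
                }
                *readers = new_readers;
            }
            Entry::Vacant(entry) => {
                entry.insert(1);
            }
        }
        true
    }

    fn release_shared(&mut self, address: usize) {
        if let Entry::Occupied(mut entry) = self.0.entry(address) {
            if *entry.get() > 1 {
                *entry.get_mut() -= 1;
            } else if *entry.get() == 1 {
                // Last reader gone: drop the entry so the table stays small.
                entry.remove();
            }
        }
    }

    fn acquire_exclusive(&mut self, address: usize) -> bool {
        match self.0.entry(address) {
            // Any existing entry means readers or a writer are active.
            Entry::Occupied(_) => false,
            Entry::Vacant(entry) => {
                entry.insert(-1);
                true
            }
        }
    }

    fn release_exclusive(&mut self, address: usize) {
        self.0.remove(&address);
    }
}

fn main() {
    let mut flags = BorrowFlags::default();
    assert!(flags.acquire_shared(0x1000));
    assert!(flags.acquire_shared(0x1000));
    assert!(!flags.acquire_exclusive(0x1000));
    flags.release_shared(0x1000);
    flags.release_shared(0x1000);
    assert!(flags.acquire_exclusive(0x1000));
    assert!(!flags.acquire_shared(0x1000));
    flags.release_exclusive(0x1000);
    assert!(flags.acquire_shared(0x1000));
}
```

Each operation is a single hash lookup, so the cost does not depend on the size of the array.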
While I think this PR solves #258 in a manner that is as safe as possible when interacting with an unchecked language like Python, I would open another issue immediately after merging this into `main` to track refining the over-approximation of considering all views into the same base object as conflicting, so that non-overlapping and interleaved views can be detected. But I would prefer to do this as a separate PR to keep this one reviewable.

Closes #258