Make API globals thread safe using atomics #222
👍 good spot - one further quick thought by me...
Python::with_gil(|py| {
    let mut api = self.api.load(Ordering::Relaxed) as *const *const c_void;
    if api.is_null() {
        api = get_numpy_api(py, MOD_NAME, CAPSULE_NAME);
As a potential gotcha, can `get_numpy_api` lead to a temporary release of the GIL? That would potentially enable multiple threads to run this initialization.
As CPython's import implementation can be hooked, I think one cannot prevent this from happening in general. But I also think that multiple threads performing the initialization is only an issue of efficiency.
If a hook is releasing the GIL for whatever reason, it needs to be reacquired and all threads will only progress back here with the GIL held and at most store the same capsule pointer redundantly. (Doing the double-checking here on my part was only motivated by efficiency, i.e. we already have to take the lock so why not use this to avoid redundant initialization as we are already on the slow path.)
(If multiple threads importing the same module yields a different capsule and hence API pointer, I think all bets are off and we would need external synchronization like using `std::sync::Once`.)
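For illustration, external synchronization via `std::sync::Once` might look roughly like the sketch below. It is not the code from this PR: `load_api_table` and `TABLE` are hypothetical stand-ins for `get_numpy_api` and the capsule contents, and no real GIL is involved.

```rust
use std::os::raw::c_void;
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};
use std::sync::Once;

// Hypothetical table standing in for the memory the capsule points to.
static TABLE: [usize; 2] = [10, 20];

static INIT: Once = Once::new();
static API: AtomicPtr<c_void> = AtomicPtr::new(ptr::null_mut());
// Counts how often the (hypothetical) loader ran, for demonstration only.
static LOADS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `get_numpy_api`; assumed for this sketch.
fn load_api_table() -> *mut c_void {
    LOADS.fetch_add(1, Ordering::Relaxed);
    TABLE.as_ptr() as *mut c_void
}

/// Returns the API pointer, running the loader at most once across all threads.
pub fn api_once() -> *mut c_void {
    INIT.call_once(|| {
        API.store(load_api_table(), Ordering::Release);
    });
    API.load(Ordering::Acquire)
}
```

One caveat with this approach: a thread that blocks on the `Once` while holding the GIL could deadlock if the initializing thread needs to reacquire the GIL, so this is a sketch of the idea rather than a drop-in replacement.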
> (If multiple threads importing the same module yields a different capsule and hence API pointer, I think all bets are off and we would need external synchronization like using std::sync::Once.)
Well, we could `compare_exchange` the pointer instead of storing it, updating it only if it is still NULL and otherwise discarding our just-initialized value in favour of the "old" one returned by `compare_exchange`.
But having the global at all seems weird if we are expecting that `get_numpy_api` returns different capsules when called from different threads or at different times.
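A minimal sketch of that `compare_exchange` idea, using only the standard library. The function name `publish` and the global `API` are assumptions made for this example, not names from the PR.

```rust
use std::os::raw::c_void;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Global API pointer; NULL means "not initialized yet".
static API: AtomicPtr<c_void> = AtomicPtr::new(ptr::null_mut());

/// Publishes `candidate` only if no pointer has been stored yet.
/// Returns whichever pointer ends up being the global one.
pub fn publish(candidate: *mut c_void) -> *mut c_void {
    match API.compare_exchange(
        ptr::null_mut(),
        candidate,
        Ordering::AcqRel,  // success: publish our writes, observe prior ones
        Ordering::Acquire, // failure: observe the winner's writes
    ) {
        // We won the race: our candidate is now the global value.
        Ok(_) => candidate,
        // Another thread got there first: discard ours, use the existing one.
        Err(existing) => existing,
    }
}
```

With this shape, a late initializer never overwrites an already published pointer, so every caller sees the first value that was stored.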
Agreed, I think most likely this code is fine as-is thanks to the global nature. In `GILOnceCell` I chose to drop any surplus values produced by other threads if a race occurred. This was kind of necessary because of the API contract of it being write-once.
While the GIL is held when the API pointer is updated, this can still race with other threads checking the current value of the API pointer (without holding the GIL), so the pointer should be accessed using atomics. The loads and stores are performed using acquire-release semantics as we want to dereference the pointer and hence any stores to the referenced memory need to be visible to us. The get function should also be unsafe as the offset it uses cannot be verified, which might create an invalid pointer and invoke undefined behaviour as per the contract of pointer::offset. Finally, the initialization code is moved into a separate cold function to improve code locality for the fast path.
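The overall shape described above could be sketched as follows. This is a simplified, standard-library-only illustration and not the PR's actual code: a `Mutex` named `FAKE_GIL` stands in for the GIL, the static `TABLE` of `usize` values stands in for the capsule's pointer table, and the `Api` type is hypothetical.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Mutex;

// Assumption: a plain mutex standing in for the GIL.
static FAKE_GIL: Mutex<()> = Mutex::new(());
// Assumption: a table of integers standing in for the API pointer table.
static TABLE: [usize; 3] = [7, 8, 9];

pub struct Api {
    table: AtomicPtr<usize>,
}

impl Api {
    pub const fn new() -> Self {
        Self { table: AtomicPtr::new(ptr::null_mut()) }
    }

    /// # Safety
    /// `offset` must lie within the bounds of the table; this cannot be
    /// checked here, which is why the function is `unsafe`.
    pub unsafe fn get(&self, offset: isize) -> *const usize {
        // Fast path: acquire load pairs with the release store in `init`,
        // so the table contents are visible before we dereference.
        let mut table = self.table.load(Ordering::Acquire);
        if table.is_null() {
            table = self.init();
        }
        // SAFETY: the caller guarantees `offset` is in bounds.
        unsafe { table.offset(offset) }
    }

    // Slow path kept out of line so the fast path stays compact.
    #[cold]
    fn init(&self) -> *mut usize {
        let _gil = FAKE_GIL.lock().unwrap();
        // Double-check under the lock; the lock already orders this load.
        let mut table = self.table.load(Ordering::Relaxed);
        if table.is_null() {
            table = TABLE.as_ptr() as *mut usize;
            self.table.store(table, Ordering::Release);
        }
        table
    }
}
```

A caller would hold a `static API: Api = Api::new();` and read entries with `unsafe { *API.get(i) }`, taking responsibility for `i` being in range.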
Looks reasonable to me, thanks!
While the GIL is held when the API pointer is updated, this can still race with other threads checking the current value of the API pointer (without holding the GIL), so the pointer should be accessed using atomics.
The loads and stores are performed using acquire-release semantics as we want to dereference the pointer and hence any stores to the referenced memory need to be visible to us.
The get function should also be unsafe as the offset it uses cannot be verified, which might create an invalid pointer and invoke undefined behaviour as per the contract of `pointer::offset`.
Finally, the initialization code is moved into a separate cold function to improve code locality for the fast path.
I suspect that even on strongly ordered architectures like x86-64 this might have some performance impact via inhibiting compiler optimizations, but I also do not see how the current `Cell`-based implementation can actually be `Sync`?
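The `Sync` concern can be checked at compile time with a small helper. The sketch below uses hypothetical names (`assert_sync`, `check`) and only shows that `AtomicPtr` satisfies the bound, while the `Cell` line is left commented out because `Cell<T>` never implements `Sync`.

```rust
use std::os::raw::c_void;
use std::sync::atomic::AtomicPtr;

// Compiles only when `T` can be shared between threads by reference.
fn assert_sync<T: Sync>() {}

pub fn check() {
    // An atomic pointer may live in a shared global.
    assert_sync::<AtomicPtr<c_void>>();
    // Uncommenting the next line is a compile error, since `Cell` is not `Sync`:
    // assert_sync::<std::cell::Cell<*const c_void>>();
}
```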