Make API globals thread safe using atomics #222

adamreichold · 2021-11-23T18:32:12Z

While the GIL is held when the API pointer is updated, this can still race with other threads checking the current value of the API pointer (without holding the GIL) and should therefore use atomics.

The loads and stores are performed using acquire-release semantics as we want to dereference the pointer and hence any stores to the referenced memory need to be visible to us.
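To illustrate the ordering argument, here is a minimal sketch and not the code from this PR. `ApiTable`, `API`, `publish` and `current` are hypothetical names; the real global holds an untyped capsule pointer. The `Release` store pairs with the `Acquire` load, so a reader that sees the non-null pointer also sees the writes that initialized the memory behind it.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical stand-in for the table that the API pointer refers to.
pub struct ApiTable {
    pub version: u32,
}

// Hypothetical global; a null pointer means "not initialized yet".
static API: AtomicPtr<ApiTable> = AtomicPtr::new(ptr::null_mut());

/// Publishes a fully initialized table. `Release` orders the writes that
/// built `table` before the pointer becomes visible to other threads.
pub fn publish(table: &'static ApiTable) {
    API.store(table as *const ApiTable as *mut ApiTable, Ordering::Release);
}

/// Returns the published table, if any. `Acquire` pairs with the `Release`
/// store above, so dereferencing the pointer observes initialized memory.
pub fn current() -> Option<&'static ApiTable> {
    let ptr = API.load(Ordering::Acquire);
    // SAFETY: a non-null pointer can only come from `publish`, which stored
    // a `&'static ApiTable`.
    unsafe { ptr.as_ref() }
}
```

With `Relaxed` on both sides the pointer value itself would still be read atomically, but nothing would order the contents of the table relative to it.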

The get function should also be unsafe as the offset it uses cannot be verified which might create an invalid pointer invoking undefined behaviour as per the contract of pointer::offset.
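As a rough sketch of why the accessor has to be `unsafe` (the signature below is an assumption for illustration, not necessarily the one in `src/npyffi/array.rs`): `pointer::offset` requires the result to stay inside the same allocation, and nothing in the function can check that for an arbitrary `offset`, so the obligation is passed on to the caller.

```rust
use std::os::raw::c_void;

/// Returns a pointer to entry `offset` of a table of untyped pointers.
///
/// # Safety
///
/// `table` must point to an allocation holding at least `offset + 1`
/// entries and `offset` must not be negative. Otherwise the call to
/// `pointer::offset` is undefined behaviour.
pub unsafe fn get(table: *const *const c_void, offset: isize) -> *const *const c_void {
    // SAFETY: the caller promises that `offset` is within the table.
    unsafe { table.offset(offset) }
}
```

A caller then has to state its reasoning at the call site, for example `unsafe { get(table.as_ptr(), 1) }` for a table known to have two entries.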

Finally, the initialization code is moved into a separate cold function to improve code locality for the fast path.
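The fast-path/slow-path split can be sketched as follows, assuming a plain integer in place of the API pointer. `VALUE`, `INIT_CALLS`, `init` and `value` are hypothetical names, and the constant 42 stands in for the expensive import.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical global; zero means "not initialized yet".
static VALUE: AtomicUsize = AtomicUsize::new(0);
// Counts how often the slow path ran (only here to make the effect visible).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Slow path: marked cold and kept out of line so that the body of `value`
// stays small and the rarely executed code lives elsewhere.
#[cold]
#[inline(never)]
fn init() -> usize {
    INIT_CALLS.fetch_add(1, Ordering::Relaxed);
    let value = 42; // stands in for the expensive initialization
    VALUE.store(value, Ordering::Release);
    value
}

// Fast path: a single atomic load and a branch that is rarely taken.
#[inline]
pub fn value() -> usize {
    let value = VALUE.load(Ordering::Acquire);
    if value != 0 {
        value
    } else {
        init()
    }
}
```

Both attributes are hints to the compiler; the observable behaviour is the same with or without them.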

I suspect that even on strongly ordered architectures like x86-64 this might have some performance impact via inhibiting compiler optimizations but I also do not see how the current Cell-based implementation can actually be Sync?

davidhewitt

👍 good spot - one further quick thought by me...

src/npyffi/array.rs

davidhewitt · 2021-11-23T18:49:45Z

+        Python::with_gil(|py| {
+            let mut api = self.api.load(Ordering::Relaxed) as *const *const c_void;
+            if api.is_null() {
+                api = get_numpy_api(py, MOD_NAME, CAPSULE_NAME);


As a potential gotcha, can get_numpy_api lead to temporary release of the GIL lock? That would potentially enable multiple threads to run this initialization.

As CPython's import implementation can be hooked, I think one cannot prevent this from happening in general. But I also think that multiple threads performing the initialization is only an issue of efficiency.

If a hook is releasing the GIL for whatever reason, it needs to be reacquired and all threads will only progress back here with the GIL held and at most store the same capsule pointer redundantly. (Doing the double-checking here on my part was only motivated by efficiency, i.e. we already have to take the lock so why not use this to avoid redundant initialization as we are already on the slow path.)

(If multiple threads importing the same module yields a different capsule and hence API pointer, I think all bets are off and we would need external synchronization like using std::sync::Once.)
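For reference, external synchronization via `std::sync::Once` could look roughly like the sketch below. `INIT`, `API_ADDR` and `api_addr` are hypothetical names, and the `load` closure stands in for the import that yields the capsule pointer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

// Hypothetical globals: the guard and the address it protects.
static INIT: Once = Once::new();
static API_ADDR: AtomicUsize = AtomicUsize::new(0);

/// Runs `load` at most once across all threads and returns the stored
/// address. Later callers get the first result even if their own `load`
/// would have produced something else.
pub fn api_addr(load: impl FnOnce() -> usize) -> usize {
    // `call_once` blocks concurrent callers until the closure has finished
    // and makes its writes visible to them, so `Relaxed` suffices here.
    INIT.call_once(|| API_ADDR.store(load(), Ordering::Relaxed));
    API_ADDR.load(Ordering::Relaxed)
}
```

One caveat with this approach: if `load` releases the GIL while `Once` is held, a second thread that holds the GIL and blocks in `call_once` could deadlock against the first.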

Well, we could compare_exchange the pointer instead of storing it, only update it if it is still NULL, and otherwise discard our just-initialized value in favour of the "old" one returned by compare_exchange.
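A minimal sketch of that idea, with a hypothetical `SLOT` global and `install` helper and a `u32` in place of the API table:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical global; a null pointer means "not initialized yet".
static SLOT: AtomicPtr<u32> = AtomicPtr::new(ptr::null_mut());

/// Installs `candidate` only if the slot is still null and returns the
/// pointer that ends up being the global value: `candidate` if this call
/// won the race, the previously installed pointer otherwise.
pub fn install(candidate: *mut u32) -> *mut u32 {
    match SLOT.compare_exchange(
        ptr::null_mut(),
        candidate,
        Ordering::AcqRel,  // on success: publish our value
        Ordering::Acquire, // on failure: observe the winner's value
    ) {
        Ok(_) => candidate,
        Err(existing) => existing,
    }
}
```

A caller that loses the race is responsible for discarding its own candidate, which is the "drop the surplus value" behaviour.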

But having the global at all seems weird if we are expecting that get_numpy_api returns different capsules when called from different threads or at different times.

Agreed, I think most likely this code is fine as-is thanks to the global nature. In GILOnceCell I chose to drop any surplus values produced by other threads if a race occurred. This was kind of necessary because of the API contract of it being write-once.

davidhewitt

Looks reasonable to me, thanks!

davidhewitt reviewed Nov 23, 2021

davidhewitt approved these changes Nov 23, 2021

davidhewitt merged commit 6d6084f into PyO3:main Nov 25, 2021

adamreichold deleted the sync-api-globals branch November 25, 2021 21:57

adamreichold mentioned this pull request Jan 6, 2022

Bump dependency on cfg-if #240

Merged
