Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make gil tests not racy #2190

Merged
merged 2 commits into from
Mar 3, 2022
Merged

Make gil tests not racy #2190

merged 2 commits into from
Mar 3, 2022

Conversation

mejrs
Copy link
Member

@mejrs mejrs commented Feb 26, 2022

Changes the refcount checking to happen from a thread that's actually holding the GIL.

Per #2181 (comment)

@adamreichold
Copy link
Member

While the extend test case looks basically sound to me, it is also quite complex. I wonder whether these test cases are special enough to be put into a separate integration test so that it would be known that only test case is executed in the process and the previous much simpler implementation could be used?

(This would also close some more theoretical objects, for example in the current implementation, another test case could acquire the GIL between this test case releasing it and the background thread created by it acquiring thereby masking if one or more acquisitions of the GIL happened between the synchronization points.)

(If a single integration test per test case is too much, maybe it could be a an integration test with only a few test cases all of which hold a common mutex for their whole duration thereby having single-threaded test execution independently of the number of test threads.)

@mejrs
Copy link
Member Author

mejrs commented Feb 26, 2022

While the extend test case looks basically sound to me

These tests don't use unsafe anymore, what do you mean with "unsound"?

I wonder whether these test cases are special enough to be put into a separate integration test

I don't think they are any more or less special than the other tests so I'm not sure. Fwiw, test_clone_in_other_thread accesses POOL directly, so it can only be written in gil.rs.

(This would also close some more theoretical objects, for example in the current implementation, another test case could acquire the GIL between this test case releasing it and the background thread created by it acquiring thereby masking if one or more acquisitions of the GIL happened between the synchronization points.)

Other threads shouldn't be able to interfere with these tests, I think I got the synchronization right so that the GIL is always held during the critical sections (I hope I didn't mess up).

@adamreichold
Copy link
Member

These tests don't use unsafe anymore, what do you mean with "unsound"?

I did not mean the technical term "soundness" w.r.t. the type system but rather the colloquial form just meaning "correct."

Other threads shouldn't be able to interfere with these tests, I think I got the synchronization right so that the GIL is always held during the critical sections (I hope I didn't mess up).

I also think the synchronization is correct insofar the GIL will have been dropped after 2. and re-acquired after 3., but I think it is still possible that another thread might acquire and release the GIL between 2. and 3. so that 3. is not really checking that acquiring the GIL once will immediately update the reference. (As said, I think this is a rather theoretical concern as it would only mask a failure mode where the GIL would need to be acquired multiple times instead of just once to update the reference counts. And mask it unreliably at that.)

@adamreichold
Copy link
Member

adamreichold commented Feb 26, 2022

Hhhmmm, I think that the test_clone_in_other_thread test case is susceptible to interference from other tests, i.e. the assertions

// The pointer should appear once in the incref pool, and once in the
// decref pool (for the clone being created and also dropped)
assert_eq!(&*POOL.pointer_ops.lock().0, &[ptr]);
assert_eq!(&*POOL.pointer_ops.lock().1, &[ptr]);

and

// Acquiring the gil clears the pool
assert!(POOL.pointer_ops.lock().0.is_empty());
assert!(POOL.pointer_ops.lock().1.is_empty());

and probably too strict as other test threads could use POOL while the test is running (this is probably even less likely than the race which led us here). I guess that could be fixed by checking of presence and absence of ptr instead of comparing with a single-element slice and checking for emptiness.

Which brings me back to thinking that isolating these tests within a separate process would really simplify this stuff. Of course, I also do not want to make what is essentially a bug fix unnecessarily complex and would not oppose merging this as-is.

One more idea to make process isolation simpler: Instead of integration tests, we could "just" execute the test binary using std::process::Command with a filter for the current test case and some environment variable set (to indicate to the test case that it is running within an isolated process and should do the actual test logic instead of spawning the subprocess). Of course, spawning the subprocess with the same flags as ourselves but the additional filter in a portable manner is probably a bit more than "just"...

@mejrs
Copy link
Member Author

mejrs commented Feb 26, 2022

I think that the test_clone_in_other_thread test case is susceptible to interference from other tests,

Hmm, that is true...

I also do not want to make what is essentially a bug fix unnecessarily complex and would not oppose merging this as-is.

Right, that's what I'm thinking too. I just don't want to give the impression (to anyone looking at the tests) that reading the refcount while not holding the GIL is okay.

@davidhewitt
Copy link
Member

What's the plan here? Sounds like we didn't want to overcomplicate... should I merge as-is?

@adamreichold
Copy link
Member

adamreichold commented Feb 28, 2022

What's the plan here? Sounds like we didn't want to overcomplicate... should I merge as-is?

Almost. I think we should still fix the assertions on the content of POOL to just use contains(ptr) to be robust against interference from other test cases. And then merge as-is without doing any fundamental changes to how these tests work.

@mejrs mejrs merged commit 39534ba into PyO3:main Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants