TestDeadlock fails intermittently in local environment with go test -race
#508
Comments
My 2c: extend the timeout. A deadlock would still be detected.
Tiexin and I rubber-ducked this a bit, and it's not clear what's causing the issue, but it may be a real problem. I had thought this was unrelated, but it's likely related to the "allow stopping services in 'starting' state" change we just merged. We've already determined that increasing that "timed out waiting for final request" timeout from 1s to 5s doesn't help (and this is not on a loaded machine), so there's clearly something going on. It could be a buggy test, or it could be buggy code, so let's dig further. Tiexin's going to 1) try to reproduce the failure before fix #503, and 2) add logging to try to determine where the problem is.
After some debugging and testing, I figured out that it was a deadlock issue with `servicelog.RingBuffer`. It can be reproduced with a test like this:

```go
package servicelog_test

import (
	"context"
	"fmt"
	"math/rand"
	"testing"
	"time"

	"github.com/canonical/pebble/internals/servicelog"
)

func BenchmarkRingBufferDeadlock(b *testing.B) {
	payload := []byte("foobar")
	var rb *servicelog.RingBuffer
	ctx, cancel := context.WithCancel(context.Background())

	// Goroutine A: repeatedly create a fresh ring buffer and take a head iterator.
	go func() {
		for ctx.Err() == nil {
			fmt.Println("A")
			rb = servicelog.NewRingBuffer(10 * len(payload))
			_ = rb.HeadIterator(0)
			time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
		}
	}()

	// Goroutine B: repeatedly close whatever buffer A created last.
	go func() {
		for ctx.Err() == nil {
			if rb != nil {
				fmt.Println("B")
				rb.Close()
			}
			time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
		}
	}()

	time.Sleep(10 * time.Second)
	cancel()
}
```

And after instrumenting the ring buffer with some logs for locking/unlocking, we can observe the deadlock:

```
B
=== in Close, rwlock locked
=== in Close, iteratorMutex locked
=== in Close, iteratorMutex unlocked
=== Close done!
=== in Close, rwlock unlocked
A
=== in HeadIterator, iteratorMutex locked
=== in Closed, rwlock locked
=== in Closed, rwlock unlocked
=== HeadIterator done!
=== in HeadIterator, iteratorMutex unlocked
A
=== in HeadIterator, iteratorMutex locked
=== in Closed, rwlock locked
=== in Closed, rwlock unlocked
=== HeadIterator done!
=== in HeadIterator, iteratorMutex unlocked
A
B
=== in Close, rwlock locked
=== in HeadIterator, iteratorMutex locked
(hangs here)
```
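Reading the tail of the trace: Close has taken rwlock and is waiting for iteratorMutex, while HeadIterator already holds iteratorMutex and is blocked inside the Closed()-style check waiting for rwlock, which is a classic AB-BA lock-ordering deadlock. As a stand-alone illustration (the names and structure below are mine, not the actual RingBuffer code), the same pattern can be reproduced with plain sync primitives:

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-ins for the two locks seen in the trace; names are illustrative only.
var (
	rwlock        sync.RWMutex
	iteratorMutex sync.Mutex
)

// closeBuffer mimics Close's order in the trace: rwlock first, then iteratorMutex.
func closeBuffer() {
	rwlock.Lock()
	defer rwlock.Unlock()
	iteratorMutex.Lock()
	defer iteratorMutex.Unlock()
}

// headIterator mimics HeadIterator's order: iteratorMutex first, then rwlock
// (the Closed()-style check).
func headIterator() {
	iteratorMutex.Lock()
	defer iteratorMutex.Unlock()
	rwlock.RLock()
	defer rwlock.RUnlock()
}

func main() {
	// Hammer the two paths concurrently. With an unlucky interleaving each side
	// ends up holding the lock the other one wants, and the Go runtime reports
	// "all goroutines are asleep - deadlock!". It may take several runs to hit.
	for i := 0; i < 100000; i++ {
		var wg sync.WaitGroup
		wg.Add(2)
		go func() { defer wg.Done(); closeBuffer() }()
		go func() { defer wg.Done(); headIterator() }()
		wg.Wait()
	}
	fmt.Println("no deadlock this run; try again")
}
```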
…rting state (#510)
In [this PR](#503), we introduced a feature to allow stopping services that are in the starting state (within the okayDelay). However, this turns [a deadlock issue in the ring buffer](#508) into a real problem, so we are reverting the change for now and will redo it once the deadlock is resolved.
Previously, we introduced [a fix to allow stopping services in the starting state](#503). Because of that fix, we discovered [another deadlock issue](#508), so we rolled it back. Now that the deadlock is [fixed by this PR](https://github.com/canonical/pebble/pull/511/files), we are reintroducing the "allow stopping services in the starting state" fix. The benchmark test, the manual test reproducing the three-way deadlock, and the race test have all been run and passed.
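For illustration only (this is the generic way to break an AB-BA deadlock and not necessarily how #511 resolves it), making both paths in the earlier sketch take the locks in the same order removes the hang:

```go
// Drop-in replacements for closeBuffer/headIterator in the sketch above:
// both paths now take iteratorMutex before rwlock, so there is a single
// global lock order and no cycle is possible.
func closeBuffer() {
	iteratorMutex.Lock()
	defer iteratorMutex.Unlock()
	rwlock.Lock()
	defer rwlock.Unlock()
}

func headIterator() {
	iteratorMutex.Lock()
	defer iteratorMutex.Unlock()
	rwlock.RLock()
	defer rwlock.RUnlock()
}
```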
We have a test case `TestDeadlock` in `internals/daemon/api_services_test.go` where we randomly issue a bunch of pebble start/stop commands (around 50 in total) within a second and expect all the changes to be done. This test case fails intermittently with `go test -race` because not all changes become ready in certain cases. On my local Ubuntu 22.04 machine it's relatively easy to reproduce, failing once every five times or so, but weirdly it never occurs in GitHub Actions runs.

Error message:

I tried to use the Go client to do the same as the test case against a manually started Pebble and then check the changes' status, but I didn't manage to reproduce the issue that way. Code used:
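The snippet itself isn't shown above. Purely as an illustrative sketch (the socket path, service name, request count, and timeout are assumptions, and the client calls are written from memory against github.com/canonical/pebble/client rather than checked against the current API), such a reproduction attempt might look roughly like this:

```go
package main

import (
	"fmt"
	"log"
	"math/rand"
	"time"

	"github.com/canonical/pebble/client"
)

func main() {
	// Assumed socket path of a manually started Pebble.
	pebble, err := client.New(&client.Config{Socket: "/path/to/.pebble.socket"})
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical service name; the real test uses services from its own layer.
	services := &client.ServiceOptions{Names: []string{"svc1"}}

	// Fire ~50 start/stop requests in quick succession, as TestDeadlock does.
	var changeIDs []string
	for i := 0; i < 50; i++ {
		var id string
		var err error
		if i%2 == 0 {
			id, err = pebble.Start(services)
		} else {
			id, err = pebble.Stop(services)
		}
		if err != nil {
			log.Fatal(err)
		}
		changeIDs = append(changeIDs, id)
		time.Sleep(time.Duration(rand.Intn(20)) * time.Millisecond)
	}

	// Wait for every change to settle and report its final status.
	for _, id := range changeIDs {
		chg, err := pebble.WaitChange(id, &client.WaitChangeOptions{Timeout: 10 * time.Second})
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(chg.ID, chg.Status)
	}
}
```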