
Race Condition in controllerManager.waitForRunnable #1442

Closed
AliMohammadiSoumee opened this issue Mar 22, 2021 · 12 comments

AliMohammadiSoumee commented Mar 22, 2021

I got a race condition error when running my tests with the -race flag.
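
For context, assuming an ordinary Go module layout, an invocation like the following is enough to surface the report (the -race flag is the only relevant part):

  go test -race ./...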

The problem is in controllerManager.waitForRunnable. According to the sync.WaitGroup documentation, Add and Wait must not be called concurrently: Add calls that start a new wait batch have to happen before Wait.

controllerManager calls WaitGroup.Add in controllerManager.startRunnable and WaitGroup.Wait in controllerManager.waitForRunnableToEnd, while startRunnable and waitForRunnableToEnd run in different goroutines.
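
A minimal sketch of that pattern (the method names mirror the manager's startRunnable and waitForRunnableToEnd; everything else is illustrative, not the actual controller-runtime code):

package main

import "sync"

type manager struct {
	wg sync.WaitGroup
}

// startRunnable mirrors the manager's startRunnable: it increments the
// WaitGroup and launches the runnable in its own goroutine.
func (m *manager) startRunnable(run func()) {
	m.wg.Add(1) // may run concurrently with wg.Wait below
	go func() {
		defer m.wg.Done()
		run()
	}()
}

// waitForRunnableToEnd mirrors the shutdown path: Wait is called from a
// goroutine other than the ones calling Add.
func (m *manager) waitForRunnableToEnd() {
	m.wg.Wait()
}

func main() {
	m := &manager{}
	done := make(chan struct{})
	go func() {
		// shutdown path running concurrently with startRunnable
		m.waitForRunnableToEnd()
		close(done)
	}()
	m.startRunnable(func() {}) // Add may race with the Wait above
	<-done
}

Run under the race detector, this is the shape it can flag: Add in one goroutine, Wait in another, with nothing ordering them.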

This log may help; it's from v0.8.3:

WARNING: DATA RACE
Write at 0x00c00038e6a8 by goroutine 86:
  internal/race.Write()
      /usr/local/go/src/internal/race/race.go:41 +0x125
  sync.(*WaitGroup).Wait()
      /usr/local/go/src/sync/waitgroup.go:128 +0x126
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).waitForRunnableToEnd.func2()
      /home/ali/Developer/Go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:567 +0x44

Previous read at 0x00c00038e6a8 by goroutine 166:
  internal/race.Read()
      /usr/local/go/src/internal/race/race.go:37 +0x206
  sync.(*WaitGroup).Add()
      /usr/local/go/src/sync/waitgroup.go:71 +0x219
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable()
      /home/ali/Developer/Go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:678 +0x4e
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).serveMetrics()
      /home/ali/Developer/Go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:384 +0x31b

Goroutine 86 (running) created at:
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).waitForRunnableToEnd()
      /home/ali/Developer/Go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:566 +0xcb
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure()
      /home/ali/Developer/Go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:548 +0x376
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).Start.func1()
      /home/ali/Developer/Go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:449 +0x46
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).Start()
      /home/ali/Developer/Go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:499 +0x575

Goroutine 166 (running) created at:
  sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).Start()
      /home/ali/Developer/Go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/internal.go:473 +0x61b

jpfourny commented May 7, 2021

Yeah - the usage of WaitGroup is sketchy. It's true that Wait and Add are not safe to be called concurrently. This is a ticking time-bomb.
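
For what it's worth, a common way to make this kind of pattern safe (a sketch of the general technique, not a claim about how controller-runtime fixes it) is to guard Add and a stop flag with the same mutex, so no Add can start once shutdown has begun:

package main

import "sync"

type manager struct {
	mu      sync.Mutex
	stopped bool
	wg      sync.WaitGroup
}

// startRunnable refuses new runnables once shutdown has started, so Add is
// never called concurrently with (or after) Wait.
func (m *manager) startRunnable(run func()) bool {
	m.mu.Lock()
	if m.stopped {
		m.mu.Unlock()
		return false // shutdown already in progress; do not Add
	}
	m.wg.Add(1)
	m.mu.Unlock()

	go func() {
		defer m.wg.Done()
		run()
	}()
	return true
}

// waitForRunnableToEnd sets the stop flag under the lock before waiting, so
// any later startRunnable call sees it and skips the Add.
func (m *manager) waitForRunnableToEnd() {
	m.mu.Lock()
	m.stopped = true
	m.mu.Unlock()
	m.wg.Wait()
}

func main() {
	m := &manager{}
	m.startRunnable(func() {})
	m.waitForRunnableToEnd()
	_ = m.startRunnable(func() {}) // rejected: shutdown already started
}

With this shape, every Add either happens before the stop flag is set (and therefore, via the mutex, before Wait) or is rejected outright.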


jpfourny commented May 7, 2021

What I don't understand is why -race is detecting the problem consistently in 0.8.3, but not back in 0.6.5. The usage patterns of WaitGroup did not change drastically.

@MadhavJivrajani
Contributor

/assign

@brakthehack

FWIW, this is still present in v0.9.5.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2021
@clementnuss

/remove-lifecycle stale

So that this issue doesn't get closed: the race condition is still present with v0.10.3.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 24, 2021
@clementnuss

Is this issue still valid with controller-runtime v0.11.0?
I tried to reproduce the bug, but it hasn't happened again so far (though it never happened deterministically).


FillZpp commented Dec 17, 2021

I think it has been solved in #1695.

@clementnuss

@AliMohammadiSoumee @brakthehack can you try to reproduce your race condition with controller-runtime prior to v0.11 and then with v0.11?

I personally cannot reproduce it with v0.11, but it had only happened a few times.
If not, I think this issue could be closed.

@AliMohammadiSoumee
Author

I have tried to reproduce it with controller-runtime v0.11.0, and apparently the issue is solved.


FillZpp commented Dec 23, 2021

/close

@k8s-ci-robot
Contributor

@FillZpp: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
