Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start health checks early. #1319

Merged
merged 1 commit into from
Apr 8, 2017
Merged

Start health checks early. #1319

merged 1 commit into from
Apr 8, 2017

Conversation

timoreimann
Copy link
Contributor

Do not wait a full tick cycle to execute the first health check.

Additional changes:

  • Make request timeout configurable (for testing purposes).
  • Support synchronizing on health check goroutine termination through an internal wait group (for testing purposes).
  • Stop leaking by closing the HTTP response body.
  • Extend health check logging and use WARNING level for (continuously) failing health checks.

@timoreimann
Copy link
Contributor Author

@dtomcej feel free to continue where you left off. :-)

@timoreimann timoreimann added this to the 1.3 milestone Mar 20, 2017
@timoreimann timoreimann force-pushed the start-healthcheck-early branch from 7c78105 to 24ed5ed Compare March 24, 2017 10:45
@timoreimann timoreimann added the kind/enhancement a new or improved feature. label Mar 24, 2017
@emilevauge
Copy link
Member

I would prefer to not modify the code for testing purpose only.
It could lead to confusion/issue in the future.
Can with think of another solution for testing?

@timoreimann
Copy link
Contributor Author

We need some kind of synchronization point to make sure all health check goroutines have completed by the time the test asserts the results. Without modifying the code, the only alternative I see is to wait a constant amount of time and hope that all requests have finished. However, that leads to both longer-than-necessary test runtimes and a potential source of flakiness, both of which I consider worse that the addition of a wait group.

I tried to use the injected loadBalancer to signal when the health checks are complete, but it turned out we can't use that because not all test paths call methods on the loadBalancer (e.g., "healthy server staying healthy"). I'm at a point where I think it might be possible to make it work, but only at the price of a much more complicated test implementation.

Thus, a handful of extra prod code to simplify testing seems justified. I've made sure the wait group is commented and unexported so code outside of the healthcheck package won't be able to access it.

WDYT?

@emilevauge
Copy link
Member

Ouch, not sure on this, sorry.
Designing the code to make it easily testable, I'm OK with that. But adding test parts directly into the code is a whole different story. Especially sync stuff ^^
@containous/traefik WDYT ?

@timoreimann timoreimann force-pushed the start-healthcheck-early branch from 24ed5ed to cb54814 Compare April 7, 2017 11:45
@timoreimann
Copy link
Contributor Author

timoreimann commented Apr 7, 2017

@emilevauge I think I found a way: Did a bit of refactoring so that I can start the health-checking goroutine concurrently from the test; that way, I can also define the wait group in the test.

The downside of this is that I'm not strictly testing the public API anymore but an unexported function (i.e., we start testing further down the stack). But that's not the worst either.
You never get a lunch for free. :-)

Let me know what you think.

(PS: Rebased too.)

Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @timoreimann !
I love the way your refactored this.
LGTM :)

@emilevauge
Copy link
Member

Could you rebase again ?

@timoreimann timoreimann force-pushed the start-healthcheck-early branch from cb54814 to 60ea919 Compare April 7, 2017 17:12
Do not wait a full tick cycle to execute the first health check.

Additional changes:

- Make request timeout configurable (for testing purposes).
- Support synchronizing on health check goroutine termination through an
  internal wait group (for testing purposes).
- Stop leaking by closing the HTTP response body.
- Extend health check logging and use WARNING level for (continuously)
  failing health checks.
@timoreimann
Copy link
Contributor Author

@emilevauge squashed and rebased.

@vdemeester vdemeester merged commit 0cc3d05 into master Apr 8, 2017
@emilevauge emilevauge deleted the start-healthcheck-early branch April 12, 2017 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/healthcheck kind/enhancement a new or improved feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants