-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics for backend_retries_total #1504
Conversation
I will fix the build tomorrow morning, please don't hesitate to have a look at the PR already. That happens when you only execute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I work with Marco, we have reviewed the PR internally over the last few days. Hence, I think it makes sense if I give my LGTM. :-)
Welcome and thanks for your contribution! |
Thanks for the welcome and for the introduction @timoreimann. I should probably also have said a couple of words :D I am happy to be able to contribute to the project. Linting errors are fixed by the way. |
bca47c6
to
dd93e3e
Compare
Hi @vdemeester and @emilevauge. Can you have a look at this PR? @timoreimann told me that one of you has to give the approve the PR in order to get it merged :) |
dd93e3e
to
4f2ae4a
Compare
4f2ae4a
to
3df3265
Compare
I just updated PR against latest master again. @vdemeester or @emilevauge can you have a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @marco-jantke, thanks a lot for your contribution!
I made few comments, I also what has been done in server.go
could be improved a bit :)
middlewares/prometheus.go
Outdated
if err != nil { | ||
e, ok := err.(stdprometheus.AlreadyRegisteredError) | ||
if !ok { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, don't use panic, return an error instead. You shouldn't stop a reverse proxy because of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I rebuilt it now in a way that if any of the Prometheus Metrics can not be registered, an error will be returned and the *Prometheus
will be nil
. This will lead to no Metrics instrumentation and an log message on Error level. Does it make sense to you this way?
middlewares/retry.go
Outdated
} | ||
|
||
// NewRetry returns a new Retry instance | ||
func NewRetry(attempts int, next http.Handler) *Retry { | ||
func NewRetry(attempts int, next http.Handler, metrics RetryMetrics) *Retry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to add an interface here, instead of passing RetryMetrics
directly. WDYT ?
type Listener interface {
Retried(attempt int) error
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually I refrain from building abstractions before they are actually required. Therefore personally I would rather leave this part as is, extending it with a more generic implementation once we have an use-case for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilevauge WDYT? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry
middleware should be loosely coupled to the metrics
middleware. If I agree with you on not building abstractions too early, here we rather need isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see your point and I am fine with implementing the abstraction layer. Though I would like to make sure that we are on the same page with our understanding of the code.
The Retry middleware is not coupled to the metrics middleware. It only knows about the RetryMetrics
interface, which happens to be right now in the middleware package (IMHO this is the rather confusing part). The Metrics
and RetryMetrics
interfaces are in my opinion already a layer of abstraction. The metrics middleware in turn also "only" depends on the Metrics
interface.
Though the abstraction you suggest would remove all knowledge about metrics from the retry middleware. Do I understand your intention correctly? If so and you suggest that way, I will adapt the PR accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though the abstraction you suggest would remove all knowledge about metrics from the retry middleware. Do I understand your intention correctly?
Exactly :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the feedback. I implemented it now. See 7ff5692146cc1e0f8a804a1ef54e59cfeb4e4fb1
server/server.go
Outdated
@@ -180,7 +180,8 @@ func (server *Server) startHTTPServers() { | |||
serverMiddlewares := []negroni.Handler{server.accessLoggerMiddleware, server.loggerMiddleware, metrics} | |||
if server.globalConfiguration.Web != nil && server.globalConfiguration.Web.Metrics != nil { | |||
if server.globalConfiguration.Web.Metrics.Prometheus != nil { | |||
metricsMiddleware := middlewares.NewMetricsWrapper(middlewares.NewPrometheus(newServerEntryPointName, server.globalConfiguration.Web.Metrics.Prometheus)) | |||
promMetrics, _ := middlewares.NewPrometheus(newServerEntryPointName, server.globalConfiguration.Web.Metrics.Prometheus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you use newMetrics
here too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed it.
716162c
to
5d9e9ba
Compare
Closing/Reopening to get SemaphoreCI working. |
@emilevauge can you have a look again? :) |
@marco-jantke could you resolve the conflicts? |
7ff5692
to
2383210
Compare
Ok, everything should be ready now. |
2383210
to
9fdad4b
Compare
@emilevauge ping |
return nil | ||
} | ||
|
||
func registerRetryMiddleware( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marco-jantke could you remove this new function? It is called only one time, has too few lines and is then hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I basically introduced to enable testing, that the middleware is constructed properly. I think this would not be possible anymore, when I inline the logic back into the loadConfig
. So I can remove it of course if you prefer it this way, but I guess then the test also has to die.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I was not thinking about this ;) It's all good then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work @marco-jantke
LGTM
9fdad4b
to
48b149f
Compare
Thanks for the reviews and merging 🎉 |
This PR adds metrics for the backend_retries_total. On the way to implement this, I did a couple of clean up tasks and improved the testing coverage of the existing implementation.
Cleanup tasks:
traefik_request_duration_seconds
all the time. Therefore this is no breaking change, just clarification inside the code base.ResponseRecorder
in the retry middleware private, as this is something you would not expect from the package interface to provide.Apart of this I added unit tests for the
retry.go
and the parts I touched in theserver.go
. Especially theserver.go
method is hard to test, due to its current design, but I hope it is a first step and having the file will motivate others to increase the coverage there as well.