-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: Reloading of server certificates for server endpoints #4554
Comments
Jaeger does not control expiration time of the certificates. "Reloading" simply means if the cert files are changed on disk then Jaeger will detect that and start using the new files. It is up to you to ensure that the certs are timely rotated on disk. |
Hi @yurishkuro, The certs are timely rotated on disk and the new files are being used by the Jaeger thrift interface i.e. available on 14268 port, grpc interface i.e. available on 14250 port and also for the zipkin thrift interface. But the new files are not being used by the OTLP grpc and http interfaces that are available on 4317 and 4318 ports respectively. openssl s_client --connect HOST:PORT| openssl x509 -noout -dates If I execute the above command multiple times with 14250 and 14268 ports, then the validity keeps on changing. But If I execute the above command multiple times with 4317 and 4318 ports. then the validity is not changing. So is there any limitation wrt the OTLP grpc and http intefaces as far as using the renewed server certificates are concerned? |
You're correct, I just looked at the code and while we have a common infra for reloading certs (it was added fairly recently), only certain places are using hot-reloading (e.g. ES Client change is still pending #4342). The gRPC and HTTP servers do not, they only read the certs on start-up. I will change this into a feature request. |
I want to help in this issue. |
@sidharthabiswal how to do below step ? Reduce the renewal time of the server certificates to a lower value say 3 minutes |
Hi @yurishkuro, Could you please let us know that in which version of Jaeger the fix will be available? |
Hi @yurishkuro, if I am understanding the problem correctly, the code to achieve the following has to sit in [edited] |
Hi, @yurishkuro! I would like to work on this issue. Taking issue #3924 and the associated PR #4342 as a reference, should I need to invoke StartGRPCServer (https://github.com/jaegertracing/jaeger/blob/4798446b87d6012c657057ab485e100481239317/cmd/collector/app/server/grpc.go#L54C6-L54C21) for the second time when there is a change in certs? |
@roopeshsn restarting the server seems very destructive approach. Doesn't gRPC support refreshing certs without restarting the server? |
@roopeshsn PR #4342 only aims at updating ES client password. But the package we currently use ( Similarly, you would just need to aim at updating certs in gRPC and HTTP servers. Go for the easiest way possible. I haven't looked at the code of the servers yet, but I'm pretty sure the gRPC and HTTP packages we use have not been deprecated. And updating cert seems like a common feature, so there's high chance the packages already have supports for that. |
Looks like there is a watcher for certs in place jaeger/pkg/config/tlscfg/cert_watcher.go Line 52 in 2119f5f
|
Yes, when the cert changes, gRPC and HTTP servers will need to update that |
There is a difference between recreating a client vs. recreating a server. The former only affects the collector process, and is a relatively normal thing to do (as in, closing outbound connections periodically is even sometimes a recommended procedure). But closing the server affects the other clients connected to the collector, possibly in the middle of them submitting a request. gRPC for go has support for dynamically swapping certificates - related issue grpc/grpc-go#2167 |
Hi @yurishkuro, Is there any plan to include this feature in the upcoming Jaeger version? |
Hi @yurishkuro, Any update on this? |
Hi @yurishkuro and @haanhvu , We have analysed the Jaeger source code and below is our finding. The file i.e. https://github.com/jaegertracing/jaeger/blob/main/cmd/collector/app/server/grpc.go contains the below code in which it calls the Config() defined in https://github.com/jaegertracing/jaeger/blob/main/pkg/config/tlscfg/options.go wherein the certWatcher is initialized. This grpc.go is for the Collector GRPC interface i.e. available on 14250 port.
We observed similar code in https://github.com/jaegertracing/jaeger/blob/main/cmd/collector/app/server/http.go. This http.go is for the Collector HTTP interface i.e. available on 14268 port. But for this issue we have checked https://github.com/jaegertracing/jaeger/blob/main/cmd/collector/app/handler/otlp_receiver.go and the applyTLSSettings() is defined as shown below.
So the certWatcher is not in place for the OTLP GRPC and HTTP interfaces. We tried to implement the certWatcher related code changes in this file. But that did not work as the OTLP GRPC and HTTP servers are implemented in opentelemetry collector where the certificate reload mechanism is missing. We think that some code changes are required in the below file as well. So we would like to know if the code changes are required in both Jaeger and Opentelemetry Collector to resolve this issue. Could you please confirm? |
If OTEL receivers do not support reloading certificates dynamically, then we cannot fix it here, because we just import those receivers as is. Please open a ticket there and link here. |
Hi @yurishkuro, |
looking at otel-collector code, there was a PR open-telemetry/opentelemetry-collector#4737 where TLS reloading was implemented (on a timer, instead of file watcher as we do). The |
Hi @yurishkuro, Thanks for the suggestion. |
We would need to add a CLI flag to our |
Hi @yurishkuro, Thanks for your suggestion. We have modified the Jaeger 3PP code(flags.go and options.go) so that ReloadInterval attribute can be used as a CLI flag. Then we used the ReloadInterval in the below function in otlp_receiver.go file.
After the deployment we have observed that the certificate reload is working fine for the OTEL GRPC and HTTP interfaces as per the configured RELOAD_INTERVAL. Since this issue is related to security, it really helps if this fix is delivered by Jaeger soon and then Jaeger may release the complete solution that works for all the interfaces. Could you please share your opinion? |
Are you planning to open a PR? |
Hi! I would like to work on this issue. @yurishkuro how would you prefer this issue to be resolved?
Or maybe you have a better approach in mind? |
I commented on the PR - my suggestion is to add a capability to Longer term we should converge onto one method, which would be a breaking change, but that would be ok for jaeger-v2 (#4843) |
## Which problem is this PR solving? - Resolves #4554 ## Description of the changes - Enable reload interval functionality to OTel server certificates by adding `otlp.http.tls.reload-interval` and `otlp.grpc.tls.reload-interval` flags (0s means disabled) ## How was this change tested? - Tested manually in local - Added `*.tls.reload-interval` flag unit tests to `flags.go` and `otel_receiver.go` ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `yarn lint` and `yarn test` --------- Signed-off-by: James Ryans <[email protected]>
What happened?
The server certificates used by the OTLP grpc and http interfaces(available on 4317 and 4318 ports) are not getting reloaded.
Steps to reproduce
remote error: tls: expired certificate
Expected behavior
Server certificates used by OTLP grpc and http interfaces should be reloaded.
Relevant log output
No response
Screenshot
No response
Additional context
No response
Jaeger backend version
1.42.0
SDK
No response
Pipeline
No response
Stogage backend
No response
Operating system
No response
Deployment model
No response
Deployment configs
No response
The text was updated successfully, but these errors were encountered: