Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Reloading of server certificates for server endpoints #4554

Closed
sidharthabiswal opened this issue Jun 27, 2023 · 25 comments · Fixed by #4898
Closed

[Feature]: Reloading of server certificates for server endpoints #4554

sidharthabiswal opened this issue Jun 27, 2023 · 25 comments · Fixed by #4898
Labels
enhancement good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@sidharthabiswal
Copy link

What happened?

The server certificates used by the OTLP grpc and http interfaces(available on 4317 and 4318 ports) are not getting reloaded.

Steps to reproduce

  1. Reduce the renewal time of the server certificates to a lower value say 3 minutes.
  2. After 3 minutes try to connect to OTLP HTTP 4318 and GRPC 4317 port and then below error will occur.
    remote error: tls: expired certificate
  3. The validity of the server certificate is not changing that can be checked using openssl s_client --connect HOST:PORT | openssl x509 -noout -dates

Expected behavior

Server certificates used by OTLP grpc and http interfaces should be reloaded.

Relevant log output

No response

Screenshot

No response

Additional context

No response

Jaeger backend version

1.42.0

SDK

No response

Pipeline

No response

Stogage backend

No response

Operating system

No response

Deployment model

No response

Deployment configs

No response

@yurishkuro
Copy link
Member

Jaeger does not control expiration time of the certificates. "Reloading" simply means if the cert files are changed on disk then Jaeger will detect that and start using the new files. It is up to you to ensure that the certs are timely rotated on disk.

@yurishkuro yurishkuro added question and removed bug labels Jun 27, 2023
@sidharthabiswal
Copy link
Author

Hi @yurishkuro, The certs are timely rotated on disk and the new files are being used by the Jaeger thrift interface i.e. available on 14268 port, grpc interface i.e. available on 14250 port and also for the zipkin thrift interface.

But the new files are not being used by the OTLP grpc and http interfaces that are available on 4317 and 4318 ports respectively.

openssl s_client --connect HOST:PORT| openssl x509 -noout -dates

If I execute the above command multiple times with 14250 and 14268 ports, then the validity keeps on changing.

But If I execute the above command multiple times with 4317 and 4318 ports. then the validity is not changing.

So is there any limitation wrt the OTLP grpc and http intefaces as far as using the renewed server certificates are concerned?

@yurishkuro
Copy link
Member

yurishkuro commented Jun 27, 2023

You're correct, I just looked at the code and while we have a common infra for reloading certs (it was added fairly recently), only certain places are using hot-reloading (e.g. ES Client change is still pending #4342). The gRPC and HTTP servers do not, they only read the certs on start-up. I will change this into a feature request.

@yurishkuro yurishkuro changed the title [Bug]: Server certificates are not getting reloaded for OTLP interfaces [Feature]: Reloading of server certificates for server endpoints Jun 27, 2023
@yurishkuro yurishkuro added enhancement help wanted Features that maintainers are willing to accept but do not have cycles to implement good first issue Good for beginners and removed question labels Jun 27, 2023
@spy-1234
Copy link

I want to help in this issue.

@shubmjagtap
Copy link

@sidharthabiswal how to do below step ?

Reduce the renewal time of the server certificates to a lower value say 3 minutes

@sidharthabiswal
Copy link
Author

Hi @yurishkuro, Could you please let us know that in which version of Jaeger the fix will be available?

@pranoyk
Copy link

pranoyk commented Jul 18, 2023

Hi @yurishkuro, if I am understanding the problem correctly, the code to achieve the following has to sit in cmd/collector/app/server/grpc.go and cmd/collector/app/server/http.go or handle it directly in pkg/config/tlscfg/options.go and modify the grpc and HTTP files in order to use them?

[edited]
Noticed that changes required in options files are already in place, will only have to modify the gRPC and HTTP servers.

@roopeshsn
Copy link

roopeshsn commented Aug 18, 2023

Hi, @yurishkuro! I would like to work on this issue. Taking issue #3924 and the associated PR #4342 as a reference, should I need to invoke StartGRPCServer (https://github.com/jaegertracing/jaeger/blob/4798446b87d6012c657057ab485e100481239317/cmd/collector/app/server/grpc.go#L54C6-L54C21) for the second time when there is a change in certs?

@yurishkuro
Copy link
Member

@roopeshsn restarting the server seems very destructive approach. Doesn't gRPC support refreshing certs without restarting the server?

@haanhvu
Copy link
Contributor

haanhvu commented Aug 18, 2023

Hi, @yurishkuro! I would like to work on this issue. Taking issue #3924 and the associated PR #4342 as a reference, should I need to invoke StartGRPCServer (https://github.com/jaegertracing/jaeger/blob/4798446b87d6012c657057ab485e100481239317/cmd/collector/app/server/grpc.go#L54C6-L54C21) for the second time when there is a change in certs?

@roopeshsn PR #4342 only aims at updating ES client password. But the package we currently use (github.com/olivere/elastic, which has been deprecated) doesn't support that. That's why I need to swap the whole client^^

Similarly, you would just need to aim at updating certs in gRPC and HTTP servers. Go for the easiest way possible. I haven't looked at the code of the servers yet, but I'm pretty sure the gRPC and HTTP packages we use have not been deprecated. And updating cert seems like a common feature, so there's high chance the packages already have supports for that.

@roopeshsn
Copy link

roopeshsn commented Aug 18, 2023

@roopeshsn restarting the server seems very destructive approach. Doesn't gRPC support refreshing certs without restarting the server?

Looks like there is a watcher for certs in place

func newCertWatcher(opts Options, logger *zap.Logger, rootCAs, clientCAs *x509.CertPool) (*certWatcher, error) {

@haanhvu
Copy link
Contributor

haanhvu commented Aug 18, 2023

@roopeshsn restarting the server seems very destructive approach. Doesn't gRPC support refreshing certs without restarting the server?

Looks like there is a watcher for certs in place

func newCertWatcher(opts Options, logger *zap.Logger, rootCAs, clientCAs *x509.CertPool) (*certWatcher, error) {

Yes, when the cert changes, gRPC and HTTP servers will need to update that

@yurishkuro
Copy link
Member

There is a difference between recreating a client vs. recreating a server. The former only affects the collector process, and is a relatively normal thing to do (as in, closing outbound connections periodically is even sometimes a recommended procedure). But closing the server affects the other clients connected to the collector, possibly in the middle of them submitting a request. gRPC for go has support for dynamically swapping certificates - related issue grpc/grpc-go#2167

@sidharthabiswal
Copy link
Author

Hi @yurishkuro, Is there any plan to include this feature in the upcoming Jaeger version?

@sidharthabiswal
Copy link
Author

Hi @yurishkuro, Is there any plan to include this feature in the upcoming Jaeger version?

Hi @yurishkuro, Any update on this?

@sidharthabiswal
Copy link
Author

sidharthabiswal commented Sep 21, 2023

Hi @yurishkuro and @haanhvu ,

We have analysed the Jaeger source code and below is our finding.

The file i.e. https://github.com/jaegertracing/jaeger/blob/main/cmd/collector/app/server/grpc.go contains the below code in which it calls the Config() defined in https://github.com/jaegertracing/jaeger/blob/main/pkg/config/tlscfg/options.go wherein the certWatcher is initialized. This grpc.go is for the Collector GRPC interface i.e. available on 14250 port.

if params.TLSConfig.Enabled {
		// user requested a server with TLS, setup creds
		tlsCfg, err := params.TLSConfig.Config(params.Logger)
		if err != nil {
			return nil, err
		}

		creds := credentials.NewTLS(tlsCfg)
		grpcOpts = append(grpcOpts, grpc.Creds(creds))
	}

	server = grpc.NewServer(grpcOpts...)

We observed similar code in https://github.com/jaegertracing/jaeger/blob/main/cmd/collector/app/server/http.go. This http.go is for the Collector HTTP interface i.e. available on 14268 port.

But for this issue we have checked https://github.com/jaegertracing/jaeger/blob/main/cmd/collector/app/handler/otlp_receiver.go and the applyTLSSettings() is defined as shown below.

func applyTLSSettings(opts *tlscfg.Options) *configtls.TLSServerSetting {
	return &configtls.TLSServerSetting{
		TLSSetting: configtls.TLSSetting{
			CAFile:     opts.CAPath,
			CertFile:   opts.CertPath,
			KeyFile:    opts.KeyPath,
			MinVersion: opts.MinVersion,
			MaxVersion: opts.MaxVersion,
		},
		ClientCAFile: opts.ClientCAPath,
	}
}

So the certWatcher is not in place for the OTLP GRPC and HTTP interfaces. We tried to implement the certWatcher related code changes in this file. But that did not work as the OTLP GRPC and HTTP servers are implemented in opentelemetry collector where the certificate reload mechanism is missing.

We think that some code changes are required in the below file as well.
https://github.com/open-telemetry/opentelemetry-collector/blob/receiver/otlpreceiver/v0.85.0/receiver/otlpreceiver/otlp.go

So we would like to know if the code changes are required in both Jaeger and Opentelemetry Collector to resolve this issue. Could you please confirm?

@yurishkuro
Copy link
Member

If OTEL receivers do not support reloading certificates dynamically, then we cannot fix it here, because we just import those receivers as is. Please open a ticket there and link here.

@sidharthabiswal
Copy link
Author

Hi @yurishkuro,
We have opened a ticket on Opentelemetry collector and it is linked here.

@yurishkuro
Copy link
Member

looking at otel-collector code, there was a PR open-telemetry/opentelemetry-collector#4737 where TLS reloading was implemented (on a timer, instead of file watcher as we do). The tls.Config object that is returned from the config is provided GetCertificate function, instead of the actual certificates. If that tls.Config is then used with the servers (HTTP or gRPC), I think they should natively use the right certs. Of course, it's always good to have unit tests verifying this behavior.

@sidharthabiswal
Copy link
Author

Hi @yurishkuro,

Thanks for the suggestion.
We have checked the code and it seems the RELOAD_INTERVAL is not used in otlp_receiver.go as it is optional. After hardcoding some value, the certificates are reloading.
But we would like know how the user can set the RELOAD_INTERVAL flag before the deployment.
Is there any plan to make it configurable like helm parameter?

@yurishkuro
Copy link
Member

We would need to add a CLI flag to our *.tls.* flags and pass it to OTEL component. But if we do that, we'd also need to implement timer-based reloading in our code, because tls config is shared across many settings in Jaeger and it would be misleading to support the new flag in only those that are OTEL-related

@sidharthabiswal
Copy link
Author

sidharthabiswal commented Oct 5, 2023

Hi @yurishkuro,

Thanks for your suggestion. We have modified the Jaeger 3PP code(flags.go and options.go) so that ReloadInterval attribute can be used as a CLI flag. Then we used the ReloadInterval in the below function in otlp_receiver.go file.

func applyTLSSettings(opts *tlscfg.Options) *configtls.TLSServerSetting {
	return &configtls.TLSServerSetting{
		TLSSetting: configtls.TLSSetting{
			CAFile:     opts.CAPath,
			CertFile:   opts.CertPath,
			KeyFile:    opts.KeyPath,
			MinVersion: opts.MinVersion,
			MaxVersion: opts.MaxVersion,
			ReloadInterval: opts.ReloadInterval,
		},
		ClientCAFile: opts.ClientCAPath,
	}
}

After the deployment we have observed that the certificate reload is working fine for the OTEL GRPC and HTTP interfaces as per the configured RELOAD_INTERVAL. Since this issue is related to security, it really helps if this fix is delivered by Jaeger soon and then Jaeger may release the complete solution that works for all the interfaces.

Could you please share your opinion?

@yurishkuro
Copy link
Member

Are you planning to open a PR?

@james-ryans
Copy link
Contributor

Hi! I would like to work on this issue.

@yurishkuro how would you prefer this issue to be resolved?

  1. Add a "reload-interval" flag to all TLS config options. With this flag, we can choose to reload the certs either based on the "certWatcher" or reload interval. We can follow the OTel approach that checks if it need to reload certs from incoming requests.
  2. Create a separate TLS config option for OTel, as the current TLS config option is only used as an intermediary to construct OTel TLS settings. Other servers already have certWatcher behind them.

Or maybe you have a better approach in mind?

@yurishkuro
Copy link
Member

yurishkuro commented Oct 26, 2023

I commented on the PR - my suggestion is to add a capability to ServerFlagsConfig that would switch between file watching and reload interval. Then OTLP receiver can enable reload interval in its flags config and show this as a new CLI option, but other places using tlscfg package will continue working with file watching.

Longer term we should converge onto one method, which would be a breaking change, but that would be ok for jaeger-v2 (#4843)

yurishkuro pushed a commit that referenced this issue Oct 27, 2023
## Which problem is this PR solving?
- Resolves #4554 

## Description of the changes
- Enable reload interval functionality to OTel server certificates by
adding `otlp.http.tls.reload-interval` and
`otlp.grpc.tls.reload-interval` flags (0s means disabled)

## How was this change tested?
- Tested manually in local
- Added `*.tls.reload-interval` flag unit tests to `flags.go` and
`otel_receiver.go`

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: James Ryans <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement
Projects
None yet
8 participants