-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protect internal components using TLS #159
Protect internal components using TLS #159
Conversation
ebae7bb
to
f143b30
Compare
9b5dcb1
to
15ab541
Compare
b89e12b
to
ce8be3a
Compare
case err != nil: | ||
return ctrl.Result{}, err | ||
default: | ||
log.Info("Skipping cert rotation, all Microservices certificates still valid", "name", req.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding until when they will be valid or something?
tests/e2e/reconcile/01-install.yaml
Outdated
@@ -15,6 +15,8 @@ kind: Microservices | |||
metadata: | |||
name: simplest | |||
spec: | |||
images: | |||
tempoQuery: quay.io/rvargasp/tempo-query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this image contain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that image contains the jaegerQuery pluign that supports TLS communication with the tempo front-end. We can remove this when the new version of the oficial image is released. grafana/tempo#1999
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment explaining that? Even, I would create an issue in this repository to do that change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 we should create an issue to revert it back to the default
04ab12f
to
0f5bc6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall, I've added a couple of minor comments/suggestions.
@@ -185,7 +185,7 @@ type ObjectStorageSpec struct { | |||
// ObjectStorageTLSSpec is the TLS configuration for reaching the object storage endpoint. | |||
type ObjectStorageTLSSpec struct { | |||
// CA is the name of a ConfigMap containing a CA certificate. | |||
// It needs to be in the same namespace as the LokiStack custom resource. | |||
// It needs to be in the same namespace as the tempo Microservice custom resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😃
// ComponentCertSecretNames returns a list of all tempo component certificate secret names. | ||
func ComponentCertSecretNames(stackName string) []string { | ||
return []string{ | ||
fmt.Sprintf("tempo-%s-distributor-http", stackName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for creating a different cert for each port of each service?
I was looking at the loki operator, and it creates a new service for each port (https://github.com/grafana/loki/blob/61c02d78f2f54c00c6aa47943c4944dc7b0619a7/operator/internal/manifests/distributor.go#L153-L208), but I don't know why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is to mantain the HTTP and gRPC TLS configurations separate, as int could be enabled or disables by independent flags. but on the other side, you're right, may be we can use the same certs. I need to think a little bit more on this.
import "fmt" | ||
|
||
func fqdn(serviceName, namespace string) string { | ||
return fmt.Sprintf("%s.%s.svc.cluster.local", serviceName, namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, does this need to be configurable? https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/ provides a --service-dns-domain
option (defaults to cluster.local
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we address this in a separate PR? this is large enough :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some minor comments.
How is handled the use case when a secret with the cert is removed by user? The jaeger-operator is caching the certs on its filesystem and reconstracts the secret.
// CACertRefresh defines the duration of the CA certificate validity until a rotation | ||
// should happen. It can be set up to 80% of CA certificate validity or equal to the | ||
// CA certificate validity. Latter should be used only for rotating only when expired. | ||
CACertRefresh string `json:"caRefresh,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same comment as previous one:
- could a duration be used as a type?
- what is the default?
also the same comment applies for other fields in this file.
tests/e2e/reconcile/01-install.yaml
Outdated
@@ -15,6 +15,8 @@ kind: Microservices | |||
metadata: | |||
name: simplest | |||
spec: | |||
images: | |||
tempoQuery: quay.io/rvargasp/tempo-query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 we should create an issue to revert it back to the default
@@ -15,6 +15,8 @@ kind: Microservices | |||
metadata: | |||
name: simplest | |||
spec: | |||
images: | |||
tempoQuery: quay.io/rvargasp/tempo-query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same here add a comment and link to an issue to revert it back
fc04f26
to
b9488db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since its blocked for a while, can we disable this feature and merge? :b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added one comment regarding the Params
struct.
Looks good overall, some merge conflicts need to be resolved before being able to merge this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need here another follow-up ticket to include the tempo gateway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM just some minor comments.
Could you please rebase it?
@@ -118,7 +128,7 @@ func (r *MicroservicesReconciler) reconcileManifests(ctx context.Context, log lo | |||
} | |||
} | |||
|
|||
objects, err := manifests.BuildAll(manifestutils.Params{Tempo: tempo, StorageParams: *storageConfig}) | |||
objects, err := manifests.BuildAll(manifestutils.Params{Tempo: tempo, StorageParams: *storageConfig, Gates: r.FeatureGates}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we have a unit test to check if certs are created if TLS is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but due the size of this PR I would prefer to write more tests on follow up PR, if there is no objections I'll disable TLS until we have the tests and the right images in place. Also I would like to integrate tempo gateway to the TLS work.
…rt generation for internal use Signed-off-by: Ruben Vargas <[email protected]>
…rt generation for internal use Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Co-authored-by: Israel Blancas <[email protected]>
Co-authored-by: Israel Blancas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
b918858
to
fb988d9
Compare
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
ahh the e2e test failed. i triggered a rerun. ^^ |
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
c24ecb1
to
75c83b3
Compare
This PR adds the capability to protect the internal components using TLS.
This approach generated the CA, CABundle, and all the certs/private keys for each components using GoLang. It first created all the secrets, and the CABundle
ConfigMap
.Then it configure each component , mounting each secret as a volumes in the different components, also mounting the CA bundle.
There is a cert rotation controller that is in charge of monitoring the certificates expiration, and annotate the secrets and the
Microservice
if a new certificate is required due an expiration.This approach was taken from the Loki operator, almost all the
certrotation
package is copied from there, this package is in charge of create the CA, the bundle and all the certificates and private keys, create theConfigMap
and secret objects with he corresponding annotationsThe buildIn cert manager could be enabled/disabled in ProjectConfig using some FeatureFlags. I added some comments on how that should operate.
In this PR I didn't set the
minTLS
version and the allowed ciphers, I would leave this for another PR as this is large enough.Create a shared library to cert-manager grafana/loki#8181