Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect internal components using TLS #159

Merged
merged 20 commits into from
Feb 15, 2023

Conversation

rubenvp8510
Copy link
Collaborator

@rubenvp8510 rubenvp8510 commented Jan 15, 2023

This PR adds the capability to protect the internal components using TLS.

This approach generated the CA, CABundle, and all the certs/private keys for each components using GoLang. It first created all the secrets, and the CABundle ConfigMap.

Then it configure each component , mounting each secret as a volumes in the different components, also mounting the CA bundle.

There is a cert rotation controller that is in charge of monitoring the certificates expiration, and annotate the secrets and the Microservice if a new certificate is required due an expiration.

This approach was taken from the Loki operator, almost all the certrotation package is copied from there, this package is in charge of create the CA, the bundle and all the certificates and private keys, create the ConfigMap and secret objects with he corresponding annotations

The buildIn cert manager could be enabled/disabled in ProjectConfig using some FeatureFlags. I added some comments on how that should operate.

In this PR I didn't set the minTLS version and the allowed ciphers, I would leave this for another PR as this is large enough.

Create a shared library to cert-manager grafana/loki#8181

@rubenvp8510 rubenvp8510 force-pushed the protect_intern_service branch 4 times, most recently from ebae7bb to f143b30 Compare January 17, 2023 06:01
@rubenvp8510 rubenvp8510 changed the title Protect intern service Protect internal components using TLS Jan 17, 2023
@rubenvp8510 rubenvp8510 force-pushed the protect_intern_service branch 2 times, most recently from 9b5dcb1 to 15ab541 Compare January 17, 2023 06:10
@rubenvp8510 rubenvp8510 marked this pull request as ready for review January 17, 2023 06:13
@rubenvp8510 rubenvp8510 force-pushed the protect_intern_service branch 11 times, most recently from b89e12b to ce8be3a Compare January 18, 2023 06:02
case err != nil:
return ctrl.Result{}, err
default:
log.Info("Skipping cert rotation, all Microservices certificates still valid", "name", req.String())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding until when they will be valid or something?

@@ -15,6 +15,8 @@ kind: Microservices
metadata:
name: simplest
spec:
images:
tempoQuery: quay.io/rvargasp/tempo-query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this image contain?

Copy link
Collaborator Author

@rubenvp8510 rubenvp8510 Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that image contains the jaegerQuery pluign that supports TLS communication with the tempo front-end. We can remove this when the new version of the oficial image is released. grafana/tempo#1999

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining that? Even, I would create an issue in this repository to do that change in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 we should create an issue to revert it back to the default

internal/certrotation/var.go Outdated Show resolved Hide resolved
internal/certrotation/rotation_test.go Outdated Show resolved Hide resolved
internal/certrotation/rotation_test.go Show resolved Hide resolved
@rubenvp8510 rubenvp8510 force-pushed the protect_intern_service branch from 04ab12f to 0f5bc6b Compare January 20, 2023 22:33
Copy link
Collaborator

@andreasgerstmayr andreasgerstmayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall, I've added a couple of minor comments/suggestions.

@@ -185,7 +185,7 @@ type ObjectStorageSpec struct {
// ObjectStorageTLSSpec is the TLS configuration for reaching the object storage endpoint.
type ObjectStorageTLSSpec struct {
// CA is the name of a ConfigMap containing a CA certificate.
// It needs to be in the same namespace as the LokiStack custom resource.
// It needs to be in the same namespace as the tempo Microservice custom resource.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😃

// ComponentCertSecretNames returns a list of all tempo component certificate secret names.
func ComponentCertSecretNames(stackName string) []string {
return []string{
fmt.Sprintf("tempo-%s-distributor-http", stackName),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for creating a different cert for each port of each service?
I was looking at the loki operator, and it creates a new service for each port (https://github.com/grafana/loki/blob/61c02d78f2f54c00c6aa47943c4944dc7b0619a7/operator/internal/manifests/distributor.go#L153-L208), but I don't know why.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is to mantain the HTTP and gRPC TLS configurations separate, as int could be enabled or disables by independent flags. but on the other side, you're right, may be we can use the same certs. I need to think a little bit more on this.

import "fmt"

func fqdn(serviceName, namespace string) string {
return fmt.Sprintf("%s.%s.svc.cluster.local", serviceName, namespace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, does this need to be configurable? https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/ provides a --service-dns-domain option (defaults to cluster.local).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we address this in a separate PR? this is large enough :D

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure!

internal/manifests/manifests.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@pavolloffay pavolloffay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some minor comments.

How is handled the use case when a secret with the cert is removed by user? The jaeger-operator is caching the certs on its filesystem and reconstracts the secret.

apis/config/v1alpha1/projectconfig_types.go Outdated Show resolved Hide resolved
apis/config/v1alpha1/projectconfig_types.go Show resolved Hide resolved
// CACertRefresh defines the duration of the CA certificate validity until a rotation
// should happen. It can be set up to 80% of CA certificate validity or equal to the
// CA certificate validity. Latter should be used only for rotating only when expired.
CACertRefresh string `json:"caRefresh,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same comment as previous one:

  • could a duration be used as a type?
  • what is the default?

also the same comment applies for other fields in this file.

internal/manifests/manifestutils/service.go Show resolved Hide resolved
@@ -15,6 +15,8 @@ kind: Microservices
metadata:
name: simplest
spec:
images:
tempoQuery: quay.io/rvargasp/tempo-query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 we should create an issue to revert it back to the default

@@ -15,6 +15,8 @@ kind: Microservices
metadata:
name: simplest
spec:
images:
tempoQuery: quay.io/rvargasp/tempo-query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here add a comment and link to an issue to revert it back

@rubenvp8510 rubenvp8510 force-pushed the protect_intern_service branch 3 times, most recently from fc04f26 to b9488db Compare February 1, 2023 06:40
Copy link
Collaborator

@frzifus frzifus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since its blocked for a while, can we disable this feature and merge? :b

Copy link
Collaborator

@andreasgerstmayr andreasgerstmayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added one comment regarding the Params struct.
Looks good overall, some merge conflicts need to be resolved before being able to merge this PR.

internal/manifests/manifests.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@frzifus frzifus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need here another follow-up ticket to include the tempo gateway?

Copy link
Collaborator

@pavolloffay pavolloffay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM just some minor comments.

Could you please rebase it?

@@ -118,7 +128,7 @@ func (r *MicroservicesReconciler) reconcileManifests(ctx context.Context, log lo
}
}

objects, err := manifests.BuildAll(manifestutils.Params{Tempo: tempo, StorageParams: *storageConfig})
objects, err := manifests.BuildAll(manifestutils.Params{Tempo: tempo, StorageParams: *storageConfig, Gates: r.FeatureGates})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we have a unit test to check if certs are created if TLS is enabled?

Copy link
Collaborator Author

@rubenvp8510 rubenvp8510 Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but due the size of this PR I would prefer to write more tests on follow up PR, if there is no objections I'll disable TLS until we have the tests and the right images in place. Also I would like to integrate tempo gateway to the TLS work.

apis/config/v1alpha1/projectconfig_types.go Show resolved Hide resolved
apis/config/v1alpha1/projectconfig_types.go Show resolved Hide resolved
rubenvp8510 and others added 14 commits February 13, 2023 23:40
…rt generation for internal use

Signed-off-by: Ruben Vargas <[email protected]>
…rt generation for internal use

Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
@rubenvp8510 rubenvp8510 force-pushed the protect_intern_service branch from b918858 to fb988d9 Compare February 14, 2023 05:42
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
Signed-off-by: Ruben Vargas <[email protected]>
@frzifus
Copy link
Collaborator

frzifus commented Feb 14, 2023

ahh the e2e test failed. i triggered a rerun. ^^

@rubenvp8510 rubenvp8510 force-pushed the protect_intern_service branch from c24ecb1 to 75c83b3 Compare February 15, 2023 05:55
@rubenvp8510 rubenvp8510 merged commit 9104274 into grafana:main Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants